diff mbox series

[V3] RISC-V: Support gather_load/scatter RVV auto-vectorization

Message ID 20230707010023.863732-1-juzhe.zhong@rivai.ai
State New
Headers show
Series [V3] RISC-V: Support gather_load/scatter RVV auto-vectorization | expand

Commit Message

juzhe.zhong@rivai.ai July 7, 2023, 1 a.m. UTC
This patch fully support gather_load/scatter_store:
1. Support single-rgroup on both RV32/RV64.
2. Support indexed element width can be same as or smaller than Pmode.
3. Support VLA SLP with gather/scatter.
4. Fully tested all gather/scatter with LMUL = M1/M2/M4/M8 both VLA and VLS.
5. Fix bug of handling (subreg:SI (const_poly_int:DI))
6. Fix bug on vec_perm which is used by gather/scatter SLP.

All kinds of GATHER/SCATTER are normalized into LEN_MASK_*.
We fully supported these 4 kinds of gather/scatter:
1. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and dummy mask (Full vector).
2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and real mask.
2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and dummy mask.
2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and real mask.

We use vluxei/vsuxei (un-ordered indexed loads/stores of RVV to code generate gather/scatter).

Also, we support strided loads/stores with vlse.v/vsse.v. Consider this following case:
#define TEST_LOOP(DATA_TYPE, BITS)                                             \
  void __attribute__ ((noinline, noclone))                                     \
  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
			  INDEX##BITS stride, INDEX##BITS n)                   \
  {                                                                            \
    for (INDEX##BITS i = 0; i < n; ++i)                                        \
      dest[i] += src[i * stride];                                              \
  }

Codegen:
f_int8_t_8:
	ble	a3,zero,.L10
	li	a5,1
	mv	a4,a0
	bne	a2,a5,.L4
	li	a2,1
.L6:
	vsetvli	a5,a3,e8,m2,ta,ma
	vle8.v	v2,0(a0)
	vlse8.v	v4,0(a1),a2
	vsetvli	a6,zero,e8,m2,ta,ma
	sub	a3,a3,a5
	vadd.vv	v2,v2,v4
	vsetvli	zero,a5,e8,m2,ta,ma
	vse8.v	v2,0(a4)
	add	a0,a0,a5
	add	a1,a1,a5
	add	a4,a4,a5
	bne	a3,zero,.L6
.L10:
	ret

We use vlse.v instead of vluxei.

This patch has been tested on both RV32 and RV64.

gcc/ChangeLog:

        * config/riscv/autovec.md (len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>): New pattern.
        (len_mask_gather_load<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
        (len_mask_gather_load<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
        (len_mask_gather_load<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
        (len_mask_gather_load<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
        (len_mask_gather_load<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
        (len_mask_gather_load<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
        (len_mask_gather_load<mode><mode>): Ditto.
        (len_mask_scatter_store<VNX1_QHSD:mode><VNX1_QHSDI:mode>): Ditto.
        (len_mask_scatter_store<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
        (len_mask_scatter_store<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
        (len_mask_scatter_store<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
        (len_mask_scatter_store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
        (len_mask_scatter_store<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
        (len_mask_scatter_store<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
        (len_mask_scatter_store<mode><mode>): Ditto.
        * config/riscv/predicates.md (const_1_operand): New predicate.
        (vector_gs_offset_operand): Ditto.
        (vector_gs_scale_operand_16): Ditto.
        (vector_gs_scale_operand_32): Ditto.
        (vector_gs_scale_operand_64): Ditto.
        (vector_gs_extension_operand): Ditto.
        (vector_gs_scale_operand_16_rv32): Ditto.
        (vector_gs_scale_operand_32_rv32): Ditto.
        * config/riscv/riscv-protos.h (enum insn_type): Add gather/scatter.
        (expand_gather_scatter): New function.
        * config/riscv/riscv-v.cc (gen_const_vector_dup): Add gather/scatter.
        (emit_vlmax_masked_store_insn): New function.
        (emit_nonvlmax_masked_store_insn): Ditto.
        (modulo_sel_indices): Ditto.
        (expand_vec_perm): Fix SLP for gather/scatter.
        (prepare_gather_scatter): New function.
        (strided_load_store_p): Ditto.
        (expand_gather_scatter): Ditto.
        * config/riscv/riscv.cc (riscv_legitimize_move): Fix bug of (subreg:SI (DI CONST_POLY_INT)).
        * config/riscv/vector-iterators.md: Add gather/scatter.
        * config/riscv/vector.md (vec_duplicate<mode>): Use "@" instead.
        (@vec_duplicate<mode>): Ditto.
        (@pred_indexed_<order>store<VNX16_QHS:mode><VNX16_QHSDI:mode>): Fix name.
        (@pred_indexed_<order>store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/rvv.exp: Add gather/scatter tests.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c: New test.

---
 gcc/config/riscv/autovec.md                   | 256 +++++++++++++
 gcc/config/riscv/predicates.md                |  39 +-
 gcc/config/riscv/riscv-protos.h               |   3 +
 gcc/config/riscv/riscv-v.cc                   | 357 +++++++++++++++++-
 gcc/config/riscv/riscv.cc                     |  25 ++
 gcc/config/riscv/vector-iterators.md          | 118 +++++-
 gcc/config/riscv/vector.md                    |  30 +-
 .../autovec/gather-scatter/gather_load-1.c    |  38 ++
 .../autovec/gather-scatter/gather_load-10.c   |  35 ++
 .../autovec/gather-scatter/gather_load-11.c   |  32 ++
 .../autovec/gather-scatter/gather_load-12.c   | 112 ++++++
 .../autovec/gather-scatter/gather_load-2.c    |  38 ++
 .../autovec/gather-scatter/gather_load-3.c    |  35 ++
 .../autovec/gather-scatter/gather_load-4.c    |  35 ++
 .../autovec/gather-scatter/gather_load-5.c    |  35 ++
 .../autovec/gather-scatter/gather_load-6.c    |  35 ++
 .../autovec/gather-scatter/gather_load-7.c    |  35 ++
 .../autovec/gather-scatter/gather_load-8.c    |  35 ++
 .../autovec/gather-scatter/gather_load-9.c    |  35 ++
 .../gather-scatter/gather_load_run-1.c        |  41 ++
 .../gather-scatter/gather_load_run-10.c       |  41 ++
 .../gather-scatter/gather_load_run-11.c       |  39 ++
 .../gather-scatter/gather_load_run-12.c       | 124 ++++++
 .../gather-scatter/gather_load_run-2.c        |  41 ++
 .../gather-scatter/gather_load_run-3.c        |  41 ++
 .../gather-scatter/gather_load_run-4.c        |  41 ++
 .../gather-scatter/gather_load_run-5.c        |  41 ++
 .../gather-scatter/gather_load_run-6.c        |  41 ++
 .../gather-scatter/gather_load_run-7.c        |  41 ++
 .../gather-scatter/gather_load_run-8.c        |  41 ++
 .../gather-scatter/gather_load_run-9.c        |  41 ++
 .../gather-scatter/mask_gather_load-1.c       |  39 ++
 .../gather-scatter/mask_gather_load-10.c      |  36 ++
 .../gather-scatter/mask_gather_load-11.c      | 116 ++++++
 .../gather-scatter/mask_gather_load-2.c       |  39 ++
 .../gather-scatter/mask_gather_load-3.c       |  36 ++
 .../gather-scatter/mask_gather_load-4.c       |  36 ++
 .../gather-scatter/mask_gather_load-5.c       |  36 ++
 .../gather-scatter/mask_gather_load-6.c       |  36 ++
 .../gather-scatter/mask_gather_load-7.c       |  36 ++
 .../gather-scatter/mask_gather_load-8.c       |  36 ++
 .../gather-scatter/mask_gather_load-9.c       |  36 ++
 .../gather-scatter/mask_gather_load_run-1.c   |  48 +++
 .../gather-scatter/mask_gather_load_run-10.c  |  48 +++
 .../gather-scatter/mask_gather_load_run-11.c  | 140 +++++++
 .../gather-scatter/mask_gather_load_run-2.c   |  48 +++
 .../gather-scatter/mask_gather_load_run-3.c   |  48 +++
 .../gather-scatter/mask_gather_load_run-4.c   |  48 +++
 .../gather-scatter/mask_gather_load_run-5.c   |  48 +++
 .../gather-scatter/mask_gather_load_run-6.c   |  48 +++
 .../gather-scatter/mask_gather_load_run-7.c   |  48 +++
 .../gather-scatter/mask_gather_load_run-8.c   |  48 +++
 .../gather-scatter/mask_gather_load_run-9.c   |  48 +++
 .../gather-scatter/mask_scatter_store-1.c     |  39 ++
 .../gather-scatter/mask_scatter_store-10.c    |  36 ++
 .../gather-scatter/mask_scatter_store-2.c     |  39 ++
 .../gather-scatter/mask_scatter_store-3.c     |  36 ++
 .../gather-scatter/mask_scatter_store-4.c     |  36 ++
 .../gather-scatter/mask_scatter_store-5.c     |  36 ++
 .../gather-scatter/mask_scatter_store-6.c     |  36 ++
 .../gather-scatter/mask_scatter_store-7.c     |  36 ++
 .../gather-scatter/mask_scatter_store-8.c     |  36 ++
 .../gather-scatter/mask_scatter_store-9.c     |  36 ++
 .../gather-scatter/mask_scatter_store_run-1.c |  48 +++
 .../mask_scatter_store_run-10.c               |  48 +++
 .../gather-scatter/mask_scatter_store_run-2.c |  48 +++
 .../gather-scatter/mask_scatter_store_run-3.c |  48 +++
 .../gather-scatter/mask_scatter_store_run-4.c |  48 +++
 .../gather-scatter/mask_scatter_store_run-5.c |  48 +++
 .../gather-scatter/mask_scatter_store_run-6.c |  48 +++
 .../gather-scatter/mask_scatter_store_run-7.c |  48 +++
 .../gather-scatter/mask_scatter_store_run-8.c |  48 +++
 .../gather-scatter/mask_scatter_store_run-9.c |  48 +++
 .../autovec/gather-scatter/scatter_store-1.c  |  38 ++
 .../autovec/gather-scatter/scatter_store-10.c |  35 ++
 .../autovec/gather-scatter/scatter_store-2.c  |  38 ++
 .../autovec/gather-scatter/scatter_store-3.c  |  35 ++
 .../autovec/gather-scatter/scatter_store-4.c  |  35 ++
 .../autovec/gather-scatter/scatter_store-5.c  |  35 ++
 .../autovec/gather-scatter/scatter_store-6.c  |  35 ++
 .../autovec/gather-scatter/scatter_store-7.c  |  35 ++
 .../autovec/gather-scatter/scatter_store-8.c  |  35 ++
 .../autovec/gather-scatter/scatter_store-9.c  |  35 ++
 .../gather-scatter/scatter_store_run-1.c      |  40 ++
 .../gather-scatter/scatter_store_run-10.c     |  40 ++
 .../gather-scatter/scatter_store_run-2.c      |  40 ++
 .../gather-scatter/scatter_store_run-3.c      |  40 ++
 .../gather-scatter/scatter_store_run-4.c      |  40 ++
 .../gather-scatter/scatter_store_run-5.c      |  40 ++
 .../gather-scatter/scatter_store_run-6.c      |  40 ++
 .../gather-scatter/scatter_store_run-7.c      |  40 ++
 .../gather-scatter/scatter_store_run-8.c      |  40 ++
 .../gather-scatter/scatter_store_run-9.c      |  40 ++
 .../autovec/gather-scatter/strided_load-1.c   |  46 +++
 .../autovec/gather-scatter/strided_load-2.c   |  46 +++
 .../gather-scatter/strided_load_run-1.c       |  84 +++++
 .../gather-scatter/strided_load_run-2.c       |  84 +++++
 .../autovec/gather-scatter/strided_store-1.c  |  46 +++
 .../autovec/gather-scatter/strided_store-2.c  |  46 +++
 .../gather-scatter/strided_store_run-1.c      |  82 ++++
 .../gather-scatter/strided_store_run-2.c      |  82 ++++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |  23 ++
 102 files changed, 5097 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c

Comments

Robin Dapp July 7, 2023, 9:43 a.m. UTC | #1
Hi Juzhe,

thanks, that's quite a chunk :) and it took me a while to
go through it.

> @@ -564,7 +565,14 @@ const_vec_all_in_range_p (rtx vec, poly_int64 minval, poly_int64 maxval)
>  static rtx
>  gen_const_vector_dup (machine_mode mode, poly_int64 val)
>  {
> -  rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
> +  scalar_mode smode = GET_MODE_INNER (mode);
> +  rtx c = gen_int_mode (val, smode);
> +  if (!val.is_constant () && GET_MODE_SIZE (smode) > GET_MODE_SIZE (Pmode))
> +    {
> +      rtx dup = gen_reg_rtx (mode);
> +      emit_insn (gen_vec_duplicate (mode, dup, c));
> +      return dup;
> +    }
>    return gen_const_vec_duplicate (mode, c);
>  }

It's a bit weird that the function now also emits an insn.  It's not
similar to the aarch64 variant anymore then, I suppose.  If so, please
remove the comment.

> +
>  /* This function emits a masked instruction.  */
>  void
>  emit_vlmax_masked_mu_insn (unsigned icode, int op_num, rtx *ops)
> @@ -1162,7 +1203,6 @@ expand_const_vector (rtx target, rtx src)
>  	}
>        else
>  	{
> -	  elt = force_reg (elt_mode, elt);
>  	  rtx ops[] = {tmp, elt};
>  	  emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
>  	}
> @@ -2431,6 +2471,25 @@ expand_vec_cmp_float (rtx target, rtx_code code, rtx op0, rtx op1,
>    return false;
>  }

elt_mode is unused after your patch.  Please remove it or we will have
a bootstrap error.

> @@ -2444,42 +2503,47 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
>       index is in range of [0, nunits - 1]. A single vrgather instructions is
>       enough. Since we will use vrgatherei16.vv for variable-length vector,
>       it is never out of range and we don't need to modulo the index.  */
> -  if (!nunits.is_constant () || const_vec_all_in_range_p (sel, 0, nunits - 1))
> +  if (nunits.is_constant () && const_vec_all_in_range_p (sel, 0, nunits - 1))
>      {
>        emit_vlmax_gather_insn (target, op0, sel);
>        return;
>      }
>  
>    /* Check if the two values vectors are the same.  */
> -  if (rtx_equal_p (op0, op1) || const_vec_duplicate_p (sel))
> +  if (rtx_equal_p (op0, op1))
>      {
>        /* Note: vec_perm indices are supposed to wrap when they go beyond the
>  	 size of the two value vectors, i.e. the upper bits of the indices
>  	 are effectively ignored.  RVV vrgather instead produces 0 for any
>  	 out-of-range indices, so we need to modulo all the vec_perm indices
>  	 to ensure they are all in range of [0, nunits - 1].  */
> -      rtx max_sel = gen_const_vector_dup (sel_mode, nunits - 1);
> -      rtx sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0,
> -					 OPTAB_DIRECT);
> -      emit_vlmax_gather_insn (target, op1, sel_mod);
> +      rtx sel_mod = modulo_sel_indices (sel, nunits - 1);
> +      emit_vlmax_gather_insn (target, op0, sel_mod);
>        return;
>      }

When reading it I considered unifying both cases and have modulo_sel_indices
just do nothing when the constant already satisfies the range requirement.
Would that work?

> -				     OPTAB_DIRECT);
> +      poly_uint64 value = rtx_to_poly_int64 (elt);
> +      rtx op = op0;
> +      if (maybe_gt (value, nunits - 1))
> +	{
> +	  sel = gen_const_vector_dup (sel_mode, value - nunits);
> +	  op = op1;
> +	}
> +      emit_vlmax_gather_insn (target, op, sel);
>      }

That's again a "modulo".   Could that also fit in modulo_sel_indices?
Your call in the end, it just feels like we do the same thing kind
of differently in several places but no strict preference here.

> +/* Return true if it is the strided load/store.  */
> +static bool
> +strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
> +{
> +  if (const_vec_series_p (vec_offset, base, step))
> +    return true;

The vectorizer will never emit this but we wouldn't want a step
of 1 here, right?

> +
> +  /* For strided load/store, vectorizer always generates
> +     VEC_SERIES_EXPR for vec_offset.  */
> +  tree expr = REG_EXPR (vec_offset);
> +  if (!expr || TREE_CODE (expr) != SSA_NAME)
> +    return false;
> +
> +  /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>;  */
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
> +  if (!def_stmt || !is_gimple_assign (def_stmt)
> +      || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
> +    return false;

Interesting to query the gimple here.  As long as the
vectorizer doesn't do strided stores separately, I guess we can
live with that.

> +  rtx ptr, vec_offset, vec_reg, len, mask;
> +  bool zero_extend_p;
> +  int scale_log2;
> +  if (is_load)
> +    {
> +      vec_reg = ops[0];
> +      ptr = ops[1];
> +      vec_offset = ops[2];
> +      zero_extend_p = INTVAL (ops[3]);
> +      scale_log2 = exact_log2 (INTVAL (ops[4]));
> +      len = ops[5];
> +      mask = ops[7];
> +    }
> +  else
> +    {
> +      vec_reg = ops[4];
> +      ptr = ops[0];
> +      vec_offset = ops[1];
> +      zero_extend_p = INTVAL (ops[2]);
> +      scale_log2 = exact_log2 (INTVAL (ops[3]));
> +      len = ops[5];
> +      mask = ops[7];
> +    }

> +  bool is_dummp_len = poly_int_rtx_p (len, &value) && known_eq (value, nunits);

What's dummp?  dumb?  It looks like it's used for switching between
vlmax/nonvlmax so a different name might be advisable.

> +	      emit_insn (gen_extend_insn (extend_step, step, Pmode,
> +					  GET_MODE (step),
> +					  zero_extend_p ? true : false));
> +	      step = extend_step;
> +	    }
> +	}

See below.

> +      rtx mem = validize_mem (gen_rtx_MEM (vec_mode, ptr));
> +      /* Emit vlse.v if it's load. Otherwise, emit vsse.v.  */
> +      if (is_load)
> +	{
> +	  insn_code icode = code_for_pred_strided_load (vec_mode);
> +	  if (is_dummp_len)
> +	    {
> +	      rtx load_ops[]
> +		= {vec_reg, mask, RVV_VUNDEF (vec_mode), mem, step};
> +	      emit_vlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops);
> +	    }
> +	  else
> +	    {
> +	      rtx load_ops[]
> +		= {vec_reg, mask, RVV_VUNDEF (vec_mode), mem, step};
> +	      emit_nonvlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops, len);
> +	    }
> +	}

The ops are similar, better to define them outside of the if/else.
I would also rather have both in the same emit helper but that was
discussed before.  The following similar patterns all look a bit
"boilerplate-ish".  Tolerable for now I guess.

> +  if (inner_offsize < inner_vsize)
> +    {
> +      /* 7.2. Vector Load/Store Addressing Modes.
> +	 If the vector offset elements are narrower than XLEN, they are
> +	 zero-extended to XLEN before adding to the ptr effective address. If
> +	 the vector offset elements are wider than XLEN, the least-significant
> +	 XLEN bits are used in the address calculation. An implementation must
> +	 raise an illegal instruction exception if the EEW is not supported for
> +	 offset elements.  */
> +      if (!zero_extend_p || (zero_extend_p && scale_log2 != 0))
> +	{
> +	  if (zero_extend_p)
> +	    inner_idx_mode
> +	      = int_mode_for_size (inner_offsize * 2, 0).require ();
> +	  else
> +	    inner_idx_mode = int_mode_for_size (BITS_PER_WORD, 0).require ();
> +	  machine_mode new_idx_mode
> +	    = get_vector_mode (inner_idx_mode, nunits).require ();
> +	  rtx tmp = gen_reg_rtx (new_idx_mode);
> +	  emit_insn (gen_extend_insn (tmp, vec_offset, new_idx_mode, idx_mode,
> +				      zero_extend_p ? true : false));
> +	  vec_offset = tmp;
> +	  idx_mode = new_idx_mode;
> +	}
> +    }

This one I don't get.  Why do we still need to zero_extend when the
hardware does it for us?  Shouldn't we only sign extend when the
expander says so?  Actually we should even scan-assembler-not for
(v)zext?  Additionally maybe also scan-assembler (v)sext for the
respective cases.
Besides, couldn't we do a widening shift when combining it with
scale_log != 0?

> +  if (SUBREG_P (src) && CONST_POLY_INT_P (SUBREG_REG (src)))
> +    {
> +      /* Since VLENB result feteched by csrr is always within Pmode size,
> +	 we always set the upper bits of the REG with larger mode than Pmode
> +	 as 0.  The code sequence is transformed as follows:
> +
> +	     - (set (reg:SI 136)
> +		    (subreg:SI (const_poly_int:DI [16,16]) 0))
> +	     -> (set (reg:SI 136) (const_poly_int:SI [16,16]))
> +
> +	     - (set (reg:SI 136)
> +		    (subreg:SI (const_poly_int:DI [16,16]) 4))
> +	     -> (set (reg:SI 136) (const_int 0))  */
> +      rtx poly_rtx = SUBREG_REG (src);
> +      poly_int64 value = rtx_to_poly_int64 (poly_rtx);
> +      /* The only possible case is handling CONST_POLY_INT:DI in RV32 system. */
> +      gcc_assert (!TARGET_64BIT && mode == SImode
> +		  && GET_MODE (poly_rtx) == DImode && known_ge (value, 0U));
> +      if (known_eq (SUBREG_BYTE (src), GET_MODE_SIZE (Pmode)))
> +	emit_move_insn (dest, const0_rtx);
> +      else
> +	emit_move_insn (dest, gen_int_mode (value, Pmode));
> +      return true;
> +    }

As discussed separately, I'd rather go with

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d40d430db1b..6735c7763b5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2061,7 +2061,14 @@ riscv_legitimize_poly_move (machine_mode mode, rtx dest, rtx tmp, rtx src)
      (m, n) = base * magn + constant.
      This calculation doesn't need div operation.  */
 
-  emit_move_insn (tmp, gen_int_mode (BYTES_PER_RISCV_VECTOR, mode));
+    if (mode <= Pmode)
+      emit_move_insn (tmp, gen_int_mode (BYTES_PER_RISCV_VECTOR, mode));
+    else
+      {
+       emit_move_insn (gen_highpart (Pmode, tmp), CONST0_RTX (Pmode));
+       emit_move_insn (gen_lowpart (Pmode, tmp),
+                       gen_int_mode (BYTES_PER_RISCV_VECTOR, Pmode));
+      }
 
   if (BYTES_PER_RISCV_VECTOR.is_constant ())
     {
@@ -2168,7 +2175,7 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx src)
          return false;
        }
 
-      if (satisfies_constraint_vp (src))
+      if (satisfies_constraint_vp (src) && GET_MODE (src) == Pmode)
        return false;

instead.  This avoid the (surprising) weird subreg being generated
at all and we don't need to ensure, probably redundantly, that the
const_poly value is in range etc.

> +           && (immediate_operand (operands[3], Pmode)
> +	       || (CONST_POLY_INT_P (operands[3])uldn't we do a widening shift when combined with
scale_log != 0?
> +	           && known_ge (rtx_to_poly_int64 (operands[3]), 0U)
> +		   && known_le (rtx_to_poly_int64 (operands[3]), GET_MODE_SIZE (<MODE>mode)))))
> +    {
> +      rtx tmp = gen_reg_rtx (Pmode);
> +      poly_int64 value = rtx_to_poly_int64 (operands[3]);
> +      emit_move_insn (tmp, gen_int_mode (value, Pmode));
> +      operands[3] = gen_rtx_SIGN_EXTEND (<VEL>mode, tmp);
> +    }

Maybe add a single-line comment as for the other existing cases.

> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c
> @@ -0,0 +1,82 @@
> +/* { dg-do run { target { riscv_vector } } } */
As discussed separately,  I would prefer separate run tests for
zvfh(min) but I guess we can live with them just not being vectorized
for now.

Regards
 Robin
juzhe.zhong@rivai.ai July 7, 2023, 10:07 a.m. UTC | #2
>> It's a bit weird that the function now also emits an insn.  It's not
>> similar to the aarch64 variant anymore then, I suppose.  If so, please
>> remove the comment.
Ok.

>> elt_mode is unused after your patch.  Please remove it or we will have
>> a bootstrap error.
Ok

>>When reading it I considered unifying both cases and have modulo_sel_indices
>>just do nothing when the constant already satisfies the range requirement.
>>Would that work?
I tried but it turns out to cause execution faile.


>>The vectorizer will never emit this but we wouldn't want a step
>>of 1 here, right?
No, you can take a look at strided_load-2.c

>>What's dummp?  dumb?  It looks like it's used for switching between
>>vlmax/nonvlmax so a different name might be advisable.
Sorry for typo, it should be dummy_len means length= vf that is the trick I play on middle-end meaning such vector operations do not affect by length.


>>The ops are similar, better to define them outside of the if/else.
>>I would also rather have both in the same emit helper but that was
>>discussed before.  The following similar patterns all look a bit
>>"boilerplate-ish".  Tolerable for now I guess.
Ok, will have a try.


>>This one I don't get.  Why do we still need to zero_extend when the
>>hardware does it for us?  Shouldn't we only sign extend when the
>>expander says so?  Actually we should even scan-assembler-not for
>>(v)zext?  Additionally maybe also scan-assembler (v)sext for the
>>respective cases.

For zero_extend with scale != 1, we need to first zero_extend then multiple the scale.

>>Besides, couldn't we do a widening shift when combining it with
>>scale_log != 0?
RVV has widening shift? I didn't know that. Current natural approach is first extend offset then shift.

>>instead.  This avoid the (surprising) weird subreg being generated
>>at all and we don't need to ensure, probably redundantly, that the
>>const_poly value is in range etc.
Ok.

>>Maybe add a single-line comment as for the other existing cases.
oK.


juzhe.zhong@rivai.ai
 
From: Robin Dapp
Date: 2023-07-07 17:43
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH V3] RISC-V: Support gather_load/scatter RVV auto-vectorization
Hi Juzhe,
 
thanks, that's quite a chunk :) and it took me a while to
go through it.
 
> @@ -564,7 +565,14 @@ const_vec_all_in_range_p (rtx vec, poly_int64 minval, poly_int64 maxval)
>  static rtx
>  gen_const_vector_dup (machine_mode mode, poly_int64 val)
>  {
> -  rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
> +  scalar_mode smode = GET_MODE_INNER (mode);
> +  rtx c = gen_int_mode (val, smode);
> +  if (!val.is_constant () && GET_MODE_SIZE (smode) > GET_MODE_SIZE (Pmode))
> +    {
> +      rtx dup = gen_reg_rtx (mode);
> +      emit_insn (gen_vec_duplicate (mode, dup, c));
> +      return dup;
> +    }
>    return gen_const_vec_duplicate (mode, c);
>  }
 
It's a bit weird that the function now also emits an insn.  It's not
similar to the aarch64 variant anymore then, I suppose.  If so, please
remove the comment.
 
> +
>  /* This function emits a masked instruction.  */
>  void
>  emit_vlmax_masked_mu_insn (unsigned icode, int op_num, rtx *ops)
> @@ -1162,7 +1203,6 @@ expand_const_vector (rtx target, rtx src)
>  }
>        else
>  {
> -   elt = force_reg (elt_mode, elt);
>    rtx ops[] = {tmp, elt};
>    emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
>  }
> @@ -2431,6 +2471,25 @@ expand_vec_cmp_float (rtx target, rtx_code code, rtx op0, rtx op1,
>    return false;
>  }
 
elt_mode is unused after your patch.  Please remove it or we will have
a bootstrap error.
 
> @@ -2444,42 +2503,47 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
>       index is in range of [0, nunits - 1]. A single vrgather instructions is
>       enough. Since we will use vrgatherei16.vv for variable-length vector,
>       it is never out of range and we don't need to modulo the index.  */
> -  if (!nunits.is_constant () || const_vec_all_in_range_p (sel, 0, nunits - 1))
> +  if (nunits.is_constant () && const_vec_all_in_range_p (sel, 0, nunits - 1))
>      {
>        emit_vlmax_gather_insn (target, op0, sel);
>        return;
>      }
>  
>    /* Check if the two values vectors are the same.  */
> -  if (rtx_equal_p (op0, op1) || const_vec_duplicate_p (sel))
> +  if (rtx_equal_p (op0, op1))
>      {
>        /* Note: vec_perm indices are supposed to wrap when they go beyond the
>  size of the two value vectors, i.e. the upper bits of the indices
>  are effectively ignored.  RVV vrgather instead produces 0 for any
>  out-of-range indices, so we need to modulo all the vec_perm indices
>  to ensure they are all in range of [0, nunits - 1].  */
> -      rtx max_sel = gen_const_vector_dup (sel_mode, nunits - 1);
> -      rtx sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0,
> - OPTAB_DIRECT);
> -      emit_vlmax_gather_insn (target, op1, sel_mod);
> +      rtx sel_mod = modulo_sel_indices (sel, nunits - 1);
> +      emit_vlmax_gather_insn (target, op0, sel_mod);
>        return;
>      }
 
When reading it I considered unifying both cases and have modulo_sel_indices
just do nothing when the constant already satisfies the range requirement.
Would that work?
 
> -      OPTAB_DIRECT);
> +      poly_uint64 value = rtx_to_poly_int64 (elt);
> +      rtx op = op0;
> +      if (maybe_gt (value, nunits - 1))
> + {
> +   sel = gen_const_vector_dup (sel_mode, value - nunits);
> +   op = op1;
> + }
> +      emit_vlmax_gather_insn (target, op, sel);
>      }
 
That's again a "modulo".   Could that also fit in modulo_sel_indices?
Your call in the end, it just feels like we do the same thing kind
of differently in several places but no strict preference here.
 
> +/* Return true if it is the strided load/store.  */
> +static bool
> +strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
> +{
> +  if (const_vec_series_p (vec_offset, base, step))
> +    return true;
 
The vectorizer will never emit this but we wouldn't want a step
of 1 here, right?
 
> +
> +  /* For strided load/store, vectorizer always generates
> +     VEC_SERIES_EXPR for vec_offset.  */
> +  tree expr = REG_EXPR (vec_offset);
> +  if (!expr || TREE_CODE (expr) != SSA_NAME)
> +    return false;
> +
> +  /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>;  */
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
> +  if (!def_stmt || !is_gimple_assign (def_stmt)
> +      || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
> +    return false;
 
Interesting to query the gimple here.  As long as the
vectorizer doesn't do strided stores separately, I guess we can
live with that.
 
> +  rtx ptr, vec_offset, vec_reg, len, mask;
> +  bool zero_extend_p;
> +  int scale_log2;
> +  if (is_load)
> +    {
> +      vec_reg = ops[0];
> +      ptr = ops[1];
> +      vec_offset = ops[2];
> +      zero_extend_p = INTVAL (ops[3]);
> +      scale_log2 = exact_log2 (INTVAL (ops[4]));
> +      len = ops[5];
> +      mask = ops[7];
> +    }
> +  else
> +    {
> +      vec_reg = ops[4];
> +      ptr = ops[0];
> +      vec_offset = ops[1];
> +      zero_extend_p = INTVAL (ops[2]);
> +      scale_log2 = exact_log2 (INTVAL (ops[3]));
> +      len = ops[5];
> +      mask = ops[7];
> +    }
 
> +  bool is_dummp_len = poly_int_rtx_p (len, &value) && known_eq (value, nunits);
 
What's dummp?  dumb?  It looks like it's used for switching between
vlmax/nonvlmax so a different name might be advisable.
 
> +       emit_insn (gen_extend_insn (extend_step, step, Pmode,
> +   GET_MODE (step),
> +   zero_extend_p ? true : false));
> +       step = extend_step;
> +     }
> + }
 
See below.
 
> +      rtx mem = validize_mem (gen_rtx_MEM (vec_mode, ptr));
> +      /* Emit vlse.v if it's load. Otherwise, emit vsse.v.  */
> +      if (is_load)
> + {
> +   insn_code icode = code_for_pred_strided_load (vec_mode);
> +   if (is_dummp_len)
> +     {
> +       rtx load_ops[]
> + = {vec_reg, mask, RVV_VUNDEF (vec_mode), mem, step};
> +       emit_vlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops);
> +     }
> +   else
> +     {
> +       rtx load_ops[]
> + = {vec_reg, mask, RVV_VUNDEF (vec_mode), mem, step};
> +       emit_nonvlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops, len);
> +     }
> + }
 
The ops are similar, better to define them outside of the if/else.
I would also rather have both in the same emit helper but that was
discussed before.  The following similar patterns all look a bit
"boilerplate-ish".  Tolerable for now I guess.
 
> +  if (inner_offsize < inner_vsize)
> +    {
> +      /* 7.2. Vector Load/Store Addressing Modes.
> + If the vector offset elements are narrower than XLEN, they are
> + zero-extended to XLEN before adding to the ptr effective address. If
> + the vector offset elements are wider than XLEN, the least-significant
> + XLEN bits are used in the address calculation. An implementation must
> + raise an illegal instruction exception if the EEW is not supported for
> + offset elements.  */
> +      if (!zero_extend_p || (zero_extend_p && scale_log2 != 0))
> + {
> +   if (zero_extend_p)
> +     inner_idx_mode
> +       = int_mode_for_size (inner_offsize * 2, 0).require ();
> +   else
> +     inner_idx_mode = int_mode_for_size (BITS_PER_WORD, 0).require ();
> +   machine_mode new_idx_mode
> +     = get_vector_mode (inner_idx_mode, nunits).require ();
> +   rtx tmp = gen_reg_rtx (new_idx_mode);
> +   emit_insn (gen_extend_insn (tmp, vec_offset, new_idx_mode, idx_mode,
> +       zero_extend_p ? true : false));
> +   vec_offset = tmp;
> +   idx_mode = new_idx_mode;
> + }
> +    }
 
This one I don't get.  Why do we still need to zero_extend when the
hardware does it for us?  Shouldn't we only sign extend when the
expander says so?  Actually we should even scan-assembler-not for
(v)zext?  Additionally maybe also scan-assembler (v)sext for the
respective cases.
Besides, couldn't we do a widening shift when combining it with
scale_log != 0?
 
> +  if (SUBREG_P (src) && CONST_POLY_INT_P (SUBREG_REG (src)))
> +    {
> +      /* Since VLENB result feteched by csrr is always within Pmode size,
> + we always set the upper bits of the REG with larger mode than Pmode
> + as 0.  The code sequence is transformed as follows:
> +
> +      - (set (reg:SI 136)
> +     (subreg:SI (const_poly_int:DI [16,16]) 0))
> +      -> (set (reg:SI 136) (const_poly_int:SI [16,16]))
> +
> +      - (set (reg:SI 136)
> +     (subreg:SI (const_poly_int:DI [16,16]) 4))
> +      -> (set (reg:SI 136) (const_int 0))  */
> +      rtx poly_rtx = SUBREG_REG (src);
> +      poly_int64 value = rtx_to_poly_int64 (poly_rtx);
> +      /* The only possible case is handling CONST_POLY_INT:DI in RV32 system. */
> +      gcc_assert (!TARGET_64BIT && mode == SImode
> +   && GET_MODE (poly_rtx) == DImode && known_ge (value, 0U));
> +      if (known_eq (SUBREG_BYTE (src), GET_MODE_SIZE (Pmode)))
> + emit_move_insn (dest, const0_rtx);
> +      else
> + emit_move_insn (dest, gen_int_mode (value, Pmode));
> +      return true;
> +    }
 
As discussed separately, I'd rather go with
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d40d430db1b..6735c7763b5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2061,7 +2061,14 @@ riscv_legitimize_poly_move (machine_mode mode, rtx dest, rtx tmp, rtx src)
      (m, n) = base * magn + constant.
      This calculation doesn't need div operation.  */
-  emit_move_insn (tmp, gen_int_mode (BYTES_PER_RISCV_VECTOR, mode));
+    if (mode <= Pmode)
+      emit_move_insn (tmp, gen_int_mode (BYTES_PER_RISCV_VECTOR, mode));
+    else
+      {
+       emit_move_insn (gen_highpart (Pmode, tmp), CONST0_RTX (Pmode));
+       emit_move_insn (gen_lowpart (Pmode, tmp),
+                       gen_int_mode (BYTES_PER_RISCV_VECTOR, Pmode));
+      }
   if (BYTES_PER_RISCV_VECTOR.is_constant ())
     {
@@ -2168,7 +2175,7 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx src)
          return false;
        }
-      if (satisfies_constraint_vp (src))
+      if (satisfies_constraint_vp (src) && GET_MODE (src) == Pmode)
        return false;
 
instead.  This avoid the (surprising) weird subreg being generated
at all and we don't need to ensure, probably redundantly, that the
const_poly value is in range etc.
 
> +           && (immediate_operand (operands[3], Pmode)
> +        || (CONST_POLY_INT_P (operands[3])uldn't we do a widening shift when combined with
scale_log != 0?
> +            && known_ge (rtx_to_poly_int64 (operands[3]), 0U)
> +    && known_le (rtx_to_poly_int64 (operands[3]), GET_MODE_SIZE (<MODE>mode)))))
> +    {
> +      rtx tmp = gen_reg_rtx (Pmode);
> +      poly_int64 value = rtx_to_poly_int64 (operands[3]);
> +      emit_move_insn (tmp, gen_int_mode (value, Pmode));
> +      operands[3] = gen_rtx_SIGN_EXTEND (<VEL>mode, tmp);
> +    }
 
Maybe add a single-line comment as for the other existing cases.
 
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c
> @@ -0,0 +1,82 @@
> +/* { dg-do run { target { riscv_vector } } } */
As discussed separately,  I would prefer separate run tests for
zvfh(min) but I guess we can live with them just not being vectorized
for now.
 
Regards
Robin
juzhe.zhong@rivai.ai July 7, 2023, 10:51 a.m. UTC | #3
Hi, Robin.

I have fixed all issues for you with V4 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623856.html 

1. Apply your approach on poly int
2. Fix comments.
3. Normalize "modulo" codes and make it no redundancy.
...

Could you take a look at it?

Thanks.


juzhe.zhong@rivai.ai
 
From: Robin Dapp
Date: 2023-07-07 17:43
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH V3] RISC-V: Support gather_load/scatter RVV auto-vectorization
Hi Juzhe,
 
thanks, that's quite a chunk :) and it took me a while to
go through it.
 
> @@ -564,7 +565,14 @@ const_vec_all_in_range_p (rtx vec, poly_int64 minval, poly_int64 maxval)
>  static rtx
>  gen_const_vector_dup (machine_mode mode, poly_int64 val)
>  {
> -  rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
> +  scalar_mode smode = GET_MODE_INNER (mode);
> +  rtx c = gen_int_mode (val, smode);
> +  if (!val.is_constant () && GET_MODE_SIZE (smode) > GET_MODE_SIZE (Pmode))
> +    {
> +      rtx dup = gen_reg_rtx (mode);
> +      emit_insn (gen_vec_duplicate (mode, dup, c));
> +      return dup;
> +    }
>    return gen_const_vec_duplicate (mode, c);
>  }
 
It's a bit weird that the function now also emits an insn.  It's not
similar to the aarch64 variant anymore then, I suppose.  If so, please
remove the comment.
 
> +
>  /* This function emits a masked instruction.  */
>  void
>  emit_vlmax_masked_mu_insn (unsigned icode, int op_num, rtx *ops)
> @@ -1162,7 +1203,6 @@ expand_const_vector (rtx target, rtx src)
>  }
>        else
>  {
> -   elt = force_reg (elt_mode, elt);
>    rtx ops[] = {tmp, elt};
>    emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
>  }
> @@ -2431,6 +2471,25 @@ expand_vec_cmp_float (rtx target, rtx_code code, rtx op0, rtx op1,
>    return false;
>  }
 
elt_mode is unused after your patch.  Please remove it or we will have
a bootstrap error.
 
> @@ -2444,42 +2503,47 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
>       index is in range of [0, nunits - 1]. A single vrgather instructions is
>       enough. Since we will use vrgatherei16.vv for variable-length vector,
>       it is never out of range and we don't need to modulo the index.  */
> -  if (!nunits.is_constant () || const_vec_all_in_range_p (sel, 0, nunits - 1))
> +  if (nunits.is_constant () && const_vec_all_in_range_p (sel, 0, nunits - 1))
>      {
>        emit_vlmax_gather_insn (target, op0, sel);
>        return;
>      }
>  
>    /* Check if the two values vectors are the same.  */
> -  if (rtx_equal_p (op0, op1) || const_vec_duplicate_p (sel))
> +  if (rtx_equal_p (op0, op1))
>      {
>        /* Note: vec_perm indices are supposed to wrap when they go beyond the
>  size of the two value vectors, i.e. the upper bits of the indices
>  are effectively ignored.  RVV vrgather instead produces 0 for any
>  out-of-range indices, so we need to modulo all the vec_perm indices
>  to ensure they are all in range of [0, nunits - 1].  */
> -      rtx max_sel = gen_const_vector_dup (sel_mode, nunits - 1);
> -      rtx sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0,
> - OPTAB_DIRECT);
> -      emit_vlmax_gather_insn (target, op1, sel_mod);
> +      rtx sel_mod = modulo_sel_indices (sel, nunits - 1);
> +      emit_vlmax_gather_insn (target, op0, sel_mod);
>        return;
>      }
 
When reading it I considered unifying both cases and have modulo_sel_indices
just do nothing when the constant already satisfies the range requirement.
Would that work?
 
> -      OPTAB_DIRECT);
> +      poly_uint64 value = rtx_to_poly_int64 (elt);
> +      rtx op = op0;
> +      if (maybe_gt (value, nunits - 1))
> + {
> +   sel = gen_const_vector_dup (sel_mode, value - nunits);
> +   op = op1;
> + }
> +      emit_vlmax_gather_insn (target, op, sel);
>      }
 
That's again a "modulo".   Could that also fit in modulo_sel_indices?
Your call in the end, it just feels like we do the same thing kind
of differently in several places but no strict preference here.
 
> +/* Return true if it is the strided load/store.  */
> +static bool
> +strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
> +{
> +  if (const_vec_series_p (vec_offset, base, step))
> +    return true;
 
The vectorizer will never emit this but we wouldn't want a step
of 1 here, right?
 
> +
> +  /* For strided load/store, vectorizer always generates
> +     VEC_SERIES_EXPR for vec_offset.  */
> +  tree expr = REG_EXPR (vec_offset);
> +  if (!expr || TREE_CODE (expr) != SSA_NAME)
> +    return false;
> +
> +  /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>;  */
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
> +  if (!def_stmt || !is_gimple_assign (def_stmt)
> +      || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
> +    return false;
 
Interesting to query the gimple here.  As long as the
vectorizer doesn't do strided stores separately, I guess we can
live with that.
 
> +  rtx ptr, vec_offset, vec_reg, len, mask;
> +  bool zero_extend_p;
> +  int scale_log2;
> +  if (is_load)
> +    {
> +      vec_reg = ops[0];
> +      ptr = ops[1];
> +      vec_offset = ops[2];
> +      zero_extend_p = INTVAL (ops[3]);
> +      scale_log2 = exact_log2 (INTVAL (ops[4]));
> +      len = ops[5];
> +      mask = ops[7];
> +    }
> +  else
> +    {
> +      vec_reg = ops[4];
> +      ptr = ops[0];
> +      vec_offset = ops[1];
> +      zero_extend_p = INTVAL (ops[2]);
> +      scale_log2 = exact_log2 (INTVAL (ops[3]));
> +      len = ops[5];
> +      mask = ops[7];
> +    }
 
> +  bool is_dummp_len = poly_int_rtx_p (len, &value) && known_eq (value, nunits);
 
What's dummp?  dumb?  It looks like it's used for switching between
vlmax/nonvlmax so a different name might be advisable.
 
> +       emit_insn (gen_extend_insn (extend_step, step, Pmode,
> +   GET_MODE (step),
> +   zero_extend_p ? true : false));
> +       step = extend_step;
> +     }
> + }
 
See below.
 
> +      rtx mem = validize_mem (gen_rtx_MEM (vec_mode, ptr));
> +      /* Emit vlse.v if it's load. Otherwise, emit vsse.v.  */
> +      if (is_load)
> + {
> +   insn_code icode = code_for_pred_strided_load (vec_mode);
> +   if (is_dummp_len)
> +     {
> +       rtx load_ops[]
> + = {vec_reg, mask, RVV_VUNDEF (vec_mode), mem, step};
> +       emit_vlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops);
> +     }
> +   else
> +     {
> +       rtx load_ops[]
> + = {vec_reg, mask, RVV_VUNDEF (vec_mode), mem, step};
> +       emit_nonvlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops, len);
> +     }
> + }
 
The ops are similar, better to define them outside of the if/else.
I would also rather have both in the same emit helper but that was
discussed before.  The following similar patterns all look a bit
"boilerplate-ish".  Tolerable for now I guess.
 
> +  if (inner_offsize < inner_vsize)
> +    {
> +      /* 7.2. Vector Load/Store Addressing Modes.
> + If the vector offset elements are narrower than XLEN, they are
> + zero-extended to XLEN before adding to the ptr effective address. If
> + the vector offset elements are wider than XLEN, the least-significant
> + XLEN bits are used in the address calculation. An implementation must
> + raise an illegal instruction exception if the EEW is not supported for
> + offset elements.  */
> +      if (!zero_extend_p || (zero_extend_p && scale_log2 != 0))
> + {
> +   if (zero_extend_p)
> +     inner_idx_mode
> +       = int_mode_for_size (inner_offsize * 2, 0).require ();
> +   else
> +     inner_idx_mode = int_mode_for_size (BITS_PER_WORD, 0).require ();
> +   machine_mode new_idx_mode
> +     = get_vector_mode (inner_idx_mode, nunits).require ();
> +   rtx tmp = gen_reg_rtx (new_idx_mode);
> +   emit_insn (gen_extend_insn (tmp, vec_offset, new_idx_mode, idx_mode,
> +       zero_extend_p ? true : false));
> +   vec_offset = tmp;
> +   idx_mode = new_idx_mode;
> + }
> +    }
 
This one I don't get.  Why do we still need to zero_extend when the
hardware does it for us?  Shouldn't we only sign extend when the
expander says so?  Actually we should even scan-assembler-not for
(v)zext?  Additionally maybe also scan-assembler (v)sext for the
respective cases.
Besides, couldn't we do a widening shift when combining it with
scale_log != 0?
 
> +  if (SUBREG_P (src) && CONST_POLY_INT_P (SUBREG_REG (src)))
> +    {
> +      /* Since VLENB result feteched by csrr is always within Pmode size,
> + we always set the upper bits of the REG with larger mode than Pmode
> + as 0.  The code sequence is transformed as follows:
> +
> +      - (set (reg:SI 136)
> +     (subreg:SI (const_poly_int:DI [16,16]) 0))
> +      -> (set (reg:SI 136) (const_poly_int:SI [16,16]))
> +
> +      - (set (reg:SI 136)
> +     (subreg:SI (const_poly_int:DI [16,16]) 4))
> +      -> (set (reg:SI 136) (const_int 0))  */
> +      rtx poly_rtx = SUBREG_REG (src);
> +      poly_int64 value = rtx_to_poly_int64 (poly_rtx);
> +      /* The only possible case is handling CONST_POLY_INT:DI in RV32 system. */
> +      gcc_assert (!TARGET_64BIT && mode == SImode
> +   && GET_MODE (poly_rtx) == DImode && known_ge (value, 0U));
> +      if (known_eq (SUBREG_BYTE (src), GET_MODE_SIZE (Pmode)))
> + emit_move_insn (dest, const0_rtx);
> +      else
> + emit_move_insn (dest, gen_int_mode (value, Pmode));
> +      return true;
> +    }
 
As discussed separately, I'd rather go with
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d40d430db1b..6735c7763b5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2061,7 +2061,14 @@ riscv_legitimize_poly_move (machine_mode mode, rtx dest, rtx tmp, rtx src)
      (m, n) = base * magn + constant.
      This calculation doesn't need div operation.  */
-  emit_move_insn (tmp, gen_int_mode (BYTES_PER_RISCV_VECTOR, mode));
+    if (mode <= Pmode)
+      emit_move_insn (tmp, gen_int_mode (BYTES_PER_RISCV_VECTOR, mode));
+    else
+      {
+       emit_move_insn (gen_highpart (Pmode, tmp), CONST0_RTX (Pmode));
+       emit_move_insn (gen_lowpart (Pmode, tmp),
+                       gen_int_mode (BYTES_PER_RISCV_VECTOR, Pmode));
+      }
   if (BYTES_PER_RISCV_VECTOR.is_constant ())
     {
@@ -2168,7 +2175,7 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx src)
          return false;
        }
-      if (satisfies_constraint_vp (src))
+      if (satisfies_constraint_vp (src) && GET_MODE (src) == Pmode)
        return false;
 
instead.  This avoid the (surprising) weird subreg being generated
at all and we don't need to ensure, probably redundantly, that the
const_poly value is in range etc.
 
> +           && (immediate_operand (operands[3], Pmode)
> +        || (CONST_POLY_INT_P (operands[3])uldn't we do a widening shift when combined with
scale_log != 0?
> +            && known_ge (rtx_to_poly_int64 (operands[3]), 0U)
> +    && known_le (rtx_to_poly_int64 (operands[3]), GET_MODE_SIZE (<MODE>mode)))))
> +    {
> +      rtx tmp = gen_reg_rtx (Pmode);
> +      poly_int64 value = rtx_to_poly_int64 (operands[3]);
> +      emit_move_insn (tmp, gen_int_mode (value, Pmode));
> +      operands[3] = gen_rtx_SIGN_EXTEND (<VEL>mode, tmp);
> +    }
 
Maybe add a single-line comment as for the other existing cases.
 
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c
> @@ -0,0 +1,82 @@
> +/* { dg-do run { target { riscv_vector } } } */
As discussed separately,  I would prefer separate run tests for
zvfh(min) but I guess we can live with them just not being vectorized
for now.
 
Regards
Robin
diff mbox series

Patch

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 9e61b2e41d8..78b9b5a2edb 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -57,6 +57,262 @@ 
   }
 )
 
+;; =========================================================================
+;; == Gather Load
+;; =========================================================================
+
+(define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
+  [(match_operand:VNX1_QHSD 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX1_QHSDI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX1_QHSD:gs_extension>")
+   (match_operand 4 "<VNX1_QHSD:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX1_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX2_QHSD:mode><VNX2_QHSDI:mode>"
+  [(match_operand:VNX2_QHSD 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX2_QHSDI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX2_QHSD:gs_extension>")
+   (match_operand 4 "<VNX2_QHSD:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX2_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX4_QHSD:mode><VNX4_QHSDI:mode>"
+  [(match_operand:VNX4_QHSD 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX4_QHSDI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX4_QHSD:gs_extension>")
+   (match_operand 4 "<VNX4_QHSD:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX4_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX8_QHSD:mode><VNX8_QHSDI:mode>"
+  [(match_operand:VNX8_QHSD 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX8_QHSDI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX8_QHSD:gs_extension>")
+   (match_operand 4 "<VNX8_QHSD:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX8_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX16_QHSD:mode><VNX16_QHSDI:mode>"
+  [(match_operand:VNX16_QHSD 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX16_QHSDI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX16_QHSD:gs_extension>")
+   (match_operand 4 "<VNX16_QHSD:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX16_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX32_QHS:mode><VNX32_QHSI:mode>"
+  [(match_operand:VNX32_QHS 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX32_QHSI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX32_QHS:gs_extension>")
+   (match_operand 4 "<VNX32_QHS:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX32_QHS:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX64_QH:mode><VNX64_QHI:mode>"
+  [(match_operand:VNX64_QH 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX64_QHI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX64_QH:gs_extension>")
+   (match_operand 4 "<VNX64_QH:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX64_QH:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+;; When SEW = 8 and LMUL = 8, we can't find any index mode with
+;; larger SEW. Since RVV indexed load/store support zero extend
+;; implicitly and not support scaling, we should only allow
+;; operands[3] and operands[4] to be const_1_operand.
+(define_expand "len_mask_gather_load<mode><mode>"
+  [(match_operand:VNX128_Q 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX128_Q 2 "vector_gs_offset_operand")
+   (match_operand 3 "const_1_operand")
+   (match_operand 4 "const_1_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+;; =========================================================================
+;; == Scatter Store
+;; =========================================================================
+
+(define_expand "len_mask_scatter_store<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX1_QHSDI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX1_QHSD:gs_extension>")
+   (match_operand 3 "<VNX1_QHSD:gs_scale>")
+   (match_operand:VNX1_QHSD 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX1_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX2_QHSD:mode><VNX2_QHSDI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX2_QHSDI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX2_QHSD:gs_extension>")
+   (match_operand 3 "<VNX2_QHSD:gs_scale>")
+   (match_operand:VNX2_QHSD 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX2_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX4_QHSD:mode><VNX4_QHSDI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX4_QHSDI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX4_QHSD:gs_extension>")
+   (match_operand 3 "<VNX4_QHSD:gs_scale>")
+   (match_operand:VNX4_QHSD 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX4_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX8_QHSD:mode><VNX8_QHSDI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX8_QHSDI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX8_QHSD:gs_extension>")
+   (match_operand 3 "<VNX8_QHSD:gs_scale>")
+   (match_operand:VNX8_QHSD 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX8_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX16_QHSD:mode><VNX16_QHSDI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX16_QHSDI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX16_QHSD:gs_extension>")
+   (match_operand 3 "<VNX16_QHSD:gs_scale>")
+   (match_operand:VNX16_QHSD 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX16_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX32_QHS:mode><VNX32_QHSI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX32_QHSI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX32_QHS:gs_extension>")
+   (match_operand 3 "<VNX32_QHS:gs_scale>")
+   (match_operand:VNX32_QHS 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX32_QHS:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX64_QH:mode><VNX64_QHI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX64_QHI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX64_QH:gs_extension>")
+   (match_operand 3 "<VNX64_QH:gs_scale>")
+   (match_operand:VNX64_QH 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX64_QH:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+;; When SEW = 8 and LMUL = 8, we can't find any index mode with
+;; larger SEW. Since RVV indexed load/store support zero extend
+;; implicitly and not support scaling, we should only allow
+;; operands[3] and operands[4] to be const_1_operand.
+(define_expand "len_mask_scatter_store<mode><mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX128_Q 1 "vector_gs_offset_operand")
+   (match_operand 2 "const_1_operand")
+   (match_operand 3 "const_1_operand")
+   (match_operand:VNX128_Q 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
 ;; =========================================================================
 ;; == Vector creation
 ;; =========================================================================
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index eb975eaf994..5a65334e943 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -61,6 +61,10 @@ 
   (and (match_code "const_int,const_wide_int,const_vector")
        (match_test "op == CONST0_RTX (GET_MODE (op))")))
 
+(define_predicate "const_1_operand"
+  (and (match_code "const_int,const_wide_int,const_vector")
+       (match_test "op == CONST1_RTX (GET_MODE (op))")))
+
 (define_predicate "reg_or_0_operand"
   (ior (match_operand 0 "const_0_operand")
        (match_operand 0 "register_operand")))
@@ -341,6 +345,39 @@ 
   (ior (match_operand 0 "register_operand")
        (match_code "const_vector")))
 
+(define_predicate "vector_gs_offset_operand"
+  (ior (match_operand 0 "register_operand")
+       (and (match_code "const_vector")
+            (match_test "CONST_VECTOR_NPATTERNS (op) == 1
+	                 && !CONST_VECTOR_DUPLICATE_P (op)"))))
+
+(define_predicate "vector_gs_scale_operand_16"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 1 || INTVAL (op) == 2")))
+
+(define_predicate "vector_gs_scale_operand_32"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 1 || INTVAL (op) == 4")))
+
+(define_predicate "vector_gs_scale_operand_64"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 1 || (INTVAL (op) == 8 && Pmode == DImode)")))
+
+(define_predicate "vector_gs_extension_operand"
+  (ior (match_operand 0 "const_1_operand")
+       (and (match_operand 0 "const_0_operand")
+            (match_test "Pmode == SImode"))))
+
+(define_predicate "vector_gs_scale_operand_16_rv32"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 1
+		    || (INTVAL (op) == 2 && Pmode == SImode)")))
+
+(define_predicate "vector_gs_scale_operand_32_rv32"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 1
+		    || (INTVAL (op) == 4 && Pmode == SImode)")))
+
 (define_predicate "ltge_operator"
   (match_code "lt,ltu,ge,geu"))
 
@@ -376,7 +413,7 @@ 
 		|| rtx_equal_p (op, CONST0_RTX (GET_MODE (op))))
 		&& maybe_gt (GET_MODE_BITSIZE (GET_MODE (op)), GET_MODE_BITSIZE (Pmode)))")
     (ior (match_test "rtx_equal_p (op, CONST0_RTX (GET_MODE (op)))")
-         (ior (match_operand 0 "const_int_operand")
+         (ior (match_code "const_int,const_poly_int")
               (ior (match_operand 0 "register_operand")
                    (match_test "satisfies_constraint_Wdm (op)"))))))
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5766e3597e8..fd6caccc183 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -148,6 +148,8 @@  enum insn_type
   RVV_WIDEN_TERNOP = 4,
   RVV_SCALAR_MOV_OP = 4, /* +1 for VUNDEF according to vector.md.  */
   RVV_SLIDE_OP = 4,      /* Dest, VUNDEF, source and offset.  */
+  RVV_GATHER_M_OP = 5,
+  RVV_SCATTER_M_OP = 4,
 };
 enum vlmul_type
 {
@@ -255,6 +257,7 @@  void expand_vec_init (rtx, rtx);
 void expand_vec_perm (rtx, rtx, rtx, rtx);
 void expand_select_vl (rtx *);
 void expand_load_store (rtx *, bool);
+void expand_gather_scatter (rtx *, bool);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 8d5bed7ebe4..ed57781d103 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -49,6 +49,7 @@ 
 #include "tm-constrs.h"
 #include "rtx-vector-builder.h"
 #include "targhooks.h"
+#include "gimple.h"
 
 using namespace riscv_vector;
 
@@ -564,7 +565,14 @@  const_vec_all_in_range_p (rtx vec, poly_int64 minval, poly_int64 maxval)
 static rtx
 gen_const_vector_dup (machine_mode mode, poly_int64 val)
 {
-  rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
+  scalar_mode smode = GET_MODE_INNER (mode);
+  rtx c = gen_int_mode (val, smode);
+  if (!val.is_constant () && GET_MODE_SIZE (smode) > GET_MODE_SIZE (Pmode))
+    {
+      rtx dup = gen_reg_rtx (mode);
+      emit_insn (gen_vec_duplicate (mode, dup, c));
+      return dup;
+    }
   return gen_const_vec_duplicate (mode, c);
 }
 
@@ -901,6 +909,39 @@  emit_nonvlmax_masked_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
   e.emit_insn ((enum insn_code) icode, ops);
 }
 
+/* This function emits a VLMAX masked store instruction.  */
+static void
+emit_vlmax_masked_store_insn (unsigned icode, int op_num, rtx *ops)
+{
+  machine_mode dest_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
+  insn_expander<RVV_INSN_OPERANDS_MAX> e (/*OP_NUM*/ op_num,
+					  /*HAS_DEST_P*/ false,
+					  /*FULLY_UNMASKED_P*/ false,
+					  /*USE_REAL_MERGE_P*/ true,
+					  /*HAS_AVL_P*/ true,
+					  /*VLMAX_P*/ true, dest_mode,
+					  mask_mode);
+  e.emit_insn ((enum insn_code) icode, ops);
+}
+
+/* This function emits a non-VLMAX masked store instruction.  */
+static void
+emit_nonvlmax_masked_store_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
+{
+  machine_mode dest_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
+  insn_expander<RVV_INSN_OPERANDS_MAX> e (/*OP_NUM*/ op_num,
+					  /*HAS_DEST_P*/ false,
+					  /*FULLY_UNMASKED_P*/ false,
+					  /*USE_REAL_MERGE_P*/ true,
+					  /*HAS_AVL_P*/ true,
+					  /*VLMAX_P*/ false, dest_mode,
+					  mask_mode);
+  e.set_vl (avl);
+  e.emit_insn ((enum insn_code) icode, ops);
+}
+
 /* This function emits a masked instruction.  */
 void
 emit_vlmax_masked_mu_insn (unsigned icode, int op_num, rtx *ops)
@@ -1162,7 +1203,6 @@  expand_const_vector (rtx target, rtx src)
 	}
       else
 	{
-	  elt = force_reg (elt_mode, elt);
 	  rtx ops[] = {tmp, elt};
 	  emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
 	}
@@ -2431,6 +2471,25 @@  expand_vec_cmp_float (rtx target, rtx_code code, rtx op0, rtx op1,
   return false;
 }
 
+/* Modulo all SEL indices to ensure they are all in range if [0, MAX_SEL].  */
+static rtx
+modulo_sel_indices (rtx sel, poly_uint64 max_sel)
+{
+  rtx sel_mod;
+  machine_mode sel_mode = GET_MODE (sel);
+  poly_uint64 nunits = GET_MODE_NUNITS (sel_mode);
+  /* If SEL is variable-length CONST_VECTOR, we don't need to modulo it.  */
+  if (!nunits.is_constant () && CONST_VECTOR_P (sel))
+    sel_mod = sel;
+  else
+    {
+      rtx mod = gen_const_vector_dup (sel_mode, max_sel);
+      sel_mod
+	= expand_simple_binop (sel_mode, AND, sel, mod, NULL, 0, OPTAB_DIRECT);
+    }
+  return sel_mod;
+}
+
 /* Implement vec_perm<mode>.  */
 
 void
@@ -2444,42 +2503,47 @@  expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
      index is in range of [0, nunits - 1]. A single vrgather instructions is
      enough. Since we will use vrgatherei16.vv for variable-length vector,
      it is never out of range and we don't need to modulo the index.  */
-  if (!nunits.is_constant () || const_vec_all_in_range_p (sel, 0, nunits - 1))
+  if (nunits.is_constant () && const_vec_all_in_range_p (sel, 0, nunits - 1))
     {
       emit_vlmax_gather_insn (target, op0, sel);
       return;
     }
 
   /* Check if the two values vectors are the same.  */
-  if (rtx_equal_p (op0, op1) || const_vec_duplicate_p (sel))
+  if (rtx_equal_p (op0, op1))
     {
       /* Note: vec_perm indices are supposed to wrap when they go beyond the
 	 size of the two value vectors, i.e. the upper bits of the indices
 	 are effectively ignored.  RVV vrgather instead produces 0 for any
 	 out-of-range indices, so we need to modulo all the vec_perm indices
 	 to ensure they are all in range of [0, nunits - 1].  */
-      rtx max_sel = gen_const_vector_dup (sel_mode, nunits - 1);
-      rtx sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0,
-					 OPTAB_DIRECT);
-      emit_vlmax_gather_insn (target, op1, sel_mod);
+      rtx sel_mod = modulo_sel_indices (sel, nunits - 1);
+      emit_vlmax_gather_insn (target, op0, sel_mod);
       return;
     }
 
-  rtx sel_mod = sel;
-  rtx max_sel = gen_const_vector_dup (sel_mode, 2 * nunits - 1);
-  /* We don't need to modulo indices for VLA vector.
-     Since we should gurantee they aren't out of range before.  */
-  if (nunits.is_constant ())
+  /* Check if all the indices are same.  */
+  rtx elt;
+  if (const_vec_duplicate_p (sel, &elt))
     {
-      /* Note: vec_perm indices are supposed to wrap when they go beyond the
-	 size of the two value vectors, i.e. the upper bits of the indices
-	 are effectively ignored.  RVV vrgather instead produces 0 for any
-	 out-of-range indices, so we need to modulo all the vec_perm indices
-	 to ensure they are all in range of [0, 2 * nunits - 1].  */
-      sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0,
-				     OPTAB_DIRECT);
+      poly_uint64 value = rtx_to_poly_int64 (elt);
+      rtx op = op0;
+      if (maybe_gt (value, nunits - 1))
+	{
+	  sel = gen_const_vector_dup (sel_mode, value - nunits);
+	  op = op1;
+	}
+      emit_vlmax_gather_insn (target, op, sel);
     }
 
+  /* Note: vec_perm indices are supposed to wrap when they go beyond the
+     size of the two value vectors, i.e. the upper bits of the indices
+     are effectively ignored.  RVV vrgather instead produces 0 for any
+     out-of-range indices, so we need to modulo all the vec_perm indices
+     to ensure they are all in range of [0, 2 * nunits - 1].  */
+  rtx sel_mod = modulo_sel_indices (sel, 2 * nunits - 1);
+  rtx max_sel = gen_const_vector_dup (sel_mode, 2 * nunits - 1);
+
   /* This following sequence is handling the case that:
      __builtin_shufflevector (vec1, vec2, index...), the index can be any
      value in range of [0, 2 * nunits - 1].  */
@@ -2812,4 +2876,257 @@  expand_load_store (rtx *ops, bool is_load)
     }
 }
 
+/* Prepare insn_code for gather_load/scatter_store according to
+   the vector mode and index mode.  */
+static insn_code
+prepare_gather_scatter (machine_mode vec_mode, machine_mode idx_mode,
+			bool is_load)
+{
+  if (!is_load)
+    return code_for_pred_indexed_store (UNSPEC_UNORDERED, vec_mode, idx_mode);
+  else
+    {
+      unsigned src_eew_bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (idx_mode));
+      unsigned dst_eew_bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode));
+      if (dst_eew_bitsize == src_eew_bitsize)
+	return code_for_pred_indexed_load_same_eew (UNSPEC_UNORDERED, vec_mode);
+      else if (dst_eew_bitsize > src_eew_bitsize)
+	{
+	  unsigned factor = dst_eew_bitsize / src_eew_bitsize;
+	  switch (factor)
+	    {
+	    case 2:
+	      return code_for_pred_indexed_load_x2_greater_eew (
+		UNSPEC_UNORDERED, vec_mode);
+	    case 4:
+	      return code_for_pred_indexed_load_x4_greater_eew (
+		UNSPEC_UNORDERED, vec_mode);
+	    case 8:
+	      return code_for_pred_indexed_load_x8_greater_eew (
+		UNSPEC_UNORDERED, vec_mode);
+	    default:
+	      gcc_unreachable ();
+	    }
+	}
+      else
+	{
+	  unsigned factor = src_eew_bitsize / dst_eew_bitsize;
+	  switch (factor)
+	    {
+	    case 2:
+	      return code_for_pred_indexed_load_x2_smaller_eew (
+		UNSPEC_UNORDERED, vec_mode);
+	    case 4:
+	      return code_for_pred_indexed_load_x4_smaller_eew (
+		UNSPEC_UNORDERED, vec_mode);
+	    case 8:
+	      return code_for_pred_indexed_load_x8_smaller_eew (
+		UNSPEC_UNORDERED, vec_mode);
+	    default:
+	      gcc_unreachable ();
+	    }
+	}
+    }
+}
+
+/* Return true if it is the strided load/store.  */
+static bool
+strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
+{
+  if (const_vec_series_p (vec_offset, base, step))
+    return true;
+
+  /* For strided load/store, vectorizer always generates
+     VEC_SERIES_EXPR for vec_offset.  */
+  tree expr = REG_EXPR (vec_offset);
+  if (!expr || TREE_CODE (expr) != SSA_NAME)
+    return false;
+
+  /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>;  */
+  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
+  if (!def_stmt || !is_gimple_assign (def_stmt)
+      || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
+    return false;
+
+  tree baset = gimple_assign_rhs1 (def_stmt);
+  tree stept = gimple_assign_rhs2 (def_stmt);
+  *base = expand_normal (baset);
+  *step = expand_normal (stept);
+
+  if (!rtx_equal_p (*base, const0_rtx))
+    return false;
+  return true;
+}
+
+/* Expand LEN_MASK_{GATHER_LOAD,SCATTER_STORE}.  */
+void
+expand_gather_scatter (rtx *ops, bool is_load)
+{
+  rtx ptr, vec_offset, vec_reg, len, mask;
+  bool zero_extend_p;
+  int scale_log2;
+  if (is_load)
+    {
+      vec_reg = ops[0];
+      ptr = ops[1];
+      vec_offset = ops[2];
+      zero_extend_p = INTVAL (ops[3]);
+      scale_log2 = exact_log2 (INTVAL (ops[4]));
+      len = ops[5];
+      mask = ops[7];
+    }
+  else
+    {
+      vec_reg = ops[4];
+      ptr = ops[0];
+      vec_offset = ops[1];
+      zero_extend_p = INTVAL (ops[2]);
+      scale_log2 = exact_log2 (INTVAL (ops[3]));
+      len = ops[5];
+      mask = ops[7];
+    }
+
+  machine_mode vec_mode = GET_MODE (vec_reg);
+  machine_mode idx_mode = GET_MODE (vec_offset);
+  scalar_mode inner_vec_mode = GET_MODE_INNER (vec_mode);
+  scalar_mode inner_idx_mode = GET_MODE_INNER (idx_mode);
+  unsigned inner_vsize = GET_MODE_BITSIZE (inner_vec_mode);
+  unsigned inner_offsize = GET_MODE_BITSIZE (inner_idx_mode);
+  poly_int64 nunits = GET_MODE_NUNITS (vec_mode);
+  poly_int64 value;
+  bool is_dummp_len = poly_int_rtx_p (len, &value) && known_eq (value, nunits);
+
+  /* We use vlse.v/vsse.v instead of indexed load/store by default
+     if it is strided load/store.
+
+     FIXME: vlse.v/vsse.v may not always be better than vluxei.v/vsuxei.v.
+     We may need COST MODE to adjust it.  */
+  rtx base, step;
+  if (strided_load_store_p (vec_offset, &base, &step))
+    {
+      if (GET_MODE (step) != Pmode)
+	{
+	  if (CONSTANT_P (step))
+	    step = force_reg (Pmode, step);
+	  else
+	    {
+	      rtx extend_step = gen_reg_rtx (Pmode);
+	      emit_insn (gen_extend_insn (extend_step, step, Pmode,
+					  GET_MODE (step),
+					  zero_extend_p ? true : false));
+	      step = extend_step;
+	    }
+	}
+      if (scale_log2 != 0)
+	{
+	  rtx scale_step = gen_reg_rtx (Pmode);
+	  rtx tmp = expand_simple_binop (Pmode, ASHIFT, step,
+					 gen_int_mode (scale_log2, Pmode),
+					 NULL_RTX, false, OPTAB_DIRECT);
+	  emit_move_insn (scale_step, tmp);
+	  step = scale_step;
+	}
+
+      rtx mem = validize_mem (gen_rtx_MEM (vec_mode, ptr));
+      /* Emit vlse.v if it's load. Otherwise, emit vsse.v.  */
+      if (is_load)
+	{
+	  insn_code icode = code_for_pred_strided_load (vec_mode);
+	  if (is_dummp_len)
+	    {
+	      rtx load_ops[]
+		= {vec_reg, mask, RVV_VUNDEF (vec_mode), mem, step};
+	      emit_vlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops);
+	    }
+	  else
+	    {
+	      rtx load_ops[]
+		= {vec_reg, mask, RVV_VUNDEF (vec_mode), mem, step};
+	      emit_nonvlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops, len);
+	    }
+	}
+      else
+	{
+	  if (is_dummp_len)
+	    {
+	      rtx dumpp_len = gen_reg_rtx (Pmode);
+	      emit_vlmax_vsetvl (vec_mode, dumpp_len);
+	      emit_insn (gen_pred_strided_store (vec_mode, mem, mask, step,
+						 vec_reg, dumpp_len,
+						 get_avl_type_rtx (VLMAX)));
+	    }
+	  else
+	    emit_insn (gen_pred_strided_store (vec_mode, mem, mask, step,
+					       vec_reg, len,
+					       get_avl_type_rtx (NONVLMAX)));
+	}
+      return;
+    }
+
+  if (inner_offsize < inner_vsize)
+    {
+      /* 7.2. Vector Load/Store Addressing Modes.
+	 If the vector offset elements are narrower than XLEN, they are
+	 zero-extended to XLEN before adding to the ptr effective address. If
+	 the vector offset elements are wider than XLEN, the least-significant
+	 XLEN bits are used in the address calculation. An implementation must
+	 raise an illegal instruction exception if the EEW is not supported for
+	 offset elements.  */
+      if (!zero_extend_p || (zero_extend_p && scale_log2 != 0))
+	{
+	  if (zero_extend_p)
+	    inner_idx_mode
+	      = int_mode_for_size (inner_offsize * 2, 0).require ();
+	  else
+	    inner_idx_mode = int_mode_for_size (BITS_PER_WORD, 0).require ();
+	  machine_mode new_idx_mode
+	    = get_vector_mode (inner_idx_mode, nunits).require ();
+	  rtx tmp = gen_reg_rtx (new_idx_mode);
+	  emit_insn (gen_extend_insn (tmp, vec_offset, new_idx_mode, idx_mode,
+				      zero_extend_p ? true : false));
+	  vec_offset = tmp;
+	  idx_mode = new_idx_mode;
+	}
+    }
+
+  if (scale_log2 != 0)
+    {
+      rtx tmp = expand_binop (idx_mode, ashl_optab, vec_offset,
+			      gen_int_mode (scale_log2, Pmode), NULL_RTX, 0,
+			      OPTAB_DIRECT);
+      vec_offset = tmp;
+    }
+
+  insn_code icode = prepare_gather_scatter (vec_mode, idx_mode, is_load);
+  if (is_dummp_len)
+    {
+      if (is_load)
+	{
+	  rtx load_ops[]
+	    = {vec_reg, mask, RVV_VUNDEF (vec_mode), ptr, vec_offset};
+	  emit_vlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops);
+	}
+      else
+	{
+	  rtx store_ops[] = {mask, ptr, vec_offset, vec_reg};
+	  emit_vlmax_masked_store_insn (icode, RVV_SCATTER_M_OP, store_ops);
+	}
+    }
+  else
+    {
+      if (is_load)
+	{
+	  rtx load_ops[]
+	    = {vec_reg, mask, RVV_VUNDEF (vec_mode), ptr, vec_offset};
+	  emit_nonvlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops, len);
+	}
+      else
+	{
+	  rtx store_ops[] = {mask, ptr, vec_offset, vec_reg};
+	  emit_nonvlmax_masked_store_insn (icode, RVV_SCATTER_M_OP, store_ops,
+					   len);
+	}
+    }
+}
+
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e4dc8115e69..96c3a485d41 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2245,6 +2245,31 @@  riscv_legitimize_move (machine_mode mode, rtx dest, rtx src)
       return true;
     }
 
+  if (SUBREG_P (src) && CONST_POLY_INT_P (SUBREG_REG (src)))
+    {
+      /* Since VLENB result feteched by csrr is always within Pmode size,
+	 we always set the upper bits of the REG with larger mode than Pmode
+	 as 0.  The code sequence is transformed as follows:
+
+	     - (set (reg:SI 136)
+		    (subreg:SI (const_poly_int:DI [16,16]) 0))
+	     -> (set (reg:SI 136) (const_poly_int:SI [16,16]))
+
+	     - (set (reg:SI 136)
+		    (subreg:SI (const_poly_int:DI [16,16]) 4))
+	     -> (set (reg:SI 136) (const_int 0))  */
+      rtx poly_rtx = SUBREG_REG (src);
+      poly_int64 value = rtx_to_poly_int64 (poly_rtx);
+      /* The only possible case is handling CONST_POLY_INT:DI in RV32 system. */
+      gcc_assert (!TARGET_64BIT && mode == SImode
+		  && GET_MODE (poly_rtx) == DImode && known_ge (value, 0U));
+      if (known_eq (SUBREG_BYTE (src), GET_MODE_SIZE (Pmode)))
+	emit_move_insn (dest, const0_rtx);
+      else
+	emit_move_insn (dest, gen_int_mode (value, Pmode));
+      return true;
+    }
+
   /* We need to deal with constants that would be legitimate
      immediate_operands but aren't legitimate move_operands.  */
   if (CONSTANT_P (src) && !move_operand (src, mode))
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 8afd3dcaddd..ec49544bcf6 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -115,6 +115,9 @@ 
 
 (define_mode_iterator VEEWEXT2 [
   (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI "TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_VECTOR_ELEN_FP_16") (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16") (VNx16HF "TARGET_VECTOR_ELEN_FP_16") (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
   (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI "TARGET_VECTOR_ELEN_64")
   (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
@@ -161,6 +164,8 @@ 
 (define_mode_iterator VEEWTRUNC2 [
   (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI (VNx64QI "TARGET_MIN_VLEN >= 128")
   (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI "TARGET_MIN_VLEN >= 128")
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_VECTOR_ELEN_FP_16") (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16") (VNx16HF "TARGET_VECTOR_ELEN_FP_16") (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
   (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI "TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
@@ -172,6 +177,8 @@ 
 (define_mode_iterator VEEWTRUNC4 [
   (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI (VNx32QI "TARGET_MIN_VLEN >= 128")
   (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI (VNx16HI "TARGET_MIN_VLEN >= 128")
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_VECTOR_ELEN_FP_16") (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16") (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
 ])
 
 (define_mode_iterator VEEWTRUNC8 [
@@ -362,46 +369,67 @@ 
 ])
 
 (define_mode_iterator VNX1_QHSD [
-  (VNx1QI "TARGET_MIN_VLEN < 128") (VNx1HI "TARGET_MIN_VLEN < 128") (VNx1SI "TARGET_MIN_VLEN < 128")
+  (VNx1QI "TARGET_MIN_VLEN < 128")
+  (VNx1HI "TARGET_MIN_VLEN < 128")
+  (VNx1SI "TARGET_MIN_VLEN < 128")
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128")
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
 ])
 
 (define_mode_iterator VNX2_QHSD [
-  VNx2QI VNx2HI VNx2SI
+  VNx2QI
+  VNx2HI
+  VNx2SI
   (VNx2DI "TARGET_VECTOR_ELEN_64")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx2DF "TARGET_VECTOR_ELEN_FP_64")
 ])
 
 (define_mode_iterator VNX4_QHSD [
-  VNx4QI VNx4HI VNx4SI
+  VNx4QI
+  VNx4HI
+  VNx4SI
   (VNx4DI "TARGET_VECTOR_ELEN_64")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx4DF "TARGET_VECTOR_ELEN_FP_64")
 ])
 
 (define_mode_iterator VNX8_QHSD [
-  VNx8QI VNx8HI VNx8SI
+  VNx8QI
+  VNx8HI
+  VNx8SI
   (VNx8DI "TARGET_VECTOR_ELEN_64")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx8SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx8DF "TARGET_VECTOR_ELEN_FP_64")
 ])
 
-(define_mode_iterator VNX16_QHS [
-  VNx16QI VNx16HI (VNx16SI "TARGET_MIN_VLEN > 32")
+(define_mode_iterator VNX16_QHSD [
+  VNx16QI
+  VNx16HI
+  (VNx16SI "TARGET_MIN_VLEN > 32")
+  (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
+  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx16SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
-  (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128") (VNx16DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN >= 128")
+  (VNx16DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN >= 128")
 ])
 
 (define_mode_iterator VNX32_QHS [
-  VNx32QI (VNx32HI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128") (VNx32SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
+  VNx32QI
+  (VNx32HI "TARGET_MIN_VLEN > 32")
+  (VNx32SI "TARGET_MIN_VLEN >= 128")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx32SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
 ])
 
 (define_mode_iterator VNX64_QH [
   (VNx64QI "TARGET_MIN_VLEN > 32")
   (VNx64HI "TARGET_MIN_VLEN >= 128")
+  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
 ])
 
 (define_mode_iterator VNX128_Q [
@@ -409,35 +437,49 @@ 
 ])
 
 (define_mode_iterator VNX1_QHSDI [
-  (VNx1QI "TARGET_MIN_VLEN < 128") (VNx1HI "TARGET_MIN_VLEN < 128") (VNx1SI "TARGET_MIN_VLEN < 128")
-  (VNx1DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128")
+  (VNx1QI "TARGET_MIN_VLEN < 128")
+  (VNx1HI "TARGET_MIN_VLEN < 128")
+  (VNx1SI "TARGET_MIN_VLEN < 128")
+  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128 && TARGET_64BIT")
 ])
 
 (define_mode_iterator VNX2_QHSDI [
-  VNx2QI VNx2HI VNx2SI
-  (VNx2DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
+  VNx2QI
+  VNx2HI
+  VNx2SI
+  (VNx2DI "TARGET_VECTOR_ELEN_64 && TARGET_64BIT")
 ])
 
 (define_mode_iterator VNX4_QHSDI [
-  VNx4QI VNx4HI VNx4SI
-  (VNx4DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
+  VNx4QI
+  VNx4HI
+  VNx4SI
+  (VNx4DI "TARGET_VECTOR_ELEN_64 && TARGET_64BIT")
 ])
 
 (define_mode_iterator VNX8_QHSDI [
-  VNx8QI VNx8HI VNx8SI
-  (VNx8DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
+  VNx8QI
+  VNx8HI
+  VNx8SI
+  (VNx8DI "TARGET_VECTOR_ELEN_64 && TARGET_64BIT")
 ])
 
 (define_mode_iterator VNX16_QHSDI [
-  VNx16QI VNx16HI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx16DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
+  VNx16QI
+  VNx16HI
+  (VNx16SI "TARGET_MIN_VLEN > 32")
+  (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128 && TARGET_64BIT")
 ])
 
 (define_mode_iterator VNX32_QHSI [
-  VNx32QI (VNx32HI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
+  VNx32QI
+  (VNx32HI "TARGET_MIN_VLEN > 32")
+  (VNx32SI "TARGET_MIN_VLEN >= 128")
 ])
 
 (define_mode_iterator VNX64_QHI [
-  VNx64QI (VNx64HI "TARGET_MIN_VLEN >= 128")
+  (VNx64QI "TARGET_MIN_VLEN > 32")
+  (VNx64HI "TARGET_MIN_VLEN >= 128")
 ])
 
 (define_mode_iterator V_WHOLE [
@@ -1393,6 +1435,8 @@ 
 (define_mode_attr VINDEX_DOUBLE_TRUNC [
   (VNx1HI "VNx1QI") (VNx2HI "VNx2QI")  (VNx4HI "VNx4QI")  (VNx8HI "VNx8QI")
   (VNx16HI "VNx16QI") (VNx32HI "VNx32QI") (VNx64HI "VNx64QI")
+  (VNx1HF "VNx1QI") (VNx2HF "VNx2QI")  (VNx4HF "VNx4QI")  (VNx8HF "VNx8QI")
+  (VNx16HF "VNx16QI") (VNx32HF "VNx32QI") (VNx64HF "VNx64QI")
   (VNx1SI "VNx1HI") (VNx2SI "VNx2HI") (VNx4SI "VNx4HI") (VNx8SI "VNx8HI")
   (VNx16SI "VNx16HI") (VNx32SI "VNx32HI")
   (VNx1SF "VNx1HI") (VNx2SF "VNx2HI") (VNx4SF "VNx4HI") (VNx8SF "VNx8HI")
@@ -1420,6 +1464,7 @@ 
 (define_mode_attr VINDEX_DOUBLE_EXT [
   (VNx1QI "VNx1HI") (VNx2QI "VNx2HI") (VNx4QI "VNx4HI") (VNx8QI "VNx8HI") (VNx16QI "VNx16HI") (VNx32QI "VNx32HI") (VNx64QI "VNx64HI")
   (VNx1HI "VNx1SI") (VNx2HI "VNx2SI") (VNx4HI "VNx4SI") (VNx8HI "VNx8SI") (VNx16HI "VNx16SI") (VNx32HI "VNx32SI")
+  (VNx1HF "VNx1SI") (VNx2HF "VNx2SI") (VNx4HF "VNx4SI") (VNx8HF "VNx8SI") (VNx16HF "VNx16SI") (VNx32HF "VNx32SI")
   (VNx1SI "VNx1DI") (VNx2SI "VNx2DI") (VNx4SI "VNx4DI") (VNx8SI "VNx8DI") (VNx16SI "VNx16DI")
   (VNx1SF "VNx1DI") (VNx2SF "VNx2DI") (VNx4SF "VNx4DI") (VNx8SF "VNx8DI") (VNx16SF "VNx16DI")
 ])
@@ -1427,6 +1472,7 @@ 
 (define_mode_attr VINDEX_QUAD_EXT [
   (VNx1QI "VNx1SI") (VNx2QI "VNx2SI") (VNx4QI "VNx4SI") (VNx8QI "VNx8SI") (VNx16QI "VNx16SI") (VNx32QI "VNx32SI")
   (VNx1HI "VNx1DI") (VNx2HI "VNx2DI") (VNx4HI "VNx4DI") (VNx8HI "VNx8DI") (VNx16HI "VNx16DI")
+  (VNx1HF "VNx1DI") (VNx2HF "VNx2DI") (VNx4HF "VNx4DI") (VNx8HF "VNx8DI") (VNx16HF "VNx16DI")
 ])
 
 (define_mode_attr VINDEX_OCT_EXT [
@@ -1471,6 +1517,40 @@ 
   (VNx4DI "VNx8BI") (VNx8DI "VNx16BI") (VNx16DI "VNx32BI")
 ])
 
+(define_mode_attr gs_extension [
+  (VNx1QI "immediate_operand") (VNx2QI "immediate_operand") (VNx4QI "immediate_operand") (VNx8QI "immediate_operand") (VNx16QI "immediate_operand")
+  (VNx32QI "vector_gs_extension_operand") (VNx64QI "const_1_operand")
+  (VNx1HI "immediate_operand") (VNx2HI "immediate_operand") (VNx4HI "immediate_operand") (VNx8HI "immediate_operand") (VNx16HI "immediate_operand")
+  (VNx32HI "vector_gs_extension_operand") (VNx64HI "const_1_operand")
+  (VNx1SI "immediate_operand") (VNx2SI "immediate_operand") (VNx4SI "immediate_operand") (VNx8SI "immediate_operand") (VNx16SI "immediate_operand")
+  (VNx32SI "vector_gs_extension_operand")
+  (VNx1DI "immediate_operand") (VNx2DI "immediate_operand") (VNx4DI "immediate_operand") (VNx8DI "immediate_operand") (VNx16DI "immediate_operand")
+
+  (VNx1HF "immediate_operand") (VNx2HF "immediate_operand") (VNx4HF "immediate_operand") (VNx8HF "immediate_operand") (VNx16HF "immediate_operand")
+  (VNx32HF "vector_gs_extension_operand") (VNx64HF "const_1_operand")
+  (VNx1SF "immediate_operand") (VNx2SF "immediate_operand") (VNx4SF "immediate_operand") (VNx8SF "immediate_operand") (VNx16SF "immediate_operand")
+  (VNx32SF "vector_gs_extension_operand")
+  (VNx1DF "immediate_operand") (VNx2DF "immediate_operand") (VNx4DF "immediate_operand") (VNx8DF "immediate_operand") (VNx16DF "immediate_operand")
+])
+
+(define_mode_attr gs_scale [
+  (VNx1QI "const_1_operand") (VNx2QI "const_1_operand") (VNx4QI "const_1_operand") (VNx8QI "const_1_operand")
+  (VNx16QI "const_1_operand") (VNx32QI "const_1_operand") (VNx64QI "const_1_operand")
+  (VNx1HI "vector_gs_scale_operand_16") (VNx2HI "vector_gs_scale_operand_16") (VNx4HI "vector_gs_scale_operand_16") (VNx8HI "vector_gs_scale_operand_16")
+  (VNx16HI "vector_gs_scale_operand_16") (VNx32HI "vector_gs_scale_operand_16_rv32") (VNx64HI "const_1_operand")
+  (VNx1SI "vector_gs_scale_operand_32") (VNx2SI "vector_gs_scale_operand_32") (VNx4SI "vector_gs_scale_operand_32") (VNx8SI "vector_gs_scale_operand_32")
+  (VNx16SI "vector_gs_scale_operand_32") (VNx32SI "vector_gs_scale_operand_32_rv32")
+  (VNx1DI "vector_gs_scale_operand_64") (VNx2DI "vector_gs_scale_operand_64") (VNx4DI "vector_gs_scale_operand_64") (VNx8DI "vector_gs_scale_operand_64")
+  (VNx16DI "vector_gs_scale_operand_64")
+
+  (VNx1HF "vector_gs_scale_operand_16") (VNx2HF "vector_gs_scale_operand_16") (VNx4HF "vector_gs_scale_operand_16") (VNx8HF "vector_gs_scale_operand_16")
+  (VNx16HF "vector_gs_scale_operand_16") (VNx32HF "vector_gs_scale_operand_16_rv32") (VNx64HF "const_1_operand")
+  (VNx1SF "vector_gs_scale_operand_32") (VNx2SF "vector_gs_scale_operand_32") (VNx4SF "vector_gs_scale_operand_32") (VNx8SF "vector_gs_scale_operand_32")
+  (VNx16SF "vector_gs_scale_operand_32") (VNx32SF "vector_gs_scale_operand_32_rv32")
+  (VNx1DF "vector_gs_scale_operand_64") (VNx2DF "vector_gs_scale_operand_64") (VNx4DF "vector_gs_scale_operand_64") (VNx8DF "vector_gs_scale_operand_64")
+  (VNx16DF "vector_gs_scale_operand_64")
+])
+
 (define_int_iterator WREDUC [UNSPEC_WREDUC_SUM UNSPEC_WREDUC_USUM])
 
 (define_int_iterator ORDER [UNSPEC_ORDERED UNSPEC_UNORDERED])
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 9df40e44113..2af960716d2 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -818,7 +818,7 @@ 
 ;; This pattern only handles duplicates of non-constant inputs.
 ;; Constant vectors go through the movm pattern instead.
 ;; So "direct_broadcast_operand" can only be mem or reg, no CONSTANT.
-(define_expand "vec_duplicate<mode>"
+(define_expand "@vec_duplicate<mode>"
   [(set (match_operand:V 0 "register_operand")
 	(vec_duplicate:V
 	  (match_operand:<VEL> 1 "direct_broadcast_operand")))]
@@ -1357,8 +1357,16 @@ 
 	}
     }
   else if (GET_MODE_BITSIZE (<VEL>mode) > GET_MODE_BITSIZE (Pmode)
-           && immediate_operand (operands[3], Pmode))
-    operands[3] = gen_rtx_SIGN_EXTEND (<VEL>mode, force_reg (Pmode, operands[3]));
+           && (immediate_operand (operands[3], Pmode)
+	       || (CONST_POLY_INT_P (operands[3])
+	           && known_ge (rtx_to_poly_int64 (operands[3]), 0U)
+		   && known_le (rtx_to_poly_int64 (operands[3]), GET_MODE_SIZE (<MODE>mode)))))
+    {
+      rtx tmp = gen_reg_rtx (Pmode);
+      poly_int64 value = rtx_to_poly_int64 (operands[3]);
+      emit_move_insn (tmp, gen_int_mode (value, Pmode));
+      operands[3] = gen_rtx_SIGN_EXTEND (<VEL>mode, tmp);
+    }
   else
     operands[3] = force_reg (<VEL>mode, operands[3]);
 })
@@ -1387,7 +1395,8 @@ 
    vlse<sew>.v\t%0,%3,zero
    vmv.s.x\t%0,%3
    vmv.s.x\t%0,%3"
-  "register_operand (operands[3], <VEL>mode)
+  "(register_operand (operands[3], <VEL>mode)
+  || CONST_POLY_INT_P (operands[3]))
   && GET_MODE_BITSIZE (<VEL>mode) > GET_MODE_BITSIZE (Pmode)"
   [(set (match_dup 0)
 	(if_then_else:VI (unspec:<VM> [(match_dup 1) (match_dup 4)
@@ -1397,6 +1406,12 @@ 
 	  (match_dup 2)))]
   {
     gcc_assert (can_create_pseudo_p ());
+    if (CONST_POLY_INT_P (operands[3]))
+      {
+        rtx tmp = gen_reg_rtx (<VEL>mode);
+	emit_move_insn (tmp, operands[3]);
+	operands[3] = tmp;
+      }
     rtx m = assign_stack_local (<VEL>mode, GET_MODE_SIZE (<VEL>mode),
 				GET_MODE_ALIGNMENT (<VEL>mode));
     m = validize_mem (m);
@@ -1483,6 +1498,7 @@ 
 	     (match_operand 5 "vector_length_operand"    "   rK,    rK,    rK")
 	     (match_operand 6 "const_int_operand"        "    i,     i,     i")
 	     (match_operand 7 "const_int_operand"        "    i,     i,     i")
+	     (match_operand 8 "const_int_operand"        "    i,     i,     i")
 	     (reg:SI VL_REGNUM)
 	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
 	  (unspec:V
@@ -1738,7 +1754,7 @@ 
   [(set_attr "type" "vst<order>x")
    (set_attr "mode" "<VNX8_QHSD:MODE>")])
 
-(define_insn "@pred_indexed_<order>store<VNX16_QHS:mode><VNX16_QHSDI:mode>"
+(define_insn "@pred_indexed_<order>store<VNX16_QHSD:mode><VNX16_QHSDI:mode>"
   [(set (mem:BLK (scratch))
 	(unspec:BLK
 	  [(unspec:<VM>
@@ -1749,11 +1765,11 @@ 
 	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
 	   (match_operand 1 "pmode_reg_or_0_operand"      "  rJ")
 	   (match_operand:VNX16_QHSDI 2 "register_operand" "  vr")
-	   (match_operand:VNX16_QHS 3 "register_operand"  "  vr")] ORDER))]
+	   (match_operand:VNX16_QHSD 3 "register_operand"  "  vr")] ORDER))]
   "TARGET_VECTOR"
   "vs<order>xei<VNX16_QHSDI:sew>.v\t%3,(%z1),%2%p0"
   [(set_attr "type" "vst<order>x")
-   (set_attr "mode" "<VNX16_QHS:MODE>")])
+   (set_attr "mode" "<VNX16_QHSD:MODE>")])
 
 (define_insn "@pred_indexed_<order>store<VNX32_QHS:mode><VNX32_QHSI:mode>"
   [(set (mem:BLK (scratch))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c
new file mode 100644
index 00000000000..dffe13f6a8a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c
@@ -0,0 +1,38 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+#define INDEX16 uint16_t
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c
new file mode 100644
index 00000000000..a622e516f06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                                \
+  T (uint8_t, 64)                                                               \
+  T (int16_t, 64)                                                               \
+  T (uint16_t, 64)                                                              \
+  T (_Float16, 64)                                                              \
+  T (int32_t, 64)                                                               \
+  T (uint32_t, 64)                                                              \
+  T (float, 64)                                                                 \
+  T (int64_t, 64)                                                               \
+  T (uint64_t, 64)                                                              \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c
new file mode 100644
index 00000000000..4692380233d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c
@@ -0,0 +1,32 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_LOOP(DATA_TYPE)                                                   \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict *src)           \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += *src[i];                                                      \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t)                                                                   \
+  T (uint8_t)                                                                  \
+  T (int16_t)                                                                  \
+  T (uint16_t)                                                                 \
+  T (_Float16)                                                                 \
+  T (int32_t)                                                                  \
+  T (uint32_t)                                                                 \
+  T (float)                                                                    \
+  T (int64_t)                                                                  \
+  T (uint64_t)                                                                 \
+  T (double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c
new file mode 100644
index 00000000000..71a3dd466fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c
@@ -0,0 +1,112 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_LOOP(DATA_TYPE, INDEX_TYPE)                                       \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##INDEX_TYPE (DATA_TYPE *restrict y, DATA_TYPE *restrict x,  \
+				INDEX_TYPE *restrict index)                    \
+  {                                                                            \
+    for (int i = 0; i < 100; ++i)                                              \
+      {                                                                        \
+	y[i * 2] = x[index[i * 2]] + 1;                                        \
+	y[i * 2 + 1] = x[index[i * 2 + 1]] + 2;                                \
+      }                                                                        \
+  }
+
+TEST_LOOP (int8_t, int8_t)
+TEST_LOOP (uint8_t, int8_t)
+TEST_LOOP (int16_t, int8_t)
+TEST_LOOP (uint16_t, int8_t)
+TEST_LOOP (int32_t, int8_t)
+TEST_LOOP (uint32_t, int8_t)
+TEST_LOOP (int64_t, int8_t)
+TEST_LOOP (uint64_t, int8_t)
+TEST_LOOP (_Float16, int8_t)
+TEST_LOOP (float, int8_t)
+TEST_LOOP (double, int8_t)
+TEST_LOOP (int8_t, int16_t)
+TEST_LOOP (uint8_t, int16_t)
+TEST_LOOP (int16_t, int16_t)
+TEST_LOOP (uint16_t, int16_t)
+TEST_LOOP (int32_t, int16_t)
+TEST_LOOP (uint32_t, int16_t)
+TEST_LOOP (int64_t, int16_t)
+TEST_LOOP (uint64_t, int16_t)
+TEST_LOOP (_Float16, int16_t)
+TEST_LOOP (float, int16_t)
+TEST_LOOP (double, int16_t)
+TEST_LOOP (int8_t, int32_t)
+TEST_LOOP (uint8_t, int32_t)
+TEST_LOOP (int16_t, int32_t)
+TEST_LOOP (uint16_t, int32_t)
+TEST_LOOP (int32_t, int32_t)
+TEST_LOOP (uint32_t, int32_t)
+TEST_LOOP (int64_t, int32_t)
+TEST_LOOP (uint64_t, int32_t)
+TEST_LOOP (_Float16, int32_t)
+TEST_LOOP (float, int32_t)
+TEST_LOOP (double, int32_t)
+TEST_LOOP (int8_t, int64_t)
+TEST_LOOP (uint8_t, int64_t)
+TEST_LOOP (int16_t, int64_t)
+TEST_LOOP (uint16_t, int64_t)
+TEST_LOOP (int32_t, int64_t)
+TEST_LOOP (uint32_t, int64_t)
+TEST_LOOP (int64_t, int64_t)
+TEST_LOOP (uint64_t, int64_t)
+TEST_LOOP (_Float16, int64_t)
+TEST_LOOP (float, int64_t)
+TEST_LOOP (double, int64_t)
+TEST_LOOP (int8_t, uint8_t)
+TEST_LOOP (uint8_t, uint8_t)
+TEST_LOOP (int16_t, uint8_t)
+TEST_LOOP (uint16_t, uint8_t)
+TEST_LOOP (int32_t, uint8_t)
+TEST_LOOP (uint32_t, uint8_t)
+TEST_LOOP (int64_t, uint8_t)
+TEST_LOOP (uint64_t, uint8_t)
+TEST_LOOP (_Float16, uint8_t)
+TEST_LOOP (float, uint8_t)
+TEST_LOOP (double, uint8_t)
+TEST_LOOP (int8_t, uint16_t)
+TEST_LOOP (uint8_t, uint16_t)
+TEST_LOOP (int16_t, uint16_t)
+TEST_LOOP (uint16_t, uint16_t)
+TEST_LOOP (int32_t, uint16_t)
+TEST_LOOP (uint32_t, uint16_t)
+TEST_LOOP (int64_t, uint16_t)
+TEST_LOOP (uint64_t, uint16_t)
+TEST_LOOP (_Float16, uint16_t)
+TEST_LOOP (float, uint16_t)
+TEST_LOOP (double, uint16_t)
+TEST_LOOP (int8_t, uint32_t)
+TEST_LOOP (uint8_t, uint32_t)
+TEST_LOOP (int16_t, uint32_t)
+TEST_LOOP (uint16_t, uint32_t)
+TEST_LOOP (int32_t, uint32_t)
+TEST_LOOP (uint32_t, uint32_t)
+TEST_LOOP (int64_t, uint32_t)
+TEST_LOOP (uint64_t, uint32_t)
+TEST_LOOP (_Float16, uint32_t)
+TEST_LOOP (float, uint32_t)
+TEST_LOOP (double, uint32_t)
+TEST_LOOP (int8_t, uint64_t)
+TEST_LOOP (uint8_t, uint64_t)
+TEST_LOOP (int16_t, uint64_t)
+TEST_LOOP (uint16_t, uint64_t)
+TEST_LOOP (int32_t, uint64_t)
+TEST_LOOP (uint32_t, uint64_t)
+TEST_LOOP (int64_t, uint64_t)
+TEST_LOOP (uint64_t, uint64_t)
+TEST_LOOP (_Float16, uint64_t)
+TEST_LOOP (float, uint64_t)
+TEST_LOOP (double, uint64_t)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 88 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-assembler-not "vluxei64\.v" } } */
+/* { dg-final { scan-assembler-not "vsuxei64\.v" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c
new file mode 100644
index 00000000000..785550c4b2d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c
@@ -0,0 +1,38 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c
new file mode 100644
index 00000000000..22aeb889221
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c
new file mode 100644
index 00000000000..d74a83415d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c
new file mode 100644
index 00000000000..2b6c0a87c18
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 uint16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                                \
+  T (uint8_t, 16)                                                               \
+  T (int16_t, 16)                                                               \
+  T (uint16_t, 16)                                                              \
+  T (_Float16, 16)                                                              \
+  T (int32_t, 16)                                                               \
+  T (uint32_t, 16)                                                              \
+  T (float, 16)                                                                 \
+  T (int64_t, 16)                                                               \
+  T (uint64_t, 16)                                                              \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c
new file mode 100644
index 00000000000..407cc8a5a73
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 int16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                                \
+  T (uint8_t, 16)                                                               \
+  T (int16_t, 16)                                                               \
+  T (uint16_t, 16)                                                              \
+  T (_Float16, 16)                                                              \
+  T (int32_t, 16)                                                               \
+  T (uint32_t, 16)                                                              \
+  T (float, 16)                                                                 \
+  T (int64_t, 16)                                                               \
+  T (uint64_t, 16)                                                              \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c
new file mode 100644
index 00000000000..81b31ef26aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 uint32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                                \
+  T (uint8_t, 32)                                                               \
+  T (int16_t, 32)                                                               \
+  T (uint16_t, 32)                                                              \
+  T (_Float16, 32)                                                              \
+  T (int32_t, 32)                                                               \
+  T (uint32_t, 32)                                                              \
+  T (float, 32)                                                                 \
+  T (int64_t, 32)                                                               \
+  T (uint64_t, 32)                                                              \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c
new file mode 100644
index 00000000000..0bfdb8f0acf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 int32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                                \
+  T (uint8_t, 32)                                                               \
+  T (int16_t, 32)                                                               \
+  T (uint16_t, 32)                                                              \
+  T (_Float16, 32)                                                              \
+  T (int32_t, 32)                                                               \
+  T (uint32_t, 32)                                                              \
+  T (float, 32)                                                                 \
+  T (int64_t, 32)                                                               \
+  T (uint64_t, 32)                                                              \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c
new file mode 100644
index 00000000000..46f791105ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                                \
+  T (uint8_t, 64)                                                               \
+  T (int16_t, 64)                                                               \
+  T (uint16_t, 64)                                                              \
+  T (_Float16, 64)                                                              \
+  T (int32_t, 64)                                                               \
+  T (uint32_t, 64)                                                              \
+  T (float, 64)                                                                 \
+  T (int64_t, 64)                                                               \
+  T (uint64_t, 64)                                                              \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c
new file mode 100644
index 00000000000..0d3c5b71e5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c
@@ -0,0 +1,41 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c
new file mode 100644
index 00000000000..145df1e7797
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c
@@ -0,0 +1,41 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-10.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c
new file mode 100644
index 00000000000..d36b6f025f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c
@@ -0,0 +1,39 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-11.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE *src_##DATA_TYPE[128];                                             \
+  DATA_TYPE src2_##DATA_TYPE[128];                                             \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = src2_##DATA_TYPE + i;                               \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE);                           \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i] + src_##DATA_TYPE[i][0]));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
new file mode 100644
index 00000000000..b4e2ead8ca9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
@@ -0,0 +1,124 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-12.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, INDEX_TYPE)                                        \
+  DATA_TYPE dest_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                        \
+  DATA_TYPE src_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                         \
+  INDEX_TYPE index_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                      \
+  for (int i = 0; i < 202; i++)                                                \
+    {                                                                          \
+      src_##DATA_TYPE##_##INDEX_TYPE[i]                                        \
+	= (DATA_TYPE) ((i * 19 + 735) & (sizeof (DATA_TYPE) * 7 - 1));         \
+      index_##DATA_TYPE##_##INDEX_TYPE[i] = (i * 7) % (55);                    \
+    }                                                                          \
+  f_##DATA_TYPE##_##INDEX_TYPE (dest_##DATA_TYPE##_##INDEX_TYPE,               \
+				src_##DATA_TYPE##_##INDEX_TYPE,                \
+				index_##DATA_TYPE##_##INDEX_TYPE);             \
+  for (int i = 0; i < 100; i++)                                                \
+    {                                                                          \
+      assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2]                           \
+	      == (src_##DATA_TYPE##_##INDEX_TYPE                               \
+		    [index_##DATA_TYPE##_##INDEX_TYPE[i * 2]]                  \
+		  + 1));                                                       \
+      assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]                       \
+	      == (src_##DATA_TYPE##_##INDEX_TYPE                               \
+		    [index_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]]              \
+		  + 2));                                                       \
+    }
+
+  RUN_LOOP (int8_t, int8_t)
+  RUN_LOOP (uint8_t, int8_t)
+  RUN_LOOP (int16_t, int8_t)
+  RUN_LOOP (uint16_t, int8_t)
+  RUN_LOOP (int32_t, int8_t)
+  RUN_LOOP (uint32_t, int8_t)
+  RUN_LOOP (int64_t, int8_t)
+  RUN_LOOP (uint64_t, int8_t)
+  RUN_LOOP (_Float16, int8_t)
+  RUN_LOOP (float, int8_t)
+  RUN_LOOP (double, int8_t)
+  RUN_LOOP (int8_t, int16_t)
+  RUN_LOOP (uint8_t, int16_t)
+  RUN_LOOP (int16_t, int16_t)
+  RUN_LOOP (uint16_t, int16_t)
+  RUN_LOOP (int32_t, int16_t)
+  RUN_LOOP (uint32_t, int16_t)
+  RUN_LOOP (int64_t, int16_t)
+  RUN_LOOP (uint64_t, int16_t)
+  RUN_LOOP (_Float16, int16_t)
+  RUN_LOOP (float, int16_t)
+  RUN_LOOP (double, int16_t)
+  RUN_LOOP (int8_t, int32_t)
+  RUN_LOOP (uint8_t, int32_t)
+  RUN_LOOP (int16_t, int32_t)
+  RUN_LOOP (uint16_t, int32_t)
+  RUN_LOOP (int32_t, int32_t)
+  RUN_LOOP (uint32_t, int32_t)
+  RUN_LOOP (int64_t, int32_t)
+  RUN_LOOP (uint64_t, int32_t)
+  RUN_LOOP (_Float16, int32_t)
+  RUN_LOOP (float, int32_t)
+  RUN_LOOP (double, int32_t)
+  RUN_LOOP (int8_t, int64_t)
+  RUN_LOOP (uint8_t, int64_t)
+  RUN_LOOP (int16_t, int64_t)
+  RUN_LOOP (uint16_t, int64_t)
+  RUN_LOOP (int32_t, int64_t)
+  RUN_LOOP (uint32_t, int64_t)
+  RUN_LOOP (int64_t, int64_t)
+  RUN_LOOP (uint64_t, int64_t)
+  RUN_LOOP (_Float16, int64_t)
+  RUN_LOOP (float, int64_t)
+  RUN_LOOP (double, int64_t)
+  RUN_LOOP (int8_t, uint8_t)
+  RUN_LOOP (uint8_t, uint8_t)
+  RUN_LOOP (int16_t, uint8_t)
+  RUN_LOOP (uint16_t, uint8_t)
+  RUN_LOOP (int32_t, uint8_t)
+  RUN_LOOP (uint32_t, uint8_t)
+  RUN_LOOP (int64_t, uint8_t)
+  RUN_LOOP (uint64_t, uint8_t)
+  RUN_LOOP (_Float16, uint8_t)
+  RUN_LOOP (float, uint8_t)
+  RUN_LOOP (double, uint8_t)
+  RUN_LOOP (int8_t, uint16_t)
+  RUN_LOOP (uint8_t, uint16_t)
+  RUN_LOOP (int16_t, uint16_t)
+  RUN_LOOP (uint16_t, uint16_t)
+  RUN_LOOP (int32_t, uint16_t)
+  RUN_LOOP (uint32_t, uint16_t)
+  RUN_LOOP (int64_t, uint16_t)
+  RUN_LOOP (uint64_t, uint16_t)
+  RUN_LOOP (_Float16, uint16_t)
+  RUN_LOOP (float, uint16_t)
+  RUN_LOOP (double, uint16_t)
+  RUN_LOOP (int8_t, uint32_t)
+  RUN_LOOP (uint8_t, uint32_t)
+  RUN_LOOP (int16_t, uint32_t)
+  RUN_LOOP (uint16_t, uint32_t)
+  RUN_LOOP (int32_t, uint32_t)
+  RUN_LOOP (uint32_t, uint32_t)
+  RUN_LOOP (int64_t, uint32_t)
+  RUN_LOOP (uint64_t, uint32_t)
+  RUN_LOOP (_Float16, uint32_t)
+  RUN_LOOP (float, uint32_t)
+  RUN_LOOP (double, uint32_t)
+  RUN_LOOP (int8_t, uint64_t)
+  RUN_LOOP (uint8_t, uint64_t)
+  RUN_LOOP (int16_t, uint64_t)
+  RUN_LOOP (uint16_t, uint64_t)
+  RUN_LOOP (int32_t, uint64_t)
+  RUN_LOOP (uint32_t, uint64_t)
+  RUN_LOOP (int64_t, uint64_t)
+  RUN_LOOP (uint64_t, uint64_t)
+  RUN_LOOP (_Float16, uint64_t)
+  RUN_LOOP (float, uint64_t)
+  RUN_LOOP (double, uint64_t)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c
new file mode 100644
index 00000000000..76c6df32e6c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c
@@ -0,0 +1,41 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c
new file mode 100644
index 00000000000..0fd64260082
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c
@@ -0,0 +1,41 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-3.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c
new file mode 100644
index 00000000000..069d232b912
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c
@@ -0,0 +1,41 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-4.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c
new file mode 100644
index 00000000000..499e555c1d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c
@@ -0,0 +1,41 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-5.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c
new file mode 100644
index 00000000000..ec6587aa4e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c
@@ -0,0 +1,41 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-6.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c
new file mode 100644
index 00000000000..c16287955a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c
@@ -0,0 +1,41 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-7.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c
new file mode 100644
index 00000000000..e1744f60dbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c
@@ -0,0 +1,41 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-8.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c
new file mode 100644
index 00000000000..3ad6d33087f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c
@@ -0,0 +1,41 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-9.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c
new file mode 100644
index 00000000000..a5de0deccbe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c
@@ -0,0 +1,39 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+#define INDEX16 uint16_t
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c
new file mode 100644
index 00000000000..74a0d05b37d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                               \
+  T (uint8_t, 64)                                                              \
+  T (int16_t, 64)                                                              \
+  T (uint16_t, 64)                                                             \
+  T (_Float16, 64)                                                             \
+  T (int32_t, 64)                                                              \
+  T (uint32_t, 64)                                                             \
+  T (float, 64)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c
new file mode 100644
index 00000000000..98c5b4678b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c
@@ -0,0 +1,116 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_LOOP(DATA_TYPE, INDEX_TYPE)                                       \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##INDEX_TYPE (DATA_TYPE *restrict y, DATA_TYPE *restrict x,  \
+				INDEX_TYPE *restrict index,                    \
+				INDEX_TYPE *restrict cond)                     \
+  {                                                                            \
+    for (int i = 0; i < 100; ++i)                                              \
+      {                                                                        \
+	if (cond[i * 2])                                                       \
+	  y[i * 2] = x[index[i * 2]] + 1;                                      \
+	if (cond[i * 2 + 1])                                                   \
+	  y[i * 2 + 1] = x[index[i * 2 + 1]] + 2;                              \
+      }                                                                        \
+  }
+
+TEST_LOOP (int8_t, int8_t)
+TEST_LOOP (uint8_t, int8_t)
+TEST_LOOP (int16_t, int8_t)
+TEST_LOOP (uint16_t, int8_t)
+TEST_LOOP (int32_t, int8_t)
+TEST_LOOP (uint32_t, int8_t)
+TEST_LOOP (int64_t, int8_t)
+TEST_LOOP (uint64_t, int8_t)
+TEST_LOOP (_Float16, int8_t)
+TEST_LOOP (float, int8_t)
+TEST_LOOP (double, int8_t)
+TEST_LOOP (int8_t, int16_t)
+TEST_LOOP (uint8_t, int16_t)
+TEST_LOOP (int16_t, int16_t)
+TEST_LOOP (uint16_t, int16_t)
+TEST_LOOP (int32_t, int16_t)
+TEST_LOOP (uint32_t, int16_t)
+TEST_LOOP (int64_t, int16_t)
+TEST_LOOP (uint64_t, int16_t)
+TEST_LOOP (_Float16, int16_t)
+TEST_LOOP (float, int16_t)
+TEST_LOOP (double, int16_t)
+TEST_LOOP (int8_t, int32_t)
+TEST_LOOP (uint8_t, int32_t)
+TEST_LOOP (int16_t, int32_t)
+TEST_LOOP (uint16_t, int32_t)
+TEST_LOOP (int32_t, int32_t)
+TEST_LOOP (uint32_t, int32_t)
+TEST_LOOP (int64_t, int32_t)
+TEST_LOOP (uint64_t, int32_t)
+TEST_LOOP (_Float16, int32_t)
+TEST_LOOP (float, int32_t)
+TEST_LOOP (double, int32_t)
+TEST_LOOP (int8_t, int64_t)
+TEST_LOOP (uint8_t, int64_t)
+TEST_LOOP (int16_t, int64_t)
+TEST_LOOP (uint16_t, int64_t)
+TEST_LOOP (int32_t, int64_t)
+TEST_LOOP (uint32_t, int64_t)
+TEST_LOOP (int64_t, int64_t)
+TEST_LOOP (uint64_t, int64_t)
+TEST_LOOP (_Float16, int64_t)
+TEST_LOOP (float, int64_t)
+TEST_LOOP (double, int64_t)
+TEST_LOOP (int8_t, uint8_t)
+TEST_LOOP (uint8_t, uint8_t)
+TEST_LOOP (int16_t, uint8_t)
+TEST_LOOP (uint16_t, uint8_t)
+TEST_LOOP (int32_t, uint8_t)
+TEST_LOOP (uint32_t, uint8_t)
+TEST_LOOP (int64_t, uint8_t)
+TEST_LOOP (uint64_t, uint8_t)
+TEST_LOOP (_Float16, uint8_t)
+TEST_LOOP (float, uint8_t)
+TEST_LOOP (double, uint8_t)
+TEST_LOOP (int8_t, uint16_t)
+TEST_LOOP (uint8_t, uint16_t)
+TEST_LOOP (int16_t, uint16_t)
+TEST_LOOP (uint16_t, uint16_t)
+TEST_LOOP (int32_t, uint16_t)
+TEST_LOOP (uint32_t, uint16_t)
+TEST_LOOP (int64_t, uint16_t)
+TEST_LOOP (uint64_t, uint16_t)
+TEST_LOOP (_Float16, uint16_t)
+TEST_LOOP (float, uint16_t)
+TEST_LOOP (double, uint16_t)
+TEST_LOOP (int8_t, uint32_t)
+TEST_LOOP (uint8_t, uint32_t)
+TEST_LOOP (int16_t, uint32_t)
+TEST_LOOP (uint16_t, uint32_t)
+TEST_LOOP (int32_t, uint32_t)
+TEST_LOOP (uint32_t, uint32_t)
+TEST_LOOP (int64_t, uint32_t)
+TEST_LOOP (uint64_t, uint32_t)
+TEST_LOOP (_Float16, uint32_t)
+TEST_LOOP (float, uint32_t)
+TEST_LOOP (double, uint32_t)
+TEST_LOOP (int8_t, uint64_t)
+TEST_LOOP (uint8_t, uint64_t)
+TEST_LOOP (int16_t, uint64_t)
+TEST_LOOP (uint16_t, uint64_t)
+TEST_LOOP (int32_t, uint64_t)
+TEST_LOOP (uint32_t, uint64_t)
+TEST_LOOP (int64_t, uint64_t)
+TEST_LOOP (uint64_t, uint64_t)
+TEST_LOOP (_Float16, uint64_t)
+TEST_LOOP (float, uint64_t)
+TEST_LOOP (double, uint64_t)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 88 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-assembler-not "vluxei64\.v" } } */
+/* { dg-final { scan-assembler-not "vsuxei64\.v" } } */
+/* { dg-final { scan-assembler-not {vlse64\.v\s+v[0-9]+,\s*0\([a-x0-9]+\),\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c
new file mode 100644
index 00000000000..03f84ce962c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c
@@ -0,0 +1,39 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c
new file mode 100644
index 00000000000..8578001ef41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c
new file mode 100644
index 00000000000..b273caa0bfe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c
new file mode 100644
index 00000000000..5055d886d62
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 uint16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                               \
+  T (uint8_t, 16)                                                              \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 16)                                                              \
+  T (uint32_t, 16)                                                             \
+  T (float, 16)                                                                \
+  T (int64_t, 16)                                                              \
+  T (uint64_t, 16)                                                             \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c
new file mode 100644
index 00000000000..2a4ae58588f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 int16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                               \
+  T (uint8_t, 16)                                                              \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 16)                                                              \
+  T (uint32_t, 16)                                                             \
+  T (float, 16)                                                                \
+  T (int64_t, 16)                                                              \
+  T (uint64_t, 16)                                                             \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c
new file mode 100644
index 00000000000..31d9414c549
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 uint32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                               \
+  T (uint8_t, 32)                                                              \
+  T (int16_t, 32)                                                              \
+  T (uint16_t, 32)                                                             \
+  T (_Float16, 32)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 32)                                                              \
+  T (uint64_t, 32)                                                             \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c
new file mode 100644
index 00000000000..73ed23042fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 int32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                               \
+  T (uint8_t, 32)                                                              \
+  T (int16_t, 32)                                                              \
+  T (uint16_t, 32)                                                             \
+  T (_Float16, 32)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 32)                                                              \
+  T (uint64_t, 32)                                                             \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c
new file mode 100644
index 00000000000..2f64e805759
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                               \
+  T (uint8_t, 64)                                                              \
+  T (int16_t, 64)                                                              \
+  T (uint16_t, 64)                                                             \
+  T (_Float16, 64)                                                             \
+  T (int32_t, 64)                                                              \
+  T (uint32_t, 64)                                                             \
+  T (float, 64)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c
new file mode 100644
index 00000000000..41f60bd88b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c
new file mode 100644
index 00000000000..9840434fa41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-10.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c
new file mode 100644
index 00000000000..105c706dbf9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c
@@ -0,0 +1,140 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-mcmodel=medany" } */
+
+#include "mask_gather_load-11.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, INDEX_TYPE)                                        \
+  DATA_TYPE dest_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                        \
+  DATA_TYPE dest2_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                       \
+  DATA_TYPE src_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                         \
+  INDEX_TYPE index_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                      \
+  INDEX_TYPE cond_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                       \
+  for (int i = 0; i < 202; i++)                                                \
+    {                                                                          \
+      src_##DATA_TYPE##_##INDEX_TYPE[i]                                        \
+	= (DATA_TYPE) ((i * 19 + 735) & (sizeof (DATA_TYPE) * 7 - 1));         \
+      dest_##DATA_TYPE##_##INDEX_TYPE[i]                                       \
+	= (DATA_TYPE) ((i * 7 + 666) & (sizeof (DATA_TYPE) * 5 - 1));          \
+      dest2_##DATA_TYPE##_##INDEX_TYPE[i]                                      \
+	= (DATA_TYPE) ((i * 7 + 666) & (sizeof (DATA_TYPE) * 5 - 1));          \
+      index_##DATA_TYPE##_##INDEX_TYPE[i] = (i * 7) % (55);                    \
+      cond_##DATA_TYPE##_##INDEX_TYPE[i] = (INDEX_TYPE) ((i & 0x3) == 3);      \
+    }                                                                          \
+  f_##DATA_TYPE##_##INDEX_TYPE (dest_##DATA_TYPE##_##INDEX_TYPE,               \
+				src_##DATA_TYPE##_##INDEX_TYPE,                \
+				index_##DATA_TYPE##_##INDEX_TYPE,              \
+				cond_##DATA_TYPE##_##INDEX_TYPE);              \
+  for (int i = 0; i < 100; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##INDEX_TYPE[i * 2])                              \
+	assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2]                         \
+		== (src_##DATA_TYPE##_##INDEX_TYPE                             \
+		      [index_##DATA_TYPE##_##INDEX_TYPE[i * 2]]                \
+		    + 1));                                                     \
+      else                                                                     \
+	assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2]                         \
+		== dest2_##DATA_TYPE##_##INDEX_TYPE[i * 2]);                   \
+      if (cond_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1])                          \
+	assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]                     \
+		== (src_##DATA_TYPE##_##INDEX_TYPE                             \
+		      [index_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]]            \
+		    + 2));                                                     \
+      else                                                                     \
+	assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]                     \
+		== dest2_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]);               \
+    }
+
+  RUN_LOOP (int8_t, int8_t)
+  RUN_LOOP (uint8_t, int8_t)
+  RUN_LOOP (int16_t, int8_t)
+  RUN_LOOP (uint16_t, int8_t)
+  RUN_LOOP (int32_t, int8_t)
+  RUN_LOOP (uint32_t, int8_t)
+  RUN_LOOP (int64_t, int8_t)
+  RUN_LOOP (uint64_t, int8_t)
+  RUN_LOOP (_Float16, int8_t)
+  RUN_LOOP (float, int8_t)
+  RUN_LOOP (double, int8_t)
+  RUN_LOOP (int8_t, int16_t)
+  RUN_LOOP (uint8_t, int16_t)
+  RUN_LOOP (int16_t, int16_t)
+  RUN_LOOP (uint16_t, int16_t)
+  RUN_LOOP (int32_t, int16_t)
+  RUN_LOOP (uint32_t, int16_t)
+  RUN_LOOP (int64_t, int16_t)
+  RUN_LOOP (uint64_t, int16_t)
+  RUN_LOOP (_Float16, int16_t)
+  RUN_LOOP (float, int16_t)
+  RUN_LOOP (double, int16_t)
+  RUN_LOOP (int8_t, int32_t)
+  RUN_LOOP (uint8_t, int32_t)
+  RUN_LOOP (int16_t, int32_t)
+  RUN_LOOP (uint16_t, int32_t)
+  RUN_LOOP (int32_t, int32_t)
+  RUN_LOOP (uint32_t, int32_t)
+  RUN_LOOP (int64_t, int32_t)
+  RUN_LOOP (uint64_t, int32_t)
+  RUN_LOOP (_Float16, int32_t)
+  RUN_LOOP (float, int32_t)
+  RUN_LOOP (double, int32_t)
+  RUN_LOOP (int8_t, int64_t)
+  RUN_LOOP (uint8_t, int64_t)
+  RUN_LOOP (int16_t, int64_t)
+  RUN_LOOP (uint16_t, int64_t)
+  RUN_LOOP (int32_t, int64_t)
+  RUN_LOOP (uint32_t, int64_t)
+  RUN_LOOP (int64_t, int64_t)
+  RUN_LOOP (uint64_t, int64_t)
+  RUN_LOOP (_Float16, int64_t)
+  RUN_LOOP (float, int64_t)
+  RUN_LOOP (double, int64_t)
+  RUN_LOOP (int8_t, uint8_t)
+  RUN_LOOP (uint8_t, uint8_t)
+  RUN_LOOP (int16_t, uint8_t)
+  RUN_LOOP (uint16_t, uint8_t)
+  RUN_LOOP (int32_t, uint8_t)
+  RUN_LOOP (uint32_t, uint8_t)
+  RUN_LOOP (int64_t, uint8_t)
+  RUN_LOOP (uint64_t, uint8_t)
+  RUN_LOOP (_Float16, uint8_t)
+  RUN_LOOP (float, uint8_t)
+  RUN_LOOP (double, uint8_t)
+  RUN_LOOP (int8_t, uint16_t)
+  RUN_LOOP (uint8_t, uint16_t)
+  RUN_LOOP (int16_t, uint16_t)
+  RUN_LOOP (uint16_t, uint16_t)
+  RUN_LOOP (int32_t, uint16_t)
+  RUN_LOOP (uint32_t, uint16_t)
+  RUN_LOOP (int64_t, uint16_t)
+  RUN_LOOP (uint64_t, uint16_t)
+  RUN_LOOP (_Float16, uint16_t)
+  RUN_LOOP (float, uint16_t)
+  RUN_LOOP (double, uint16_t)
+  RUN_LOOP (int8_t, uint32_t)
+  RUN_LOOP (uint8_t, uint32_t)
+  RUN_LOOP (int16_t, uint32_t)
+  RUN_LOOP (uint16_t, uint32_t)
+  RUN_LOOP (int32_t, uint32_t)
+  RUN_LOOP (uint32_t, uint32_t)
+  RUN_LOOP (int64_t, uint32_t)
+  RUN_LOOP (uint64_t, uint32_t)
+  RUN_LOOP (_Float16, uint32_t)
+  RUN_LOOP (float, uint32_t)
+  RUN_LOOP (double, uint32_t)
+  RUN_LOOP (int8_t, uint64_t)
+  RUN_LOOP (uint8_t, uint64_t)
+  RUN_LOOP (int16_t, uint64_t)
+  RUN_LOOP (uint16_t, uint64_t)
+  RUN_LOOP (int32_t, uint64_t)
+  RUN_LOOP (uint32_t, uint64_t)
+  RUN_LOOP (int64_t, uint64_t)
+  RUN_LOOP (uint64_t, uint64_t)
+  RUN_LOOP (_Float16, uint64_t)
+  RUN_LOOP (float, uint64_t)
+  RUN_LOOP (double, uint64_t)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c
new file mode 100644
index 00000000000..33ddb5d9909
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c
new file mode 100644
index 00000000000..9f06fbe4ecf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-3.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c
new file mode 100644
index 00000000000..ae578f0c7b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-4.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c
new file mode 100644
index 00000000000..741abd166e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-5.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c
new file mode 100644
index 00000000000..a14a5c4ced1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-6.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c
new file mode 100644
index 00000000000..0ccc7dce166
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-7.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c
new file mode 100644
index 00000000000..a34688ff339
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-mcmodel=medany" } */
+
+#include "mask_gather_load-8.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c
new file mode 100644
index 00000000000..1cfdede2060
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-9.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c
new file mode 100644
index 00000000000..623de41267b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c
@@ -0,0 +1,39 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+#define INDEX16 uint16_t
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c
new file mode 100644
index 00000000000..55112b067fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                                \
+  T (uint8_t, 64)                                                               \
+  T (int16_t, 64)                                                              \
+  T (uint16_t, 64)                                                             \
+  T (_Float16, 64)                                                             \
+  T (int32_t, 64)                                                              \
+  T (uint32_t, 64)                                                             \
+  T (float, 64)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c
new file mode 100644
index 00000000000..32a572d0064
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c
@@ -0,0 +1,39 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c
new file mode 100644
index 00000000000..fbaaa9d8a8e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                              \
+  T (uint16_t, 8)                                                             \
+  T (_Float16, 8)                                                             \
+  T (int32_t, 8)                                                              \
+  T (uint32_t, 8)                                                             \
+  T (float, 8)                                                                \
+  T (int64_t, 8)                                                              \
+  T (uint64_t, 8)                                                             \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c
new file mode 100644
index 00000000000..9b08661f8e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                              \
+  T (uint16_t, 8)                                                             \
+  T (_Float16, 8)                                                             \
+  T (int32_t, 8)                                                              \
+  T (uint32_t, 8)                                                             \
+  T (float, 8)                                                                \
+  T (int64_t, 8)                                                              \
+  T (uint64_t, 8)                                                             \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c
new file mode 100644
index 00000000000..dd26635f2cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 uint16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                                \
+  T (uint8_t, 16)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 16)                                                              \
+  T (uint32_t, 16)                                                             \
+  T (float, 16)                                                                \
+  T (int64_t, 16)                                                              \
+  T (uint64_t, 16)                                                             \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c
new file mode 100644
index 00000000000..fa0206a0ec2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 int16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                                \
+  T (uint8_t, 16)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 16)                                                              \
+  T (uint32_t, 16)                                                             \
+  T (float, 16)                                                                \
+  T (int64_t, 16)                                                              \
+  T (uint64_t, 16)                                                             \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c
new file mode 100644
index 00000000000..325e86c26a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 uint32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                                \
+  T (uint8_t, 32)                                                               \
+  T (int16_t, 32)                                                              \
+  T (uint16_t, 32)                                                             \
+  T (_Float16, 32)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 32)                                                              \
+  T (uint64_t, 32)                                                             \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c
new file mode 100644
index 00000000000..b4b84e9cdda
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 int32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                                \
+  T (uint8_t, 32)                                                               \
+  T (int16_t, 32)                                                              \
+  T (uint16_t, 32)                                                             \
+  T (_Float16, 32)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 32)                                                              \
+  T (uint64_t, 32)                                                             \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c
new file mode 100644
index 00000000000..77a9af953e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                                \
+  T (uint8_t, 64)                                                               \
+  T (int16_t, 64)                                                              \
+  T (uint16_t, 64)                                                             \
+  T (_Float16, 64)                                                             \
+  T (int32_t, 64)                                                              \
+  T (uint32_t, 64)                                                             \
+  T (float, 64)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c
new file mode 100644
index 00000000000..e0d52bf6291
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c
new file mode 100644
index 00000000000..c1af0d30e62
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-10.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c
new file mode 100644
index 00000000000..6b1b02eae35
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c
new file mode 100644
index 00000000000..cef0bdec1d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-3.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c
new file mode 100644
index 00000000000..88a74d5a632
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-4.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c
new file mode 100644
index 00000000000..06804ab7111
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-5.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c
new file mode 100644
index 00000000000..c6c9a676ed6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-6.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c
new file mode 100644
index 00000000000..8246e964aad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-7.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c
new file mode 100644
index 00000000000..8ee35d2e505
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-8.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c
new file mode 100644
index 00000000000..c27a673e2b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-9.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c
new file mode 100644
index 00000000000..6a390261cfb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c
@@ -0,0 +1,38 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+#define INDEX16 uint16_t
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c
new file mode 100644
index 00000000000..feb58d7d458
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                                \
+  T (uint8_t, 64)                                                               \
+  T (int16_t, 64)                                                               \
+  T (uint16_t, 64)                                                              \
+  T (_Float16, 64)                                                              \
+  T (int32_t, 64)                                                               \
+  T (uint32_t, 64)                                                              \
+  T (float, 64)                                                                 \
+  T (int64_t, 64)                                                               \
+  T (uint64_t, 64)                                                              \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c
new file mode 100644
index 00000000000..e4c587fd7bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c
@@ -0,0 +1,38 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c
new file mode 100644
index 00000000000..33ad256d3db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c
new file mode 100644
index 00000000000..48d305623e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c
new file mode 100644
index 00000000000..83ddc44bf9c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 uint16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                                \
+  T (uint8_t, 16)                                                               \
+  T (int16_t, 16)                                                               \
+  T (uint16_t, 16)                                                              \
+  T (_Float16, 16)                                                              \
+  T (int32_t, 16)                                                               \
+  T (uint32_t, 16)                                                              \
+  T (float, 16)                                                                 \
+  T (int64_t, 16)                                                               \
+  T (uint64_t, 16)                                                              \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c
new file mode 100644
index 00000000000..11eb68bdb13
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 int16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                                \
+  T (uint8_t, 16)                                                               \
+  T (int16_t, 16)                                                               \
+  T (uint16_t, 16)                                                              \
+  T (_Float16, 16)                                                              \
+  T (int32_t, 16)                                                               \
+  T (uint32_t, 16)                                                              \
+  T (float, 16)                                                                 \
+  T (int64_t, 16)                                                               \
+  T (uint64_t, 16)                                                              \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c
new file mode 100644
index 00000000000..2e323477258
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 uint32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                                \
+  T (uint8_t, 32)                                                               \
+  T (int16_t, 32)                                                               \
+  T (uint16_t, 32)                                                              \
+  T (_Float16, 32)                                                              \
+  T (int32_t, 32)                                                               \
+  T (uint32_t, 32)                                                              \
+  T (float, 32)                                                                 \
+  T (int64_t, 32)                                                               \
+  T (uint64_t, 32)                                                              \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c
new file mode 100644
index 00000000000..e6732fe3790
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 int32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                                \
+  T (uint8_t, 32)                                                               \
+  T (int16_t, 32)                                                               \
+  T (uint16_t, 32)                                                              \
+  T (_Float16, 32)                                                              \
+  T (int32_t, 32)                                                               \
+  T (uint32_t, 32)                                                              \
+  T (float, 32)                                                                 \
+  T (int64_t, 32)                                                               \
+  T (uint64_t, 32)                                                              \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c
new file mode 100644
index 00000000000..766a52b4622
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                                \
+  T (uint8_t, 64)                                                               \
+  T (int16_t, 64)                                                               \
+  T (uint16_t, 64)                                                              \
+  T (_Float16, 64)                                                              \
+  T (int32_t, 64)                                                               \
+  T (uint32_t, 64)                                                              \
+  T (float, 64)                                                                 \
+  T (int64_t, 64)                                                               \
+  T (uint64_t, 64)                                                              \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c
new file mode 100644
index 00000000000..cafa64f3527
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c
@@ -0,0 +1,40 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c
new file mode 100644
index 00000000000..79f6885831f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c
@@ -0,0 +1,40 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-10.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c
new file mode 100644
index 00000000000..376db088153
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c
@@ -0,0 +1,40 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-mcmodel=medany" } */
+
+#include "scatter_store-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c
new file mode 100644
index 00000000000..103b8649d38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c
@@ -0,0 +1,40 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-3.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c
new file mode 100644
index 00000000000..f5f89c0fb4f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c
@@ -0,0 +1,40 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-4.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c
new file mode 100644
index 00000000000..049251ec888
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c
@@ -0,0 +1,40 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-5.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c
new file mode 100644
index 00000000000..59c8e701dbe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c
@@ -0,0 +1,40 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-6.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c
new file mode 100644
index 00000000000..a24401181e3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c
@@ -0,0 +1,40 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-7.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c
new file mode 100644
index 00000000000..080c9b83363
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c
@@ -0,0 +1,40 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-8.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c
new file mode 100644
index 00000000000..cc9f20f0daa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c
@@ -0,0 +1,40 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-mcmodel=medany" } */
+
+#include "scatter_store-9.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c
new file mode 100644
index 00000000000..c7b990668c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c
@@ -0,0 +1,46 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+#ifndef INDEX8
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
+			  INDEX##BITS stride, INDEX##BITS n)                   \
+  {                                                                            \
+    for (INDEX##BITS i = 0; i < n; ++i)                                        \
+      dest[i] += src[i * stride];                                              \
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)                                                \
+  T (DATA_TYPE, 8)                                                             \
+  T (DATA_TYPE, 16)                                                            \
+  T (DATA_TYPE, 32)                                                            \
+  T (DATA_TYPE, 64)
+
+#define TEST_ALL(T)                                                            \
+  TEST_TYPE (T, int8_t)                                                        \
+  TEST_TYPE (T, uint8_t)                                                       \
+  TEST_TYPE (T, int16_t)                                                       \
+  TEST_TYPE (T, uint16_t)                                                      \
+  TEST_TYPE (T, _Float16)                                                      \
+  TEST_TYPE (T, int32_t)                                                       \
+  TEST_TYPE (T, uint32_t)                                                      \
+  TEST_TYPE (T, float)                                                         \
+  TEST_TYPE (T, int64_t)                                                       \
+  TEST_TYPE (T, uint64_t)                                                      \
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times " \.LEN_MASK_GATHER_LOAD" 66 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "optimized" } } */
+/* { dg-final { scan-assembler-not "vluxei" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c
new file mode 100644
index 00000000000..37dd7291f9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c
@@ -0,0 +1,46 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+#ifndef INDEX8
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
+			  INDEX##BITS stride, INDEX##BITS n)                   \
+  {                                                                            \
+    for (INDEX##BITS i = 0; i < (BITS + 13); ++i)                              \
+      dest[i] += src[i * (BITS - 3)];                                          \
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)                                                \
+  T (DATA_TYPE, 8)                                                             \
+  T (DATA_TYPE, 16)                                                            \
+  T (DATA_TYPE, 32)                                                            \
+  T (DATA_TYPE, 64)
+
+#define TEST_ALL(T)                                                            \
+  TEST_TYPE (T, int8_t)                                                        \
+  TEST_TYPE (T, uint8_t)                                                       \
+  TEST_TYPE (T, int16_t)                                                       \
+  TEST_TYPE (T, uint16_t)                                                      \
+  TEST_TYPE (T, _Float16)                                                      \
+  TEST_TYPE (T, int32_t)                                                       \
+  TEST_TYPE (T, uint32_t)                                                      \
+  TEST_TYPE (T, float)                                                         \
+  TEST_TYPE (T, int64_t)                                                       \
+  TEST_TYPE (T, uint64_t)                                                      \
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times " \.LEN_MASK_GATHER_LOAD" 46 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "optimized" } } */
+/* { dg-final { scan-assembler-not "vluxei" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c
new file mode 100644
index 00000000000..4b03c25a907
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c
@@ -0,0 +1,84 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+#include "strided_load-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];               \
+  DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];              \
+  DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];                \
+  INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3);                        \
+  INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13);                            \
+  for (INDEX##BITS i = 0;                                                      \
+       i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++)          \
+    {                                                                          \
+      dest_##DATA_TYPE##_##BITS[i]                                             \
+	= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      dest2_##DATA_TYPE##_##BITS[i]                                            \
+	= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      src_##DATA_TYPE##_##BITS[i]                                              \
+	= (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));                          \
+    }                                                                          \
+  f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
+			  stride_##DATA_TYPE##_##BITS,                         \
+			  n_##DATA_TYPE##_##BITS);                             \
+  for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++)                             \
+    {                                                                          \
+      assert (                                                                 \
+	dest_##DATA_TYPE##_##BITS[i]                                           \
+	== (dest2_##DATA_TYPE##_##BITS[i]                                      \
+	    + src_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS]));     \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c
new file mode 100644
index 00000000000..8499e4cef24
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c
@@ -0,0 +1,84 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+#include "strided_load-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];               \
+  DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];              \
+  DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];                \
+  INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3);                        \
+  INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13);                            \
+  for (INDEX##BITS i = 0;                                                      \
+       i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++)          \
+    {                                                                          \
+      dest_##DATA_TYPE##_##BITS[i]                                             \
+	= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      dest2_##DATA_TYPE##_##BITS[i]                                            \
+	= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      src_##DATA_TYPE##_##BITS[i]                                              \
+	= (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));                          \
+    }                                                                          \
+  f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
+			  stride_##DATA_TYPE##_##BITS,                         \
+			  n_##DATA_TYPE##_##BITS);                             \
+  for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++)                             \
+    {                                                                          \
+      assert (                                                                 \
+	dest_##DATA_TYPE##_##BITS[i]                                           \
+	== (dest2_##DATA_TYPE##_##BITS[i]                                      \
+	    + src_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS]));     \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c
new file mode 100644
index 00000000000..df0560c5a31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c
@@ -0,0 +1,46 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+#ifndef INDEX8
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
+			  INDEX##BITS stride, INDEX##BITS n)                   \
+  {                                                                            \
+    for (INDEX##BITS i = 0; i < n; ++i)                                        \
+      dest[i * stride] = src[i] + BITS;                                        \
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)                                                \
+  T (DATA_TYPE, 8)                                                             \
+  T (DATA_TYPE, 16)                                                            \
+  T (DATA_TYPE, 32)                                                            \
+  T (DATA_TYPE, 64)
+
+#define TEST_ALL(T)                                                            \
+  TEST_TYPE (T, int8_t)                                                        \
+  TEST_TYPE (T, uint8_t)                                                       \
+  TEST_TYPE (T, int16_t)                                                       \
+  TEST_TYPE (T, uint16_t)                                                      \
+  TEST_TYPE (T, _Float16)                                                      \
+  TEST_TYPE (T, int32_t)                                                       \
+  TEST_TYPE (T, uint32_t)                                                      \
+  TEST_TYPE (T, float)                                                         \
+  TEST_TYPE (T, int64_t)                                                       \
+  TEST_TYPE (T, uint64_t)                                                      \
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times " \.LEN_MASK_SCATTER_STORE" 66 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "optimized" } } */
+/* { dg-final { scan-assembler-not "vsuxei" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c
new file mode 100644
index 00000000000..1419cbc91b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c
@@ -0,0 +1,46 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+#ifndef INDEX8
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
+			  INDEX##BITS stride, INDEX##BITS n)                   \
+  {                                                                            \
+    for (INDEX##BITS i = 0; i < n; ++i)                                        \
+      dest[i * (BITS - 3)] = src[i] + BITS;                                    \
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)                                                \
+  T (DATA_TYPE, 8)                                                             \
+  T (DATA_TYPE, 16)                                                            \
+  T (DATA_TYPE, 32)                                                            \
+  T (DATA_TYPE, 64)
+
+#define TEST_ALL(T)                                                            \
+  TEST_TYPE (T, int8_t)                                                        \
+  TEST_TYPE (T, uint8_t)                                                       \
+  TEST_TYPE (T, int16_t)                                                       \
+  TEST_TYPE (T, uint16_t)                                                      \
+  TEST_TYPE (T, _Float16)                                                      \
+  TEST_TYPE (T, int32_t)                                                       \
+  TEST_TYPE (T, uint32_t)                                                      \
+  TEST_TYPE (T, float)                                                         \
+  TEST_TYPE (T, int64_t)                                                       \
+  TEST_TYPE (T, uint64_t)                                                      \
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times " \.LEN_MASK_SCATTER_STORE" 44 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "optimized" } } */
+/* { dg-final { scan-assembler-not "vsuxei" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c
new file mode 100644
index 00000000000..e9dca4672c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c
@@ -0,0 +1,82 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+#include "strided_store-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];               \
+  DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];              \
+  DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];                \
+  INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3);                        \
+  INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13);                            \
+  for (INDEX##BITS i = 0;                                                      \
+       i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++)          \
+    {                                                                          \
+      dest_##DATA_TYPE##_##BITS[i]                                             \
+	= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      dest2_##DATA_TYPE##_##BITS[i]                                            \
+	= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      src_##DATA_TYPE##_##BITS[i]                                              \
+	= (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));                          \
+    }                                                                          \
+  f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
+			  stride_##DATA_TYPE##_##BITS,                         \
+			  n_##DATA_TYPE##_##BITS);                             \
+  for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++)                             \
+    {                                                                          \
+      assert (dest_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS]       \
+	      == (src_##DATA_TYPE##_##BITS[i] + BITS));                        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c
new file mode 100644
index 00000000000..509def789e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c
@@ -0,0 +1,82 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+
+#include "strided_store-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];               \
+  DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];              \
+  DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];                \
+  INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3);                        \
+  INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13);                            \
+  for (INDEX##BITS i = 0;                                                      \
+       i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++)          \
+    {                                                                          \
+      dest_##DATA_TYPE##_##BITS[i]                                             \
+	= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      dest2_##DATA_TYPE##_##BITS[i]                                            \
+	= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      src_##DATA_TYPE##_##BITS[i]                                              \
+	= (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));                          \
+    }                                                                          \
+  f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
+			  stride_##DATA_TYPE##_##BITS,                         \
+			  n_##DATA_TYPE##_##BITS);                             \
+  for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++)                             \
+    {                                                                          \
+      assert (dest_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS]       \
+	      == (src_##DATA_TYPE##_##BITS[i] + BITS));                        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 5e69235a268..19589fa9638 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -90,5 +90,28 @@  foreach op $AUTOVEC_TEST_OPTS {
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/vls-vlmax/*.\[cS\]]] \
 	"-std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax" $CFLAGS
 
+# gather-scatter tests
+set AUTOVEC_TEST_OPTS [list \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m1 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m1 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m1 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m1 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math} ]
+foreach op $AUTOVEC_TEST_OPTS {
+  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/gather-scatter/*.\[cS\]]] \
+    "" "$op"
+}
+
 # All done.
 dg-finish