RISC-V: Support gather_load/scatter_store RVV auto-vectorization

This patch fully supports gather_load/scatter_store:
1. Support single-rgroup on both RV32/RV64.
2. Support indexed element widths equal to or smaller than Pmode.
3. Support VLA SLP with gather/scatter.
4. Fully tested all gather/scatter with LMUL = M1/M2/M4/M8, both VLA and VLS.
5. Fix a bug in handling (subreg:SI (const_poly_int:DI)).
6. Fix a bug in vec_perm, which is used by gather/scatter SLP.
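A sketch of the kind of loop point 2 enables (the function name is hypothetical, not one of the added tests): the offset elements are 16-bit while Pmode on RV64 is 64-bit, so the offsets are widened before address generation.

```c
#include <stdint.h>

/* Point 2 above: the index vector's element width (16 bits here) is
   smaller than Pmode on RV64 (64 bits).  Offsets narrower than XLEN
   are zero-extended before being added to the base address.  */
void
gather_narrow_idx (int64_t *restrict dst, const int64_t *restrict src,
		   const uint16_t *restrict idx, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[idx[i]];
}
```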

All kinds of GATHER/SCATTER are normalized into LEN_MASK_*.
We fully support these four kinds of gather/scatter:
1. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and dummy mask (Full vector).
2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and real mask.
3. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and dummy mask.
4. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and real mask.
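As an illustration, case 4 (real length and real mask) arises from a conditional indexed load with a scalar loop tail; a minimal sketch with a hypothetical function name:

```c
/* Case 4 above: the `if` supplies a real mask and the loop tail a
   real length, so the vectorizer emits LEN_MASK_GATHER_LOAD with
   both operands live.  */
void
cond_gather (int *restrict dst, const int *restrict src,
	     const int *restrict idx, const int *restrict cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      dst[i] = src[idx[i]];
}
```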

Based on the discussions with Richards, we don't lower strided load/store to vlse/vsse in the RISC-V backend.
Instead, we leave it to the middle-end to handle that.
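For reference, a strided access of the kind now left to the middle end looks like this (hypothetical function name):

```c
/* A stride-2 load.  Per the note above, the RISC-V backend does not
   lower this to vlse itself; the middle end is expected to recognize
   the strided form (otherwise it goes through the gather path with a
   computed offset vector).  */
void
strided_load (int *restrict dst, const int *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[2 * i];
}
```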

Regression tests all pass. OK for trunk?

gcc/ChangeLog:

	* config/riscv/autovec.md
	(len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>): New pattern.
	(len_mask_gather_load<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
	(len_mask_gather_load<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
	(len_mask_gather_load<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
	(len_mask_gather_load<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
	(len_mask_gather_load<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
	(len_mask_gather_load<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
	(len_mask_gather_load<mode><mode>): Ditto.
	(len_mask_scatter_store<VNX1_QHSD:mode><VNX1_QHSDI:mode>): Ditto.
	(len_mask_scatter_store<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
	(len_mask_scatter_store<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
	(len_mask_scatter_store<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
	(len_mask_scatter_store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
	(len_mask_scatter_store<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
	(len_mask_scatter_store<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
	(len_mask_scatter_store<mode><mode>): Ditto.
	* config/riscv/predicates.md (const_1_operand): New predicate.
	(vector_gs_scale_operand_16): Ditto.
	(vector_gs_scale_operand_32): Ditto.
	(vector_gs_scale_operand_64): Ditto.
	(vector_gs_extension_operand): Ditto.
	(vector_gs_scale_operand_16_rv32): Ditto.
	(vector_gs_scale_operand_32_rv32): Ditto.
	* config/riscv/riscv-protos.h (enum insn_type): Add gather/scatter.
	(expand_gather_scatter): New function.
	* config/riscv/riscv-v.cc (gen_const_vector_dup): Add gather/scatter.
	(emit_vlmax_masked_store_insn): New function.
	(emit_nonvlmax_masked_store_insn): Ditto.
	(modulo_sel_indices): Ditto.
	(expand_vec_perm): Fix SLP for gather/scatter.
	(prepare_gather_scatter): New function.
	(expand_gather_scatter): Ditto.
	* config/riscv/riscv.cc (riscv_legitimize_move): Fix handling of
	(subreg:SI (const_poly_int:DI)).
	* config/riscv/vector-iterators.md: Add gather/scatter.
	* config/riscv/vector.md (vec_duplicate<mode>): Use "@" instead.
	(@vec_duplicate<mode>): Ditto.
	(@pred_indexed_<order>store<VNX16_QHS:mode><VNX16_QHSDI:mode>):
	Fix name.
	(@pred_indexed_<order>store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/rvv.exp: Add gather/scatter tests.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c: New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c: New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c: New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c: New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c: New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c: New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c: New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c: New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c: New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c:
	New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c:
	New test.
Ju-Zhe Zhong 2023-07-12 17:38:49 +08:00, committed by Pan Li
commit f048af2aa3 (parent 15939bae35)
102 changed files with 4987 additions and 62 deletions

gcc/config/riscv/autovec.md

@@ -57,6 +57,262 @@
}
)
;; =========================================================================
;; == Gather Load
;; =========================================================================
(define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
[(match_operand:VNX1_QHSD 0 "register_operand")
(match_operand 1 "pmode_reg_or_0_operand")
(match_operand:VNX1_QHSDI 2 "register_operand")
(match_operand 3 "<VNX1_QHSD:gs_extension>")
(match_operand 4 "<VNX1_QHSD:gs_scale>")
(match_operand 5 "autovec_length_operand")
(match_operand 6 "const_0_operand")
(match_operand:<VNX1_QHSD:VM> 7 "vector_mask_operand")]
"TARGET_VECTOR"
{
riscv_vector::expand_gather_scatter (operands, true);
DONE;
})
(define_expand "len_mask_gather_load<VNX2_QHSD:mode><VNX2_QHSDI:mode>"
[(match_operand:VNX2_QHSD 0 "register_operand")
(match_operand 1 "pmode_reg_or_0_operand")
(match_operand:VNX2_QHSDI 2 "register_operand")
(match_operand 3 "<VNX2_QHSD:gs_extension>")
(match_operand 4 "<VNX2_QHSD:gs_scale>")
(match_operand 5 "autovec_length_operand")
(match_operand 6 "const_0_operand")
(match_operand:<VNX2_QHSD:VM> 7 "vector_mask_operand")]
"TARGET_VECTOR"
{
riscv_vector::expand_gather_scatter (operands, true);
DONE;
})
(define_expand "len_mask_gather_load<VNX4_QHSD:mode><VNX4_QHSDI:mode>"
[(match_operand:VNX4_QHSD 0 "register_operand")
(match_operand 1 "pmode_reg_or_0_operand")
(match_operand:VNX4_QHSDI 2 "register_operand")
(match_operand 3 "<VNX4_QHSD:gs_extension>")
(match_operand 4 "<VNX4_QHSD:gs_scale>")
(match_operand 5 "autovec_length_operand")
(match_operand 6 "const_0_operand")
(match_operand:<VNX4_QHSD:VM> 7 "vector_mask_operand")]
"TARGET_VECTOR"
{
riscv_vector::expand_gather_scatter (operands, true);
DONE;
})
(define_expand "len_mask_gather_load<VNX8_QHSD:mode><VNX8_QHSDI:mode>"
[(match_operand:VNX8_QHSD 0 "register_operand")
(match_operand 1 "pmode_reg_or_0_operand")
(match_operand:VNX8_QHSDI 2 "register_operand")
(match_operand 3 "<VNX8_QHSD:gs_extension>")
(match_operand 4 "<VNX8_QHSD:gs_scale>")
(match_operand 5 "autovec_length_operand")
(match_operand 6 "const_0_operand")
(match_operand:<VNX8_QHSD:VM> 7 "vector_mask_operand")]
"TARGET_VECTOR"
{
riscv_vector::expand_gather_scatter (operands, true);
DONE;
})
(define_expand "len_mask_gather_load<VNX16_QHSD:mode><VNX16_QHSDI:mode>"
[(match_operand:VNX16_QHSD 0 "register_operand")
(match_operand 1 "pmode_reg_or_0_operand")
(match_operand:VNX16_QHSDI 2 "register_operand")
(match_operand 3 "<VNX16_QHSD:gs_extension>")
(match_operand 4 "<VNX16_QHSD:gs_scale>")
(match_operand 5 "autovec_length_operand")
(match_operand 6 "const_0_operand")
(match_operand:<VNX16_QHSD:VM> 7 "vector_mask_operand")]
"TARGET_VECTOR"
{
riscv_vector::expand_gather_scatter (operands, true);
DONE;
})
(define_expand "len_mask_gather_load<VNX32_QHS:mode><VNX32_QHSI:mode>"
[(match_operand:VNX32_QHS 0 "register_operand")
(match_operand 1 "pmode_reg_or_0_operand")
(match_operand:VNX32_QHSI 2 "register_operand")
(match_operand 3 "<VNX32_QHS:gs_extension>")
(match_operand 4 "<VNX32_QHS:gs_scale>")
(match_operand 5 "autovec_length_operand")
(match_operand 6 "const_0_operand")
(match_operand:<VNX32_QHS:VM> 7 "vector_mask_operand")]
"TARGET_VECTOR"
{
riscv_vector::expand_gather_scatter (operands, true);
DONE;
})
(define_expand "len_mask_gather_load<VNX64_QH:mode><VNX64_QHI:mode>"
[(match_operand:VNX64_QH 0 "register_operand")
(match_operand 1 "pmode_reg_or_0_operand")
(match_operand:VNX64_QHI 2 "register_operand")
(match_operand 3 "<VNX64_QH:gs_extension>")
(match_operand 4 "<VNX64_QH:gs_scale>")
(match_operand 5 "autovec_length_operand")
(match_operand 6 "const_0_operand")
(match_operand:<VNX64_QH:VM> 7 "vector_mask_operand")]
"TARGET_VECTOR"
{
riscv_vector::expand_gather_scatter (operands, true);
DONE;
})
;; When SEW = 8 and LMUL = 8, we can't find any index mode with
;; a larger SEW.  Since RVV indexed load/store zero-extend offsets
;; implicitly and do not support scaling, we should only allow
;; operands[3] and operands[4] to be const_1_operand.
(define_expand "len_mask_gather_load<mode><mode>"
[(match_operand:VNX128_Q 0 "register_operand")
(match_operand 1 "pmode_reg_or_0_operand")
(match_operand:VNX128_Q 2 "register_operand")
(match_operand 3 "const_1_operand")
(match_operand 4 "const_1_operand")
(match_operand 5 "autovec_length_operand")
(match_operand 6 "const_0_operand")
(match_operand:<VM> 7 "vector_mask_operand")]
"TARGET_VECTOR"
{
riscv_vector::expand_gather_scatter (operands, true);
DONE;
})
;; =========================================================================
;; == Scatter Store
;; =========================================================================
(define_expand "len_mask_scatter_store<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
[(match_operand 0 "pmode_reg_or_0_operand")
(match_operand:VNX1_QHSDI 1 "register_operand")
(match_operand 2 "<VNX1_QHSD:gs_extension>")
(match_operand 3 "<VNX1_QHSD:gs_scale>")
(match_operand:VNX1_QHSD 4 "register_operand")
(match_operand 5 "autovec_length_operand")
(match_operand 6 "const_0_operand")
(match_operand:<VNX1_QHSD:VM> 7 "vector_mask_operand")]
"TARGET_VECTOR"
{
riscv_vector::expand_gather_scatter (operands, false);
DONE;
})
(define_expand "len_mask_scatter_store<VNX2_QHSD:mode><VNX2_QHSDI:mode>"
[(match_operand 0 "pmode_reg_or_0_operand")
(match_operand:VNX2_QHSDI 1 "register_operand")
(match_operand 2 "<VNX2_QHSD:gs_extension>")
(match_operand 3 "<VNX2_QHSD:gs_scale>")
(match_operand:VNX2_QHSD 4 "register_operand")
(match_operand 5 "autovec_length_operand")
(match_operand 6 "const_0_operand")
(match_operand:<VNX2_QHSD:VM> 7 "vector_mask_operand")]
"TARGET_VECTOR"
{
riscv_vector::expand_gather_scatter (operands, false);
DONE;
})
(define_expand "len_mask_scatter_store<VNX4_QHSD:mode><VNX4_QHSDI:mode>"
[(match_operand 0 "pmode_reg_or_0_operand")
(match_operand:VNX4_QHSDI 1 "register_operand")
(match_operand 2 "<VNX4_QHSD:gs_extension>")
(match_operand 3 "<VNX4_QHSD:gs_scale>")
(match_operand:VNX4_QHSD 4 "register_operand")
(match_operand 5 "autovec_length_operand")
(match_operand 6 "const_0_operand")
(match_operand:<VNX4_QHSD:VM> 7 "vector_mask_operand")]
"TARGET_VECTOR"
{
riscv_vector::expand_gather_scatter (operands, false);
DONE;
})
(define_expand "len_mask_scatter_store<VNX8_QHSD:mode><VNX8_QHSDI:mode>"
[(match_operand 0 "pmode_reg_or_0_operand")
(match_operand:VNX8_QHSDI 1 "register_operand")
(match_operand 2 "<VNX8_QHSD:gs_extension>")
(match_operand 3 "<VNX8_QHSD:gs_scale>")
(match_operand:VNX8_QHSD 4 "register_operand")
(match_operand 5 "autovec_length_operand")
(match_operand 6 "const_0_operand")
(match_operand:<VNX8_QHSD:VM> 7 "vector_mask_operand")]
"TARGET_VECTOR"
{
riscv_vector::expand_gather_scatter (operands, false);
DONE;
})
(define_expand "len_mask_scatter_store<VNX16_QHSD:mode><VNX16_QHSDI:mode>"
[(match_operand 0 "pmode_reg_or_0_operand")
(match_operand:VNX16_QHSDI 1 "register_operand")
(match_operand 2 "<VNX16_QHSD:gs_extension>")
(match_operand 3 "<VNX16_QHSD:gs_scale>")
(match_operand:VNX16_QHSD 4 "register_operand")
(match_operand 5 "autovec_length_operand")
(match_operand 6 "const_0_operand")
(match_operand:<VNX16_QHSD:VM> 7 "vector_mask_operand")]
"TARGET_VECTOR"
{
riscv_vector::expand_gather_scatter (operands, false);
DONE;
})
(define_expand "len_mask_scatter_store<VNX32_QHS:mode><VNX32_QHSI:mode>"
[(match_operand 0 "pmode_reg_or_0_operand")
(match_operand:VNX32_QHSI 1 "register_operand")
(match_operand 2 "<VNX32_QHS:gs_extension>")
(match_operand 3 "<VNX32_QHS:gs_scale>")
(match_operand:VNX32_QHS 4 "register_operand")
(match_operand 5 "autovec_length_operand")
(match_operand 6 "const_0_operand")
(match_operand:<VNX32_QHS:VM> 7 "vector_mask_operand")]
"TARGET_VECTOR"
{
riscv_vector::expand_gather_scatter (operands, false);
DONE;
})
(define_expand "len_mask_scatter_store<VNX64_QH:mode><VNX64_QHI:mode>"
[(match_operand 0 "pmode_reg_or_0_operand")
(match_operand:VNX64_QHI 1 "register_operand")
(match_operand 2 "<VNX64_QH:gs_extension>")
(match_operand 3 "<VNX64_QH:gs_scale>")
(match_operand:VNX64_QH 4 "register_operand")
(match_operand 5 "autovec_length_operand")
(match_operand 6 "const_0_operand")
(match_operand:<VNX64_QH:VM> 7 "vector_mask_operand")]
"TARGET_VECTOR"
{
riscv_vector::expand_gather_scatter (operands, false);
DONE;
})
;; When SEW = 8 and LMUL = 8, we can't find any index mode with
;; a larger SEW.  Since RVV indexed load/store zero-extend offsets
;; implicitly and do not support scaling, we should only allow
;; operands[3] and operands[4] to be const_1_operand.
(define_expand "len_mask_scatter_store<mode><mode>"
[(match_operand 0 "pmode_reg_or_0_operand")
(match_operand:VNX128_Q 1 "register_operand")
(match_operand 2 "const_1_operand")
(match_operand 3 "const_1_operand")
(match_operand:VNX128_Q 4 "register_operand")
(match_operand 5 "autovec_length_operand")
(match_operand 6 "const_0_operand")
(match_operand:<VM> 7 "vector_mask_operand")]
"TARGET_VECTOR"
{
riscv_vector::expand_gather_scatter (operands, false);
DONE;
})
;; =========================================================================
;; == Vector creation
;; =========================================================================

gcc/config/riscv/predicates.md

@@ -61,6 +61,10 @@
(and (match_code "const_int,const_wide_int,const_vector")
(match_test "op == CONST0_RTX (GET_MODE (op))")))
(define_predicate "const_1_operand"
(and (match_code "const_int,const_wide_int,const_vector")
(match_test "op == CONST1_RTX (GET_MODE (op))")))
(define_predicate "reg_or_0_operand"
(ior (match_operand 0 "const_0_operand")
(match_operand 0 "register_operand")))
@@ -341,6 +345,33 @@
(ior (match_operand 0 "register_operand")
(match_code "const_vector")))
(define_predicate "vector_gs_scale_operand_16"
(and (match_code "const_int")
(match_test "INTVAL (op) == 1 || INTVAL (op) == 2")))
(define_predicate "vector_gs_scale_operand_32"
(and (match_code "const_int")
(match_test "INTVAL (op) == 1 || INTVAL (op) == 4")))
(define_predicate "vector_gs_scale_operand_64"
(and (match_code "const_int")
(match_test "INTVAL (op) == 1 || (INTVAL (op) == 8 && Pmode == DImode)")))
(define_predicate "vector_gs_extension_operand"
(ior (match_operand 0 "const_1_operand")
(and (match_operand 0 "const_0_operand")
(match_test "Pmode == SImode"))))
(define_predicate "vector_gs_scale_operand_16_rv32"
(and (match_code "const_int")
(match_test "INTVAL (op) == 1
|| (INTVAL (op) == 2 && Pmode == SImode)")))
(define_predicate "vector_gs_scale_operand_32_rv32"
(and (match_code "const_int")
(match_test "INTVAL (op) == 1
|| (INTVAL (op) == 4 && Pmode == SImode)")))
(define_predicate "ltge_operator"
(match_code "lt,ltu,ge,geu"))
@@ -376,7 +407,7 @@
|| rtx_equal_p (op, CONST0_RTX (GET_MODE (op))))
&& maybe_gt (GET_MODE_BITSIZE (GET_MODE (op)), GET_MODE_BITSIZE (Pmode)))")
(ior (match_test "rtx_equal_p (op, CONST0_RTX (GET_MODE (op)))")
(ior (match_operand 0 "const_int_operand")
(ior (match_code "const_int,const_poly_int")
(ior (match_operand 0 "register_operand")
(match_test "satisfies_constraint_Wdm (op)"))))))

gcc/config/riscv/riscv-protos.h

@@ -195,6 +195,8 @@ enum insn_type
RVV_SCALAR_MOV_OP = 4, /* +1 for VUNDEF according to vector.md. */
RVV_SLIDE_OP = 4, /* Dest, VUNDEF, source and offset. */
RVV_COMPRESS_OP = 4,
RVV_GATHER_M_OP = 5,
RVV_SCATTER_M_OP = 4,
};
enum vlmul_type
{
@@ -303,6 +305,7 @@ void expand_vec_init (rtx, rtx);
void expand_vec_perm (rtx, rtx, rtx, rtx);
void expand_select_vl (rtx *);
void expand_load_store (rtx *, bool);
void expand_gather_scatter (rtx *, bool);
/* Rounding mode bitfield for fixed point VXRM. */
enum fixed_point_rounding_mode

gcc/config/riscv/riscv-v.cc

@@ -556,16 +556,23 @@ const_vec_all_in_range_p (rtx vec, poly_int64 minval, poly_int64 maxval)
return true;
}
/* Return a const_int vector of VAL.
This function also exists in aarch64, we may unify it in middle-end in the
future. */
/* Return a const vector of VAL. The VAL can be either const_int or
const_poly_int. */
static rtx
gen_const_vector_dup (machine_mode mode, poly_int64 val)
{
rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
return gen_const_vec_duplicate (mode, c);
scalar_mode smode = GET_MODE_INNER (mode);
rtx c = gen_int_mode (val, smode);
if (!val.is_constant () && GET_MODE_SIZE (smode) > GET_MODE_SIZE (Pmode))
{
/* When VAL is const_poly_int value, we need to explicitly broadcast
it into a vector using RVV broadcast instruction. */
rtx dup = gen_reg_rtx (mode);
emit_insn (gen_vec_duplicate (mode, dup, c));
return dup;
}
return gen_const_vec_duplicate (mode, c);
}
/* Emit a vlmax vsetvl instruction. This should only be used when
@@ -901,6 +908,39 @@ emit_nonvlmax_masked_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
e.emit_insn ((enum insn_code) icode, ops);
}
/* This function emits a VLMAX masked store instruction. */
static void
emit_vlmax_masked_store_insn (unsigned icode, int op_num, rtx *ops)
{
machine_mode dest_mode = GET_MODE (ops[0]);
machine_mode mask_mode = get_mask_mode (dest_mode).require ();
insn_expander<RVV_INSN_OPERANDS_MAX> e (/*OP_NUM*/ op_num,
/*HAS_DEST_P*/ false,
/*FULLY_UNMASKED_P*/ false,
/*USE_REAL_MERGE_P*/ true,
/*HAS_AVL_P*/ true,
/*VLMAX_P*/ true, dest_mode,
mask_mode);
e.emit_insn ((enum insn_code) icode, ops);
}
/* This function emits a non-VLMAX masked store instruction. */
static void
emit_nonvlmax_masked_store_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
{
machine_mode dest_mode = GET_MODE (ops[0]);
machine_mode mask_mode = get_mask_mode (dest_mode).require ();
insn_expander<RVV_INSN_OPERANDS_MAX> e (/*OP_NUM*/ op_num,
/*HAS_DEST_P*/ false,
/*FULLY_UNMASKED_P*/ false,
/*USE_REAL_MERGE_P*/ true,
/*HAS_AVL_P*/ true,
/*VLMAX_P*/ false, dest_mode,
mask_mode);
e.set_vl (avl);
e.emit_insn ((enum insn_code) icode, ops);
}
/* This function emits a masked instruction. */
void
emit_vlmax_masked_mu_insn (unsigned icode, int op_num, rtx *ops)
@@ -1194,7 +1234,6 @@ static void
expand_const_vector (rtx target, rtx src)
{
machine_mode mode = GET_MODE (target);
scalar_mode elt_mode = GET_MODE_INNER (mode);
if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
{
rtx elt;
@@ -1219,7 +1258,6 @@ expand_const_vector (rtx target, rtx src)
}
else
{
elt = force_reg (elt_mode, elt);
rtx ops[] = {tmp, elt};
emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
}
@@ -2488,6 +2526,25 @@ expand_vec_cmp_float (rtx target, rtx_code code, rtx op0, rtx op1,
return false;
}
/* Modulo all SEL indices to ensure they are all in range of [0, MAX_SEL]. */
static rtx
modulo_sel_indices (rtx sel, poly_uint64 max_sel)
{
rtx sel_mod;
machine_mode sel_mode = GET_MODE (sel);
poly_uint64 nunits = GET_MODE_NUNITS (sel_mode);
/* If SEL is variable-length CONST_VECTOR, we don't need to modulo it. */
if (!nunits.is_constant () && CONST_VECTOR_P (sel))
sel_mod = sel;
else
{
rtx mod = gen_const_vector_dup (sel_mode, max_sel);
sel_mod
= expand_simple_binop (sel_mode, AND, sel, mod, NULL, 0, OPTAB_DIRECT);
}
return sel_mod;
}
/* Implement vec_perm<mode>. */
void
@@ -2501,41 +2558,44 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
index is in range of [0, nunits - 1]. A single vrgather instructions is
enough. Since we will use vrgatherei16.vv for variable-length vector,
it is never out of range and we don't need to modulo the index. */
if (!nunits.is_constant () || const_vec_all_in_range_p (sel, 0, nunits - 1))
if (nunits.is_constant () && const_vec_all_in_range_p (sel, 0, nunits - 1))
{
emit_vlmax_gather_insn (target, op0, sel);
return;
}
/* Check if the two values vectors are the same. */
if (rtx_equal_p (op0, op1) || const_vec_duplicate_p (sel))
/* Check if all the indices are the same. */
rtx elt;
if (const_vec_duplicate_p (sel, &elt))
{
/* Note: vec_perm indices are supposed to wrap when they go beyond the
size of the two value vectors, i.e. the upper bits of the indices
are effectively ignored. RVV vrgather instead produces 0 for any
out-of-range indices, so we need to modulo all the vec_perm indices
to ensure they are all in range of [0, nunits - 1]. */
rtx max_sel = gen_const_vector_dup (sel_mode, nunits - 1);
rtx sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0,
OPTAB_DIRECT);
emit_vlmax_gather_insn (target, op1, sel_mod);
poly_uint64 value = rtx_to_poly_int64 (elt);
rtx op = op0;
if (maybe_gt (value, nunits - 1))
{
sel = gen_const_vector_dup (sel_mode, value - nunits);
op = op1;
}
emit_vlmax_gather_insn (target, op, sel);
}
/* Note: vec_perm indices are supposed to wrap when they go beyond the
size of the two value vectors, i.e. the upper bits of the indices
are effectively ignored. RVV vrgather instead produces 0 for any
out-of-range indices, so we need to modulo all the vec_perm indices
to ensure they are all in range of [0, nunits - 1] when op0 == op1
or all in range of [0, 2 * nunits - 1] when op0 != op1. */
rtx sel_mod
= modulo_sel_indices (sel,
rtx_equal_p (op0, op1) ? nunits - 1 : 2 * nunits - 1);
/* Check if the two value vectors are the same. */
if (rtx_equal_p (op0, op1))
{
emit_vlmax_gather_insn (target, op0, sel_mod);
return;
}
rtx sel_mod = sel;
rtx max_sel = gen_const_vector_dup (sel_mode, 2 * nunits - 1);
/* We don't need to modulo the indices for a VLA vector,
since we guarantee beforehand that they aren't out of range. */
if (nunits.is_constant ())
{
/* Note: vec_perm indices are supposed to wrap when they go beyond the
size of the two value vectors, i.e. the upper bits of the indices
are effectively ignored. RVV vrgather instead produces 0 for any
out-of-range indices, so we need to modulo all the vec_perm indices
to ensure they are all in range of [0, 2 * nunits - 1]. */
sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0,
OPTAB_DIRECT);
}
/* This following sequence is handling the case that:
__builtin_shufflevector (vec1, vec2, index...), the index can be any
@@ -3007,6 +3067,7 @@ expand_load_store (rtx *ops, bool is_load)
}
}
/* Return true if the operation is a floating-point operation that needs FRM. */
static bool
needs_fp_rounding (rtx_code code, machine_mode mode)
@@ -3047,4 +3108,163 @@ expand_cond_len_binop (rtx_code code, rtx *ops)
gcc_unreachable ();
}
/* Prepare insn_code for gather_load/scatter_store according to
the vector mode and index mode. */
static insn_code
prepare_gather_scatter (machine_mode vec_mode, machine_mode idx_mode,
bool is_load)
{
if (!is_load)
return code_for_pred_indexed_store (UNSPEC_UNORDERED, vec_mode, idx_mode);
else
{
unsigned src_eew_bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (idx_mode));
unsigned dst_eew_bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode));
if (dst_eew_bitsize == src_eew_bitsize)
return code_for_pred_indexed_load_same_eew (UNSPEC_UNORDERED, vec_mode);
else if (dst_eew_bitsize > src_eew_bitsize)
{
unsigned factor = dst_eew_bitsize / src_eew_bitsize;
switch (factor)
{
case 2:
return code_for_pred_indexed_load_x2_greater_eew (
UNSPEC_UNORDERED, vec_mode);
case 4:
return code_for_pred_indexed_load_x4_greater_eew (
UNSPEC_UNORDERED, vec_mode);
case 8:
return code_for_pred_indexed_load_x8_greater_eew (
UNSPEC_UNORDERED, vec_mode);
default:
gcc_unreachable ();
}
}
else
{
unsigned factor = src_eew_bitsize / dst_eew_bitsize;
switch (factor)
{
case 2:
return code_for_pred_indexed_load_x2_smaller_eew (
UNSPEC_UNORDERED, vec_mode);
case 4:
return code_for_pred_indexed_load_x4_smaller_eew (
UNSPEC_UNORDERED, vec_mode);
case 8:
return code_for_pred_indexed_load_x8_smaller_eew (
UNSPEC_UNORDERED, vec_mode);
default:
gcc_unreachable ();
}
}
}
}

/* Expand LEN_MASK_{GATHER_LOAD,SCATTER_STORE}.  */
void
expand_gather_scatter (rtx *ops, bool is_load)
{
  rtx ptr, vec_offset, vec_reg, len, mask;
  bool zero_extend_p;
  int scale_log2;
  if (is_load)
    {
      vec_reg = ops[0];
      ptr = ops[1];
      vec_offset = ops[2];
      zero_extend_p = INTVAL (ops[3]);
      scale_log2 = exact_log2 (INTVAL (ops[4]));
      len = ops[5];
      mask = ops[7];
    }
  else
    {
      vec_reg = ops[4];
      ptr = ops[0];
      vec_offset = ops[1];
      zero_extend_p = INTVAL (ops[2]);
      scale_log2 = exact_log2 (INTVAL (ops[3]));
      len = ops[5];
      mask = ops[7];
    }
  machine_mode vec_mode = GET_MODE (vec_reg);
  machine_mode idx_mode = GET_MODE (vec_offset);
  scalar_mode inner_vec_mode = GET_MODE_INNER (vec_mode);
  scalar_mode inner_idx_mode = GET_MODE_INNER (idx_mode);
  unsigned inner_vsize = GET_MODE_BITSIZE (inner_vec_mode);
  unsigned inner_offsize = GET_MODE_BITSIZE (inner_idx_mode);
  poly_int64 nunits = GET_MODE_NUNITS (vec_mode);
  poly_int64 value;
  bool is_vlmax = poly_int_rtx_p (len, &value) && known_eq (value, nunits);
  if (inner_offsize < inner_vsize)
    {
      /* 7.2. Vector Load/Store Addressing Modes.
         If the vector offset elements are narrower than XLEN, they are
         zero-extended to XLEN before adding to the ptr effective address.  If
         the vector offset elements are wider than XLEN, the least-significant
         XLEN bits are used in the address calculation.  An implementation must
         raise an illegal instruction exception if the EEW is not supported for
         offset elements.

         The RVV spec only covers the scale_log2 == 0 case.  */
      if (!zero_extend_p || (zero_extend_p && scale_log2 != 0))
        {
          if (zero_extend_p)
            inner_idx_mode
              = int_mode_for_size (inner_offsize * 2, 0).require ();
          else
            inner_idx_mode = int_mode_for_size (BITS_PER_WORD, 0).require ();
          machine_mode new_idx_mode
            = get_vector_mode (inner_idx_mode, nunits).require ();
          rtx tmp = gen_reg_rtx (new_idx_mode);
          emit_insn (gen_extend_insn (tmp, vec_offset, new_idx_mode, idx_mode,
                                      zero_extend_p ? true : false));
          vec_offset = tmp;
          idx_mode = new_idx_mode;
        }
    }
  if (scale_log2 != 0)
    {
      rtx tmp = expand_binop (idx_mode, ashl_optab, vec_offset,
                              gen_int_mode (scale_log2, Pmode), NULL_RTX, 0,
                              OPTAB_DIRECT);
      vec_offset = tmp;
    }
  insn_code icode = prepare_gather_scatter (vec_mode, idx_mode, is_load);
  if (is_vlmax)
    {
      if (is_load)
        {
          rtx load_ops[]
            = {vec_reg, mask, RVV_VUNDEF (vec_mode), ptr, vec_offset};
          emit_vlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops);
        }
      else
        {
          rtx store_ops[] = {mask, ptr, vec_offset, vec_reg};
          emit_vlmax_masked_store_insn (icode, RVV_SCATTER_M_OP, store_ops);
        }
    }
  else
    {
      if (is_load)
        {
          rtx load_ops[]
            = {vec_reg, mask, RVV_VUNDEF (vec_mode), ptr, vec_offset};
          emit_nonvlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops, len);
        }
      else
        {
          rtx store_ops[] = {mask, ptr, vec_offset, vec_reg};
          emit_nonvlmax_masked_store_insn (icode, RVV_SCATTER_M_OP, store_ops,
                                           len);
        }
    }
}
} // namespace riscv_vector
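For readers unfamiliar with the LEN_MASK_GATHER_LOAD semantics this expander lowers, a scalar reference model may help. The sketch below is hypothetical illustration code, not part of the patch: `ref_len_mask_gather_load` and its parameter layout are invented. It mirrors the key steps above: the offset element is implicitly widened to the address width, shifted left by log2(scale) to form a byte offset, and only elements that are both below the length and active in the mask are loaded; all other destination elements are left untouched in this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference for LEN_MASK_GATHER_LOAD on int32_t data
   with uint16_t offsets.  Offsets are zero-extended (the unsigned-offset
   case), scaled by 1 << scale_log2 into byte offsets, and only the first
   LEN elements with an active mask bit are written.  */
static void
ref_len_mask_gather_load (int32_t *dest, const int32_t *base,
                          const uint16_t *offsets, int scale_log2,
                          const uint8_t *mask, int len, int nunits)
{
  for (int i = 0; i < nunits; i++)
    if (i < len && mask[i])
      {
        /* uint16_t offset is implicitly zero-extended to the address
           width, then scaled to a byte offset, as in section 7.2 of
           the RVV spec.  */
        uintptr_t byte_off = (uintptr_t) offsets[i] << scale_log2;
        memcpy (&dest[i], (const char *) base + byte_off, sizeof dest[i]);
      }
}
```

With scale_log2 == 2 (4-byte elements), an offset of 3 selects `base[3]`; masked-off elements and elements past the length keep their previous destination values.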

@@ -2037,7 +2037,14 @@ riscv_legitimize_poly_move (machine_mode mode, rtx dest, rtx tmp, rtx src)
(m, n) = base * magn + constant.
This calculation doesn't need div operation. */
emit_move_insn (tmp, gen_int_mode (BYTES_PER_RISCV_VECTOR, mode));
if (known_le (GET_MODE_SIZE (mode), GET_MODE_SIZE (Pmode)))
emit_move_insn (tmp, gen_int_mode (BYTES_PER_RISCV_VECTOR, mode));
else
{
emit_move_insn (gen_highpart (Pmode, tmp), CONST0_RTX (Pmode));
emit_move_insn (gen_lowpart (Pmode, tmp),
gen_int_mode (BYTES_PER_RISCV_VECTOR, Pmode));
}
if (BYTES_PER_RISCV_VECTOR.is_constant ())
{
@@ -2144,7 +2151,7 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx src)
return false;
}
if (satisfies_constraint_vp (src))
if (satisfies_constraint_vp (src) && GET_MODE (src) == Pmode)
return false;
if (GET_MODE_SIZE (mode).to_constant () < GET_MODE_SIZE (Pmode))

@@ -115,6 +115,9 @@
(define_mode_iterator VEEWEXT2 [
(VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI "TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
(VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_VECTOR_ELEN_FP_16") (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
(VNx8HF "TARGET_VECTOR_ELEN_FP_16") (VNx16HF "TARGET_VECTOR_ELEN_FP_16") (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
(VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
(VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
(VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI "TARGET_VECTOR_ELEN_64")
(VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
@@ -161,6 +164,8 @@
(define_mode_iterator VEEWTRUNC2 [
(VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI (VNx64QI "TARGET_MIN_VLEN >= 128")
(VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI "TARGET_MIN_VLEN >= 128")
(VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_VECTOR_ELEN_FP_16") (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
(VNx8HF "TARGET_VECTOR_ELEN_FP_16") (VNx16HF "TARGET_VECTOR_ELEN_FP_16") (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
(VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI "TARGET_MIN_VLEN >= 128")
(VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
(VNx2SF "TARGET_VECTOR_ELEN_FP_32")
@@ -172,6 +177,8 @@
(define_mode_iterator VEEWTRUNC4 [
(VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI (VNx32QI "TARGET_MIN_VLEN >= 128")
(VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI (VNx16HI "TARGET_MIN_VLEN >= 128")
(VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_VECTOR_ELEN_FP_16") (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
(VNx8HF "TARGET_VECTOR_ELEN_FP_16") (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
])
(define_mode_iterator VEEWTRUNC8 [
@@ -362,46 +369,67 @@
])
(define_mode_iterator VNX1_QHSD [
(VNx1QI "TARGET_MIN_VLEN < 128") (VNx1HI "TARGET_MIN_VLEN < 128") (VNx1SI "TARGET_MIN_VLEN < 128")
(VNx1QI "TARGET_MIN_VLEN < 128")
(VNx1HI "TARGET_MIN_VLEN < 128")
(VNx1SI "TARGET_MIN_VLEN < 128")
(VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128")
(VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
(VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
(VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
])
(define_mode_iterator VNX2_QHSD [
VNx2QI VNx2HI VNx2SI
VNx2QI
VNx2HI
VNx2SI
(VNx2DI "TARGET_VECTOR_ELEN_64")
(VNx2HF "TARGET_VECTOR_ELEN_FP_16")
(VNx2SF "TARGET_VECTOR_ELEN_FP_32")
(VNx2DF "TARGET_VECTOR_ELEN_FP_64")
])
(define_mode_iterator VNX4_QHSD [
VNx4QI VNx4HI VNx4SI
VNx4QI
VNx4HI
VNx4SI
(VNx4DI "TARGET_VECTOR_ELEN_64")
(VNx4HF "TARGET_VECTOR_ELEN_FP_16")
(VNx4SF "TARGET_VECTOR_ELEN_FP_32")
(VNx4DF "TARGET_VECTOR_ELEN_FP_64")
])
(define_mode_iterator VNX8_QHSD [
VNx8QI VNx8HI VNx8SI
VNx8QI
VNx8HI
VNx8SI
(VNx8DI "TARGET_VECTOR_ELEN_64")
(VNx8HF "TARGET_VECTOR_ELEN_FP_16")
(VNx8SF "TARGET_VECTOR_ELEN_FP_32")
(VNx8DF "TARGET_VECTOR_ELEN_FP_64")
])
(define_mode_iterator VNX16_QHS [
VNx16QI VNx16HI (VNx16SI "TARGET_MIN_VLEN > 32")
(define_mode_iterator VNX16_QHSD [
VNx16QI
VNx16HI
(VNx16SI "TARGET_MIN_VLEN > 32")
(VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
(VNx16HF "TARGET_VECTOR_ELEN_FP_16")
(VNx16SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
(VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128") (VNx16DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN >= 128")
(VNx16DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN >= 128")
])
(define_mode_iterator VNX32_QHS [
VNx32QI (VNx32HI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128") (VNx32SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
VNx32QI
(VNx32HI "TARGET_MIN_VLEN > 32")
(VNx32SI "TARGET_MIN_VLEN >= 128")
(VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
(VNx32SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
])
(define_mode_iterator VNX64_QH [
(VNx64QI "TARGET_MIN_VLEN > 32")
(VNx64HI "TARGET_MIN_VLEN >= 128")
(VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
])
(define_mode_iterator VNX128_Q [
@@ -409,35 +437,49 @@
])
(define_mode_iterator VNX1_QHSDI [
(VNx1QI "TARGET_MIN_VLEN < 128") (VNx1HI "TARGET_MIN_VLEN < 128") (VNx1SI "TARGET_MIN_VLEN < 128")
(VNx1DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128")
(VNx1QI "TARGET_MIN_VLEN < 128")
(VNx1HI "TARGET_MIN_VLEN < 128")
(VNx1SI "TARGET_MIN_VLEN < 128")
(VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128 && TARGET_64BIT")
])
(define_mode_iterator VNX2_QHSDI [
VNx2QI VNx2HI VNx2SI
(VNx2DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
VNx2QI
VNx2HI
VNx2SI
(VNx2DI "TARGET_VECTOR_ELEN_64 && TARGET_64BIT")
])
(define_mode_iterator VNX4_QHSDI [
VNx4QI VNx4HI VNx4SI
(VNx4DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
VNx4QI
VNx4HI
VNx4SI
(VNx4DI "TARGET_VECTOR_ELEN_64 && TARGET_64BIT")
])
(define_mode_iterator VNX8_QHSDI [
VNx8QI VNx8HI VNx8SI
(VNx8DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
VNx8QI
VNx8HI
VNx8SI
(VNx8DI "TARGET_VECTOR_ELEN_64 && TARGET_64BIT")
])
(define_mode_iterator VNX16_QHSDI [
VNx16QI VNx16HI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx16DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
VNx16QI
VNx16HI
(VNx16SI "TARGET_MIN_VLEN > 32")
(VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128 && TARGET_64BIT")
])
(define_mode_iterator VNX32_QHSI [
VNx32QI (VNx32HI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
VNx32QI
(VNx32HI "TARGET_MIN_VLEN > 32")
(VNx32SI "TARGET_MIN_VLEN >= 128")
])
(define_mode_iterator VNX64_QHI [
VNx64QI (VNx64HI "TARGET_MIN_VLEN >= 128")
(VNx64QI "TARGET_MIN_VLEN > 32")
(VNx64HI "TARGET_MIN_VLEN >= 128")
])
(define_mode_iterator V_WHOLE [
@@ -1393,6 +1435,8 @@
(define_mode_attr VINDEX_DOUBLE_TRUNC [
(VNx1HI "VNx1QI") (VNx2HI "VNx2QI") (VNx4HI "VNx4QI") (VNx8HI "VNx8QI")
(VNx16HI "VNx16QI") (VNx32HI "VNx32QI") (VNx64HI "VNx64QI")
(VNx1HF "VNx1QI") (VNx2HF "VNx2QI") (VNx4HF "VNx4QI") (VNx8HF "VNx8QI")
(VNx16HF "VNx16QI") (VNx32HF "VNx32QI") (VNx64HF "VNx64QI")
(VNx1SI "VNx1HI") (VNx2SI "VNx2HI") (VNx4SI "VNx4HI") (VNx8SI "VNx8HI")
(VNx16SI "VNx16HI") (VNx32SI "VNx32HI")
(VNx1SF "VNx1HI") (VNx2SF "VNx2HI") (VNx4SF "VNx4HI") (VNx8SF "VNx8HI")
@@ -1420,6 +1464,7 @@
(define_mode_attr VINDEX_DOUBLE_EXT [
(VNx1QI "VNx1HI") (VNx2QI "VNx2HI") (VNx4QI "VNx4HI") (VNx8QI "VNx8HI") (VNx16QI "VNx16HI") (VNx32QI "VNx32HI") (VNx64QI "VNx64HI")
(VNx1HI "VNx1SI") (VNx2HI "VNx2SI") (VNx4HI "VNx4SI") (VNx8HI "VNx8SI") (VNx16HI "VNx16SI") (VNx32HI "VNx32SI")
(VNx1HF "VNx1SI") (VNx2HF "VNx2SI") (VNx4HF "VNx4SI") (VNx8HF "VNx8SI") (VNx16HF "VNx16SI") (VNx32HF "VNx32SI")
(VNx1SI "VNx1DI") (VNx2SI "VNx2DI") (VNx4SI "VNx4DI") (VNx8SI "VNx8DI") (VNx16SI "VNx16DI")
(VNx1SF "VNx1DI") (VNx2SF "VNx2DI") (VNx4SF "VNx4DI") (VNx8SF "VNx8DI") (VNx16SF "VNx16DI")
])
@@ -1427,6 +1472,7 @@
(define_mode_attr VINDEX_QUAD_EXT [
(VNx1QI "VNx1SI") (VNx2QI "VNx2SI") (VNx4QI "VNx4SI") (VNx8QI "VNx8SI") (VNx16QI "VNx16SI") (VNx32QI "VNx32SI")
(VNx1HI "VNx1DI") (VNx2HI "VNx2DI") (VNx4HI "VNx4DI") (VNx8HI "VNx8DI") (VNx16HI "VNx16DI")
(VNx1HF "VNx1DI") (VNx2HF "VNx2DI") (VNx4HF "VNx4DI") (VNx8HF "VNx8DI") (VNx16HF "VNx16DI")
])
(define_mode_attr VINDEX_OCT_EXT [
@@ -1471,6 +1517,40 @@
(VNx4DI "VNx8BI") (VNx8DI "VNx16BI") (VNx16DI "VNx32BI")
])
(define_mode_attr gs_extension [
(VNx1QI "immediate_operand") (VNx2QI "immediate_operand") (VNx4QI "immediate_operand") (VNx8QI "immediate_operand") (VNx16QI "immediate_operand")
(VNx32QI "vector_gs_extension_operand") (VNx64QI "const_1_operand")
(VNx1HI "immediate_operand") (VNx2HI "immediate_operand") (VNx4HI "immediate_operand") (VNx8HI "immediate_operand") (VNx16HI "immediate_operand")
(VNx32HI "vector_gs_extension_operand") (VNx64HI "const_1_operand")
(VNx1SI "immediate_operand") (VNx2SI "immediate_operand") (VNx4SI "immediate_operand") (VNx8SI "immediate_operand") (VNx16SI "immediate_operand")
(VNx32SI "vector_gs_extension_operand")
(VNx1DI "immediate_operand") (VNx2DI "immediate_operand") (VNx4DI "immediate_operand") (VNx8DI "immediate_operand") (VNx16DI "immediate_operand")
(VNx1HF "immediate_operand") (VNx2HF "immediate_operand") (VNx4HF "immediate_operand") (VNx8HF "immediate_operand") (VNx16HF "immediate_operand")
(VNx32HF "vector_gs_extension_operand") (VNx64HF "const_1_operand")
(VNx1SF "immediate_operand") (VNx2SF "immediate_operand") (VNx4SF "immediate_operand") (VNx8SF "immediate_operand") (VNx16SF "immediate_operand")
(VNx32SF "vector_gs_extension_operand")
(VNx1DF "immediate_operand") (VNx2DF "immediate_operand") (VNx4DF "immediate_operand") (VNx8DF "immediate_operand") (VNx16DF "immediate_operand")
])
(define_mode_attr gs_scale [
(VNx1QI "const_1_operand") (VNx2QI "const_1_operand") (VNx4QI "const_1_operand") (VNx8QI "const_1_operand")
(VNx16QI "const_1_operand") (VNx32QI "const_1_operand") (VNx64QI "const_1_operand")
(VNx1HI "vector_gs_scale_operand_16") (VNx2HI "vector_gs_scale_operand_16") (VNx4HI "vector_gs_scale_operand_16") (VNx8HI "vector_gs_scale_operand_16")
(VNx16HI "vector_gs_scale_operand_16") (VNx32HI "vector_gs_scale_operand_16_rv32") (VNx64HI "const_1_operand")
(VNx1SI "vector_gs_scale_operand_32") (VNx2SI "vector_gs_scale_operand_32") (VNx4SI "vector_gs_scale_operand_32") (VNx8SI "vector_gs_scale_operand_32")
(VNx16SI "vector_gs_scale_operand_32") (VNx32SI "vector_gs_scale_operand_32_rv32")
(VNx1DI "vector_gs_scale_operand_64") (VNx2DI "vector_gs_scale_operand_64") (VNx4DI "vector_gs_scale_operand_64") (VNx8DI "vector_gs_scale_operand_64")
(VNx16DI "vector_gs_scale_operand_64")
(VNx1HF "vector_gs_scale_operand_16") (VNx2HF "vector_gs_scale_operand_16") (VNx4HF "vector_gs_scale_operand_16") (VNx8HF "vector_gs_scale_operand_16")
(VNx16HF "vector_gs_scale_operand_16") (VNx32HF "vector_gs_scale_operand_16_rv32") (VNx64HF "const_1_operand")
(VNx1SF "vector_gs_scale_operand_32") (VNx2SF "vector_gs_scale_operand_32") (VNx4SF "vector_gs_scale_operand_32") (VNx8SF "vector_gs_scale_operand_32")
(VNx16SF "vector_gs_scale_operand_32") (VNx32SF "vector_gs_scale_operand_32_rv32")
(VNx1DF "vector_gs_scale_operand_64") (VNx2DF "vector_gs_scale_operand_64") (VNx4DF "vector_gs_scale_operand_64") (VNx8DF "vector_gs_scale_operand_64")
(VNx16DF "vector_gs_scale_operand_64")
])
(define_int_iterator WREDUC [UNSPEC_WREDUC_SUM UNSPEC_WREDUC_USUM])
(define_int_iterator ORDER [UNSPEC_ORDERED UNSPEC_UNORDERED])

@@ -818,7 +818,7 @@
;; This pattern only handles duplicates of non-constant inputs.
;; Constant vectors go through the movm pattern instead.
;; So "direct_broadcast_operand" can only be mem or reg, no CONSTANT.
(define_expand "vec_duplicate<mode>"
(define_expand "@vec_duplicate<mode>"
[(set (match_operand:V 0 "register_operand")
(vec_duplicate:V
(match_operand:<VEL> 1 "direct_broadcast_operand")))]
@@ -1357,8 +1357,16 @@
}
}
else if (GET_MODE_BITSIZE (<VEL>mode) > GET_MODE_BITSIZE (Pmode)
&& immediate_operand (operands[3], Pmode))
operands[3] = gen_rtx_SIGN_EXTEND (<VEL>mode, force_reg (Pmode, operands[3]));
&& (immediate_operand (operands[3], Pmode)
|| (CONST_POLY_INT_P (operands[3])
&& known_ge (rtx_to_poly_int64 (operands[3]), 0U)
&& known_le (rtx_to_poly_int64 (operands[3]), GET_MODE_SIZE (<MODE>mode)))))
{
rtx tmp = gen_reg_rtx (Pmode);
poly_int64 value = rtx_to_poly_int64 (operands[3]);
emit_move_insn (tmp, gen_int_mode (value, Pmode));
operands[3] = gen_rtx_SIGN_EXTEND (<VEL>mode, tmp);
}
else
operands[3] = force_reg (<VEL>mode, operands[3]);
})
@@ -1387,7 +1395,8 @@
vlse<sew>.v\t%0,%3,zero
vmv.s.x\t%0,%3
vmv.s.x\t%0,%3"
"register_operand (operands[3], <VEL>mode)
"(register_operand (operands[3], <VEL>mode)
|| CONST_POLY_INT_P (operands[3]))
&& GET_MODE_BITSIZE (<VEL>mode) > GET_MODE_BITSIZE (Pmode)"
[(set (match_dup 0)
(if_then_else:VI (unspec:<VM> [(match_dup 1) (match_dup 4)
@@ -1397,6 +1406,12 @@
(match_dup 2)))]
{
gcc_assert (can_create_pseudo_p ());
if (CONST_POLY_INT_P (operands[3]))
{
rtx tmp = gen_reg_rtx (<VEL>mode);
emit_move_insn (tmp, operands[3]);
operands[3] = tmp;
}
rtx m = assign_stack_local (<VEL>mode, GET_MODE_SIZE (<VEL>mode),
GET_MODE_ALIGNMENT (<VEL>mode));
m = validize_mem (m);
@@ -1483,6 +1498,7 @@
(match_operand 5 "vector_length_operand" " rK, rK, rK")
(match_operand 6 "const_int_operand" " i, i, i")
(match_operand 7 "const_int_operand" " i, i, i")
(match_operand 8 "const_int_operand" " i, i, i")
(reg:SI VL_REGNUM)
(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
(unspec:V
@@ -1738,7 +1754,7 @@
[(set_attr "type" "vst<order>x")
(set_attr "mode" "<VNX8_QHSD:MODE>")])
(define_insn "@pred_indexed_<order>store<VNX16_QHS:mode><VNX16_QHSDI:mode>"
(define_insn "@pred_indexed_<order>store<VNX16_QHSD:mode><VNX16_QHSDI:mode>"
[(set (mem:BLK (scratch))
(unspec:BLK
[(unspec:<VM>
@@ -1749,11 +1765,11 @@
(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
(match_operand 1 "pmode_reg_or_0_operand" " rJ")
(match_operand:VNX16_QHSDI 2 "register_operand" " vr")
(match_operand:VNX16_QHS 3 "register_operand" " vr")] ORDER))]
(match_operand:VNX16_QHSD 3 "register_operand" " vr")] ORDER))]
"TARGET_VECTOR"
"vs<order>xei<VNX16_QHSDI:sew>.v\t%3,(%z1),%2%p0"
[(set_attr "type" "vst<order>x")
(set_attr "mode" "<VNX16_QHS:MODE>")])
(set_attr "mode" "<VNX16_QHSD:MODE>")])
(define_insn "@pred_indexed_<order>store<VNX32_QHS:mode><VNX32_QHSI:mode>"
[(set (mem:BLK (scratch))

@@ -0,0 +1,38 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX8 uint8_t
#define INDEX16 uint16_t
#define INDEX32 uint32_t
#define INDEX64 uint64_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 8) \
T (uint8_t, 8) \
T (int16_t, 16) \
T (uint16_t, 16) \
T (_Float16, 16) \
T (int32_t, 32) \
T (uint32_t, 32) \
T (float, 32) \
T (int64_t, 64) \
T (uint64_t, 64) \
T (double, 64)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */

@@ -0,0 +1,35 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX64 int64_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 64) \
T (uint8_t, 64) \
T (int16_t, 64) \
T (uint16_t, 64) \
T (_Float16, 64) \
T (int32_t, 64) \
T (uint32_t, 64) \
T (float, 64) \
T (int64_t, 64) \
T (uint64_t, 64) \
T (double, 64)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */

@@ -0,0 +1,32 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define TEST_LOOP(DATA_TYPE) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict *src) \
{ \
for (int i = 0; i < 128; ++i) \
dest[i] += *src[i]; \
}
#define TEST_ALL(T) \
T (int8_t) \
T (uint8_t) \
T (int16_t) \
T (uint16_t) \
T (_Float16) \
T (int32_t) \
T (uint32_t) \
T (float) \
T (int64_t) \
T (uint64_t) \
T (double)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */

@@ -0,0 +1,112 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fno-vect-cost-model -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define TEST_LOOP(DATA_TYPE, INDEX_TYPE) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE##_##INDEX_TYPE (DATA_TYPE *restrict y, DATA_TYPE *restrict x, \
INDEX_TYPE *restrict index) \
{ \
for (int i = 0; i < 100; ++i) \
{ \
y[i * 2] = x[index[i * 2]] + 1; \
y[i * 2 + 1] = x[index[i * 2 + 1]] + 2; \
} \
}
TEST_LOOP (int8_t, int8_t)
TEST_LOOP (uint8_t, int8_t)
TEST_LOOP (int16_t, int8_t)
TEST_LOOP (uint16_t, int8_t)
TEST_LOOP (int32_t, int8_t)
TEST_LOOP (uint32_t, int8_t)
TEST_LOOP (int64_t, int8_t)
TEST_LOOP (uint64_t, int8_t)
TEST_LOOP (_Float16, int8_t)
TEST_LOOP (float, int8_t)
TEST_LOOP (double, int8_t)
TEST_LOOP (int8_t, int16_t)
TEST_LOOP (uint8_t, int16_t)
TEST_LOOP (int16_t, int16_t)
TEST_LOOP (uint16_t, int16_t)
TEST_LOOP (int32_t, int16_t)
TEST_LOOP (uint32_t, int16_t)
TEST_LOOP (int64_t, int16_t)
TEST_LOOP (uint64_t, int16_t)
TEST_LOOP (_Float16, int16_t)
TEST_LOOP (float, int16_t)
TEST_LOOP (double, int16_t)
TEST_LOOP (int8_t, int32_t)
TEST_LOOP (uint8_t, int32_t)
TEST_LOOP (int16_t, int32_t)
TEST_LOOP (uint16_t, int32_t)
TEST_LOOP (int32_t, int32_t)
TEST_LOOP (uint32_t, int32_t)
TEST_LOOP (int64_t, int32_t)
TEST_LOOP (uint64_t, int32_t)
TEST_LOOP (_Float16, int32_t)
TEST_LOOP (float, int32_t)
TEST_LOOP (double, int32_t)
TEST_LOOP (int8_t, int64_t)
TEST_LOOP (uint8_t, int64_t)
TEST_LOOP (int16_t, int64_t)
TEST_LOOP (uint16_t, int64_t)
TEST_LOOP (int32_t, int64_t)
TEST_LOOP (uint32_t, int64_t)
TEST_LOOP (int64_t, int64_t)
TEST_LOOP (uint64_t, int64_t)
TEST_LOOP (_Float16, int64_t)
TEST_LOOP (float, int64_t)
TEST_LOOP (double, int64_t)
TEST_LOOP (int8_t, uint8_t)
TEST_LOOP (uint8_t, uint8_t)
TEST_LOOP (int16_t, uint8_t)
TEST_LOOP (uint16_t, uint8_t)
TEST_LOOP (int32_t, uint8_t)
TEST_LOOP (uint32_t, uint8_t)
TEST_LOOP (int64_t, uint8_t)
TEST_LOOP (uint64_t, uint8_t)
TEST_LOOP (_Float16, uint8_t)
TEST_LOOP (float, uint8_t)
TEST_LOOP (double, uint8_t)
TEST_LOOP (int8_t, uint16_t)
TEST_LOOP (uint8_t, uint16_t)
TEST_LOOP (int16_t, uint16_t)
TEST_LOOP (uint16_t, uint16_t)
TEST_LOOP (int32_t, uint16_t)
TEST_LOOP (uint32_t, uint16_t)
TEST_LOOP (int64_t, uint16_t)
TEST_LOOP (uint64_t, uint16_t)
TEST_LOOP (_Float16, uint16_t)
TEST_LOOP (float, uint16_t)
TEST_LOOP (double, uint16_t)
TEST_LOOP (int8_t, uint32_t)
TEST_LOOP (uint8_t, uint32_t)
TEST_LOOP (int16_t, uint32_t)
TEST_LOOP (uint16_t, uint32_t)
TEST_LOOP (int32_t, uint32_t)
TEST_LOOP (uint32_t, uint32_t)
TEST_LOOP (int64_t, uint32_t)
TEST_LOOP (uint64_t, uint32_t)
TEST_LOOP (_Float16, uint32_t)
TEST_LOOP (float, uint32_t)
TEST_LOOP (double, uint32_t)
TEST_LOOP (int8_t, uint64_t)
TEST_LOOP (uint8_t, uint64_t)
TEST_LOOP (int16_t, uint64_t)
TEST_LOOP (uint16_t, uint64_t)
TEST_LOOP (int32_t, uint64_t)
TEST_LOOP (uint32_t, uint64_t)
TEST_LOOP (int64_t, uint64_t)
TEST_LOOP (uint64_t, uint64_t)
TEST_LOOP (_Float16, uint64_t)
TEST_LOOP (float, uint64_t)
TEST_LOOP (double, uint64_t)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 88 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-assembler-not "vluxei64\.v" } } */
/* { dg-final { scan-assembler-not "vsuxei64\.v" } } */

@@ -0,0 +1,38 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX8 int8_t
#define INDEX16 int16_t
#define INDEX32 int32_t
#define INDEX64 int64_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 8) \
T (uint8_t, 8) \
T (int16_t, 16) \
T (uint16_t, 16) \
T (_Float16, 16) \
T (int32_t, 32) \
T (uint32_t, 32) \
T (float, 32) \
T (int64_t, 64) \
T (uint64_t, 64) \
T (double, 64)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */

@@ -0,0 +1,35 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX8 uint8_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 8) \
T (uint8_t, 8) \
T (int16_t, 8) \
T (uint16_t, 8) \
T (_Float16, 8) \
T (int32_t, 8) \
T (uint32_t, 8) \
T (float, 8) \
T (int64_t, 8) \
T (uint64_t, 8) \
T (double, 8)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */

@@ -0,0 +1,35 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX8 int8_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 8) \
T (uint8_t, 8) \
T (int16_t, 8) \
T (uint16_t, 8) \
T (_Float16, 8) \
T (int32_t, 8) \
T (uint32_t, 8) \
T (float, 8) \
T (int64_t, 8) \
T (uint64_t, 8) \
T (double, 8)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */

@@ -0,0 +1,35 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX16 uint16_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 16) \
T (uint8_t, 16) \
T (int16_t, 16) \
T (uint16_t, 16) \
T (_Float16, 16) \
T (int32_t, 16) \
T (uint32_t, 16) \
T (float, 16) \
T (int64_t, 16) \
T (uint64_t, 16) \
T (double, 16)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */

@@ -0,0 +1,35 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX16 int16_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 16) \
T (uint8_t, 16) \
T (int16_t, 16) \
T (uint16_t, 16) \
T (_Float16, 16) \
T (int32_t, 16) \
T (uint32_t, 16) \
T (float, 16) \
T (int64_t, 16) \
T (uint64_t, 16) \
T (double, 16)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */

@@ -0,0 +1,35 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX32 uint32_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 32) \
T (uint8_t, 32) \
T (int16_t, 32) \
T (uint16_t, 32) \
T (_Float16, 32) \
T (int32_t, 32) \
T (uint32_t, 32) \
T (float, 32) \
T (int64_t, 32) \
T (uint64_t, 32) \
T (double, 32)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */

@@ -0,0 +1,35 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX32 int32_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 32) \
T (uint8_t, 32) \
T (int16_t, 32) \
T (uint16_t, 32) \
T (_Float16, 32) \
T (int32_t, 32) \
T (uint32_t, 32) \
T (float, 32) \
T (int64_t, 32) \
T (uint64_t, 32) \
T (double, 32)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */

@@ -0,0 +1,35 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX64 uint64_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 64) \
T (uint8_t, 64) \
T (int16_t, 64) \
T (uint16_t, 64) \
T (_Float16, 64) \
T (int32_t, 64) \
T (uint32_t, 64) \
T (float, 64) \
T (int64_t, 64) \
T (uint64_t, 64) \
T (double, 64)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */

@@ -0,0 +1,41 @@
/* { dg-do run { target { riscv_vector } } } */
#include "gather_load-1.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}

@@ -0,0 +1,41 @@
/* { dg-do run { target { riscv_vector } } } */
#include "gather_load-10.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
RUN_LOOP (int8_t, 64)
RUN_LOOP (uint8_t, 64)
RUN_LOOP (int16_t, 64)
RUN_LOOP (uint16_t, 64)
RUN_LOOP (_Float16, 64)
RUN_LOOP (int32_t, 64)
RUN_LOOP (uint32_t, 64)
RUN_LOOP (float, 64)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}


@@ -0,0 +1,39 @@
/* { dg-do run { target { riscv_vector } } } */
#include "gather_load-11.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE *src_##DATA_TYPE[128]; \
DATA_TYPE src2_##DATA_TYPE[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
src_##DATA_TYPE[i] = src2_##DATA_TYPE + i; \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] + src_##DATA_TYPE[i][0]));
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}


@@ -0,0 +1,124 @@
/* { dg-do run { target { riscv_vector } } } */
#include "gather_load-12.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, INDEX_TYPE) \
DATA_TYPE dest_##DATA_TYPE##_##INDEX_TYPE[202] = {0}; \
DATA_TYPE src_##DATA_TYPE##_##INDEX_TYPE[202] = {0}; \
INDEX_TYPE index_##DATA_TYPE##_##INDEX_TYPE[202] = {0}; \
for (int i = 0; i < 202; i++) \
{ \
src_##DATA_TYPE##_##INDEX_TYPE[i] \
= (DATA_TYPE) ((i * 19 + 735) & (sizeof (DATA_TYPE) * 7 - 1)); \
index_##DATA_TYPE##_##INDEX_TYPE[i] = (i * 7) % (55); \
} \
f_##DATA_TYPE##_##INDEX_TYPE (dest_##DATA_TYPE##_##INDEX_TYPE, \
src_##DATA_TYPE##_##INDEX_TYPE, \
index_##DATA_TYPE##_##INDEX_TYPE); \
for (int i = 0; i < 100; i++) \
{ \
assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2] \
== (src_##DATA_TYPE##_##INDEX_TYPE \
[index_##DATA_TYPE##_##INDEX_TYPE[i * 2]] \
+ 1)); \
assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1] \
== (src_##DATA_TYPE##_##INDEX_TYPE \
[index_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]] \
+ 2)); \
}
RUN_LOOP (int8_t, int8_t)
RUN_LOOP (uint8_t, int8_t)
RUN_LOOP (int16_t, int8_t)
RUN_LOOP (uint16_t, int8_t)
RUN_LOOP (int32_t, int8_t)
RUN_LOOP (uint32_t, int8_t)
RUN_LOOP (int64_t, int8_t)
RUN_LOOP (uint64_t, int8_t)
RUN_LOOP (_Float16, int8_t)
RUN_LOOP (float, int8_t)
RUN_LOOP (double, int8_t)
RUN_LOOP (int8_t, int16_t)
RUN_LOOP (uint8_t, int16_t)
RUN_LOOP (int16_t, int16_t)
RUN_LOOP (uint16_t, int16_t)
RUN_LOOP (int32_t, int16_t)
RUN_LOOP (uint32_t, int16_t)
RUN_LOOP (int64_t, int16_t)
RUN_LOOP (uint64_t, int16_t)
RUN_LOOP (_Float16, int16_t)
RUN_LOOP (float, int16_t)
RUN_LOOP (double, int16_t)
RUN_LOOP (int8_t, int32_t)
RUN_LOOP (uint8_t, int32_t)
RUN_LOOP (int16_t, int32_t)
RUN_LOOP (uint16_t, int32_t)
RUN_LOOP (int32_t, int32_t)
RUN_LOOP (uint32_t, int32_t)
RUN_LOOP (int64_t, int32_t)
RUN_LOOP (uint64_t, int32_t)
RUN_LOOP (_Float16, int32_t)
RUN_LOOP (float, int32_t)
RUN_LOOP (double, int32_t)
RUN_LOOP (int8_t, int64_t)
RUN_LOOP (uint8_t, int64_t)
RUN_LOOP (int16_t, int64_t)
RUN_LOOP (uint16_t, int64_t)
RUN_LOOP (int32_t, int64_t)
RUN_LOOP (uint32_t, int64_t)
RUN_LOOP (int64_t, int64_t)
RUN_LOOP (uint64_t, int64_t)
RUN_LOOP (_Float16, int64_t)
RUN_LOOP (float, int64_t)
RUN_LOOP (double, int64_t)
RUN_LOOP (int8_t, uint8_t)
RUN_LOOP (uint8_t, uint8_t)
RUN_LOOP (int16_t, uint8_t)
RUN_LOOP (uint16_t, uint8_t)
RUN_LOOP (int32_t, uint8_t)
RUN_LOOP (uint32_t, uint8_t)
RUN_LOOP (int64_t, uint8_t)
RUN_LOOP (uint64_t, uint8_t)
RUN_LOOP (_Float16, uint8_t)
RUN_LOOP (float, uint8_t)
RUN_LOOP (double, uint8_t)
RUN_LOOP (int8_t, uint16_t)
RUN_LOOP (uint8_t, uint16_t)
RUN_LOOP (int16_t, uint16_t)
RUN_LOOP (uint16_t, uint16_t)
RUN_LOOP (int32_t, uint16_t)
RUN_LOOP (uint32_t, uint16_t)
RUN_LOOP (int64_t, uint16_t)
RUN_LOOP (uint64_t, uint16_t)
RUN_LOOP (_Float16, uint16_t)
RUN_LOOP (float, uint16_t)
RUN_LOOP (double, uint16_t)
RUN_LOOP (int8_t, uint32_t)
RUN_LOOP (uint8_t, uint32_t)
RUN_LOOP (int16_t, uint32_t)
RUN_LOOP (uint16_t, uint32_t)
RUN_LOOP (int32_t, uint32_t)
RUN_LOOP (uint32_t, uint32_t)
RUN_LOOP (int64_t, uint32_t)
RUN_LOOP (uint64_t, uint32_t)
RUN_LOOP (_Float16, uint32_t)
RUN_LOOP (float, uint32_t)
RUN_LOOP (double, uint32_t)
RUN_LOOP (int8_t, uint64_t)
RUN_LOOP (uint8_t, uint64_t)
RUN_LOOP (int16_t, uint64_t)
RUN_LOOP (uint16_t, uint64_t)
RUN_LOOP (int32_t, uint64_t)
RUN_LOOP (uint32_t, uint64_t)
RUN_LOOP (int64_t, uint64_t)
RUN_LOOP (uint64_t, uint64_t)
RUN_LOOP (_Float16, uint64_t)
RUN_LOOP (float, uint64_t)
RUN_LOOP (double, uint64_t)
return 0;
}


@@ -0,0 +1,41 @@
/* { dg-do run { target { riscv_vector } } } */
#include "gather_load-2.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}


@@ -0,0 +1,41 @@
/* { dg-do run { target { riscv_vector } } } */
#include "gather_load-3.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 8)
RUN_LOOP (uint16_t, 8)
RUN_LOOP (_Float16, 8)
RUN_LOOP (int32_t, 8)
RUN_LOOP (uint32_t, 8)
RUN_LOOP (float, 8)
RUN_LOOP (int64_t, 8)
RUN_LOOP (uint64_t, 8)
RUN_LOOP (double, 8)
return 0;
}


@@ -0,0 +1,41 @@
/* { dg-do run { target { riscv_vector } } } */
#include "gather_load-4.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 8)
RUN_LOOP (uint16_t, 8)
RUN_LOOP (_Float16, 8)
RUN_LOOP (int32_t, 8)
RUN_LOOP (uint32_t, 8)
RUN_LOOP (float, 8)
RUN_LOOP (int64_t, 8)
RUN_LOOP (uint64_t, 8)
RUN_LOOP (double, 8)
return 0;
}


@@ -0,0 +1,41 @@
/* { dg-do run { target { riscv_vector } } } */
#include "gather_load-5.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
RUN_LOOP (int8_t, 16)
RUN_LOOP (uint8_t, 16)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 16)
RUN_LOOP (uint32_t, 16)
RUN_LOOP (float, 16)
RUN_LOOP (int64_t, 16)
RUN_LOOP (uint64_t, 16)
RUN_LOOP (double, 16)
return 0;
}


@@ -0,0 +1,41 @@
/* { dg-do run { target { riscv_vector } } } */
#include "gather_load-6.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
RUN_LOOP (int8_t, 16)
RUN_LOOP (uint8_t, 16)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 16)
RUN_LOOP (uint32_t, 16)
RUN_LOOP (float, 16)
RUN_LOOP (int64_t, 16)
RUN_LOOP (uint64_t, 16)
RUN_LOOP (double, 16)
return 0;
}


@@ -0,0 +1,41 @@
/* { dg-do run { target { riscv_vector } } } */
#include "gather_load-7.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
RUN_LOOP (int8_t, 32)
RUN_LOOP (uint8_t, 32)
RUN_LOOP (int16_t, 32)
RUN_LOOP (uint16_t, 32)
RUN_LOOP (_Float16, 32)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 32)
RUN_LOOP (uint64_t, 32)
RUN_LOOP (double, 32)
return 0;
}


@@ -0,0 +1,41 @@
/* { dg-do run { target { riscv_vector } } } */
#include "gather_load-8.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
RUN_LOOP (int8_t, 32)
RUN_LOOP (uint8_t, 32)
RUN_LOOP (int16_t, 32)
RUN_LOOP (uint16_t, 32)
RUN_LOOP (_Float16, 32)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 32)
RUN_LOOP (uint64_t, 32)
RUN_LOOP (double, 32)
return 0;
}


@@ -0,0 +1,41 @@
/* { dg-do run { target { riscv_vector } } } */
#include "gather_load-9.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
RUN_LOOP (int8_t, 64)
RUN_LOOP (uint8_t, 64)
RUN_LOOP (int16_t, 64)
RUN_LOOP (uint16_t, 64)
RUN_LOOP (_Float16, 64)
RUN_LOOP (int32_t, 64)
RUN_LOOP (uint32_t, 64)
RUN_LOOP (float, 64)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}


@@ -0,0 +1,39 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fno-schedule-insns -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX8 uint8_t
#define INDEX16 uint16_t
#define INDEX32 uint32_t
#define INDEX64 uint64_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 8) \
T (uint8_t, 8) \
T (int16_t, 16) \
T (uint16_t, 16) \
T (_Float16, 16) \
T (int32_t, 32) \
T (uint32_t, 32) \
T (float, 32) \
T (int64_t, 64) \
T (uint64_t, 64) \
T (double, 64)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */


@@ -0,0 +1,36 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fno-schedule-insns -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX64 int64_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 64) \
T (uint8_t, 64) \
T (int16_t, 64) \
T (uint16_t, 64) \
T (_Float16, 64) \
T (int32_t, 64) \
T (uint32_t, 64) \
T (float, 64) \
T (int64_t, 64) \
T (uint64_t, 64) \
T (double, 64)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */


@@ -0,0 +1,116 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fno-vect-cost-model -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define TEST_LOOP(DATA_TYPE, INDEX_TYPE) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE##_##INDEX_TYPE (DATA_TYPE *restrict y, DATA_TYPE *restrict x, \
INDEX_TYPE *restrict index, \
INDEX_TYPE *restrict cond) \
{ \
for (int i = 0; i < 100; ++i) \
{ \
if (cond[i * 2]) \
y[i * 2] = x[index[i * 2]] + 1; \
if (cond[i * 2 + 1]) \
y[i * 2 + 1] = x[index[i * 2 + 1]] + 2; \
} \
}
TEST_LOOP (int8_t, int8_t)
TEST_LOOP (uint8_t, int8_t)
TEST_LOOP (int16_t, int8_t)
TEST_LOOP (uint16_t, int8_t)
TEST_LOOP (int32_t, int8_t)
TEST_LOOP (uint32_t, int8_t)
TEST_LOOP (int64_t, int8_t)
TEST_LOOP (uint64_t, int8_t)
TEST_LOOP (_Float16, int8_t)
TEST_LOOP (float, int8_t)
TEST_LOOP (double, int8_t)
TEST_LOOP (int8_t, int16_t)
TEST_LOOP (uint8_t, int16_t)
TEST_LOOP (int16_t, int16_t)
TEST_LOOP (uint16_t, int16_t)
TEST_LOOP (int32_t, int16_t)
TEST_LOOP (uint32_t, int16_t)
TEST_LOOP (int64_t, int16_t)
TEST_LOOP (uint64_t, int16_t)
TEST_LOOP (_Float16, int16_t)
TEST_LOOP (float, int16_t)
TEST_LOOP (double, int16_t)
TEST_LOOP (int8_t, int32_t)
TEST_LOOP (uint8_t, int32_t)
TEST_LOOP (int16_t, int32_t)
TEST_LOOP (uint16_t, int32_t)
TEST_LOOP (int32_t, int32_t)
TEST_LOOP (uint32_t, int32_t)
TEST_LOOP (int64_t, int32_t)
TEST_LOOP (uint64_t, int32_t)
TEST_LOOP (_Float16, int32_t)
TEST_LOOP (float, int32_t)
TEST_LOOP (double, int32_t)
TEST_LOOP (int8_t, int64_t)
TEST_LOOP (uint8_t, int64_t)
TEST_LOOP (int16_t, int64_t)
TEST_LOOP (uint16_t, int64_t)
TEST_LOOP (int32_t, int64_t)
TEST_LOOP (uint32_t, int64_t)
TEST_LOOP (int64_t, int64_t)
TEST_LOOP (uint64_t, int64_t)
TEST_LOOP (_Float16, int64_t)
TEST_LOOP (float, int64_t)
TEST_LOOP (double, int64_t)
TEST_LOOP (int8_t, uint8_t)
TEST_LOOP (uint8_t, uint8_t)
TEST_LOOP (int16_t, uint8_t)
TEST_LOOP (uint16_t, uint8_t)
TEST_LOOP (int32_t, uint8_t)
TEST_LOOP (uint32_t, uint8_t)
TEST_LOOP (int64_t, uint8_t)
TEST_LOOP (uint64_t, uint8_t)
TEST_LOOP (_Float16, uint8_t)
TEST_LOOP (float, uint8_t)
TEST_LOOP (double, uint8_t)
TEST_LOOP (int8_t, uint16_t)
TEST_LOOP (uint8_t, uint16_t)
TEST_LOOP (int16_t, uint16_t)
TEST_LOOP (uint16_t, uint16_t)
TEST_LOOP (int32_t, uint16_t)
TEST_LOOP (uint32_t, uint16_t)
TEST_LOOP (int64_t, uint16_t)
TEST_LOOP (uint64_t, uint16_t)
TEST_LOOP (_Float16, uint16_t)
TEST_LOOP (float, uint16_t)
TEST_LOOP (double, uint16_t)
TEST_LOOP (int8_t, uint32_t)
TEST_LOOP (uint8_t, uint32_t)
TEST_LOOP (int16_t, uint32_t)
TEST_LOOP (uint16_t, uint32_t)
TEST_LOOP (int32_t, uint32_t)
TEST_LOOP (uint32_t, uint32_t)
TEST_LOOP (int64_t, uint32_t)
TEST_LOOP (uint64_t, uint32_t)
TEST_LOOP (_Float16, uint32_t)
TEST_LOOP (float, uint32_t)
TEST_LOOP (double, uint32_t)
TEST_LOOP (int8_t, uint64_t)
TEST_LOOP (uint8_t, uint64_t)
TEST_LOOP (int16_t, uint64_t)
TEST_LOOP (uint16_t, uint64_t)
TEST_LOOP (int32_t, uint64_t)
TEST_LOOP (uint32_t, uint64_t)
TEST_LOOP (int64_t, uint64_t)
TEST_LOOP (uint64_t, uint64_t)
TEST_LOOP (_Float16, uint64_t)
TEST_LOOP (float, uint64_t)
TEST_LOOP (double, uint64_t)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 88 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-assembler-not "vluxei64\.v" } } */
/* { dg-final { scan-assembler-not "vsuxei64\.v" } } */
/* { dg-final { scan-assembler-not {vlse64\.v\s+v[0-9]+,\s*0\([a-x0-9]+\),\s*zero} } } */


@@ -0,0 +1,39 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fno-schedule-insns -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX8 int8_t
#define INDEX16 int16_t
#define INDEX32 int32_t
#define INDEX64 int64_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 8) \
T (uint8_t, 8) \
T (int16_t, 16) \
T (uint16_t, 16) \
T (_Float16, 16) \
T (int32_t, 32) \
T (uint32_t, 32) \
T (float, 32) \
T (int64_t, 64) \
T (uint64_t, 64) \
T (double, 64)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */


@@ -0,0 +1,36 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fno-schedule-insns -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX8 uint8_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 8) \
T (uint8_t, 8) \
T (int16_t, 8) \
T (uint16_t, 8) \
T (_Float16, 8) \
T (int32_t, 8) \
T (uint32_t, 8) \
T (float, 8) \
T (int64_t, 8) \
T (uint64_t, 8) \
T (double, 8)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */


@@ -0,0 +1,36 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fno-schedule-insns -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX8 int8_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 8) \
T (uint8_t, 8) \
T (int16_t, 8) \
T (uint16_t, 8) \
T (_Float16, 8) \
T (int32_t, 8) \
T (uint32_t, 8) \
T (float, 8) \
T (int64_t, 8) \
T (uint64_t, 8) \
T (double, 8)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */


@@ -0,0 +1,36 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fno-schedule-insns -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX16 uint16_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 16) \
T (uint8_t, 16) \
T (int16_t, 16) \
T (uint16_t, 16) \
T (_Float16, 16) \
T (int32_t, 16) \
T (uint32_t, 16) \
T (float, 16) \
T (int64_t, 16) \
T (uint64_t, 16) \
T (double, 16)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */


@@ -0,0 +1,36 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fno-schedule-insns -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX16 int16_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 16) \
T (uint8_t, 16) \
T (int16_t, 16) \
T (uint16_t, 16) \
T (_Float16, 16) \
T (int32_t, 16) \
T (uint32_t, 16) \
T (float, 16) \
T (int64_t, 16) \
T (uint64_t, 16) \
T (double, 16)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */


@@ -0,0 +1,36 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fno-schedule-insns -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX32 uint32_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 32) \
T (uint8_t, 32) \
T (int16_t, 32) \
T (uint16_t, 32) \
T (_Float16, 32) \
T (int32_t, 32) \
T (uint32_t, 32) \
T (float, 32) \
T (int64_t, 32) \
T (uint64_t, 32) \
T (double, 32)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */


@@ -0,0 +1,36 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fno-schedule-insns -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX32 int32_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 32) \
T (uint8_t, 32) \
T (int16_t, 32) \
T (uint16_t, 32) \
T (_Float16, 32) \
T (int32_t, 32) \
T (uint32_t, 32) \
T (float, 32) \
T (int64_t, 32) \
T (uint64_t, 32) \
T (double, 32)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */


@@ -0,0 +1,36 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fno-schedule-insns -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX64 uint64_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[i] += src[indices[i]]; \
}
#define TEST_ALL(T) \
T (int8_t, 64) \
T (uint8_t, 64) \
T (int16_t, 64) \
T (uint16_t, 64) \
T (_Float16, 64) \
T (int32_t, 64) \
T (uint32_t, 64) \
T (float, 64) \
T (int64_t, 64) \
T (uint64_t, 64) \
T (double, 64)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */


@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
#include "mask_gather_load-1.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128] = {0}; \
DATA_TYPE dest2_##DATA_TYPE[128] = {0}; \
DATA_TYPE src_##DATA_TYPE[128] = {0}; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0}; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]])); \
else \
assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]); \
}
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}


@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
#include "mask_gather_load-10.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128] = {0}; \
DATA_TYPE dest2_##DATA_TYPE[128] = {0}; \
DATA_TYPE src_##DATA_TYPE[128] = {0}; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0}; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]])); \
else \
assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]); \
}
RUN_LOOP (int8_t, 64)
RUN_LOOP (uint8_t, 64)
RUN_LOOP (int16_t, 64)
RUN_LOOP (uint16_t, 64)
RUN_LOOP (_Float16, 64)
RUN_LOOP (int32_t, 64)
RUN_LOOP (uint32_t, 64)
RUN_LOOP (float, 64)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}


@@ -0,0 +1,140 @@
/* { dg-do run { target { riscv_vector } } } */
/* { dg-additional-options "-mcmodel=medany" } */
#include "mask_gather_load-11.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, INDEX_TYPE) \
DATA_TYPE dest_##DATA_TYPE##_##INDEX_TYPE[202] = {0}; \
DATA_TYPE dest2_##DATA_TYPE##_##INDEX_TYPE[202] = {0}; \
DATA_TYPE src_##DATA_TYPE##_##INDEX_TYPE[202] = {0}; \
INDEX_TYPE index_##DATA_TYPE##_##INDEX_TYPE[202] = {0}; \
INDEX_TYPE cond_##DATA_TYPE##_##INDEX_TYPE[202] = {0}; \
for (int i = 0; i < 202; i++) \
{ \
src_##DATA_TYPE##_##INDEX_TYPE[i] \
= (DATA_TYPE) ((i * 19 + 735) & (sizeof (DATA_TYPE) * 7 - 1)); \
dest_##DATA_TYPE##_##INDEX_TYPE[i] \
= (DATA_TYPE) ((i * 7 + 666) & (sizeof (DATA_TYPE) * 5 - 1)); \
dest2_##DATA_TYPE##_##INDEX_TYPE[i] \
= (DATA_TYPE) ((i * 7 + 666) & (sizeof (DATA_TYPE) * 5 - 1)); \
index_##DATA_TYPE##_##INDEX_TYPE[i] = (i * 7) % (55); \
cond_##DATA_TYPE##_##INDEX_TYPE[i] = (INDEX_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE##_##INDEX_TYPE (dest_##DATA_TYPE##_##INDEX_TYPE, \
src_##DATA_TYPE##_##INDEX_TYPE, \
index_##DATA_TYPE##_##INDEX_TYPE, \
cond_##DATA_TYPE##_##INDEX_TYPE); \
for (int i = 0; i < 100; i++) \
{ \
if (cond_##DATA_TYPE##_##INDEX_TYPE[i * 2]) \
assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2] \
== (src_##DATA_TYPE##_##INDEX_TYPE \
[index_##DATA_TYPE##_##INDEX_TYPE[i * 2]] \
+ 1)); \
else \
assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2] \
== dest2_##DATA_TYPE##_##INDEX_TYPE[i * 2]); \
if (cond_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]) \
assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1] \
== (src_##DATA_TYPE##_##INDEX_TYPE \
[index_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]] \
+ 2)); \
else \
assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1] \
== dest2_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]); \
}
RUN_LOOP (int8_t, int8_t)
RUN_LOOP (uint8_t, int8_t)
RUN_LOOP (int16_t, int8_t)
RUN_LOOP (uint16_t, int8_t)
RUN_LOOP (int32_t, int8_t)
RUN_LOOP (uint32_t, int8_t)
RUN_LOOP (int64_t, int8_t)
RUN_LOOP (uint64_t, int8_t)
RUN_LOOP (_Float16, int8_t)
RUN_LOOP (float, int8_t)
RUN_LOOP (double, int8_t)
RUN_LOOP (int8_t, int16_t)
RUN_LOOP (uint8_t, int16_t)
RUN_LOOP (int16_t, int16_t)
RUN_LOOP (uint16_t, int16_t)
RUN_LOOP (int32_t, int16_t)
RUN_LOOP (uint32_t, int16_t)
RUN_LOOP (int64_t, int16_t)
RUN_LOOP (uint64_t, int16_t)
RUN_LOOP (_Float16, int16_t)
RUN_LOOP (float, int16_t)
RUN_LOOP (double, int16_t)
RUN_LOOP (int8_t, int32_t)
RUN_LOOP (uint8_t, int32_t)
RUN_LOOP (int16_t, int32_t)
RUN_LOOP (uint16_t, int32_t)
RUN_LOOP (int32_t, int32_t)
RUN_LOOP (uint32_t, int32_t)
RUN_LOOP (int64_t, int32_t)
RUN_LOOP (uint64_t, int32_t)
RUN_LOOP (_Float16, int32_t)
RUN_LOOP (float, int32_t)
RUN_LOOP (double, int32_t)
RUN_LOOP (int8_t, int64_t)
RUN_LOOP (uint8_t, int64_t)
RUN_LOOP (int16_t, int64_t)
RUN_LOOP (uint16_t, int64_t)
RUN_LOOP (int32_t, int64_t)
RUN_LOOP (uint32_t, int64_t)
RUN_LOOP (int64_t, int64_t)
RUN_LOOP (uint64_t, int64_t)
RUN_LOOP (_Float16, int64_t)
RUN_LOOP (float, int64_t)
RUN_LOOP (double, int64_t)
RUN_LOOP (int8_t, uint8_t)
RUN_LOOP (uint8_t, uint8_t)
RUN_LOOP (int16_t, uint8_t)
RUN_LOOP (uint16_t, uint8_t)
RUN_LOOP (int32_t, uint8_t)
RUN_LOOP (uint32_t, uint8_t)
RUN_LOOP (int64_t, uint8_t)
RUN_LOOP (uint64_t, uint8_t)
RUN_LOOP (_Float16, uint8_t)
RUN_LOOP (float, uint8_t)
RUN_LOOP (double, uint8_t)
RUN_LOOP (int8_t, uint16_t)
RUN_LOOP (uint8_t, uint16_t)
RUN_LOOP (int16_t, uint16_t)
RUN_LOOP (uint16_t, uint16_t)
RUN_LOOP (int32_t, uint16_t)
RUN_LOOP (uint32_t, uint16_t)
RUN_LOOP (int64_t, uint16_t)
RUN_LOOP (uint64_t, uint16_t)
RUN_LOOP (_Float16, uint16_t)
RUN_LOOP (float, uint16_t)
RUN_LOOP (double, uint16_t)
RUN_LOOP (int8_t, uint32_t)
RUN_LOOP (uint8_t, uint32_t)
RUN_LOOP (int16_t, uint32_t)
RUN_LOOP (uint16_t, uint32_t)
RUN_LOOP (int32_t, uint32_t)
RUN_LOOP (uint32_t, uint32_t)
RUN_LOOP (int64_t, uint32_t)
RUN_LOOP (uint64_t, uint32_t)
RUN_LOOP (_Float16, uint32_t)
RUN_LOOP (float, uint32_t)
RUN_LOOP (double, uint32_t)
RUN_LOOP (int8_t, uint64_t)
RUN_LOOP (uint8_t, uint64_t)
RUN_LOOP (int16_t, uint64_t)
RUN_LOOP (uint16_t, uint64_t)
RUN_LOOP (int32_t, uint64_t)
RUN_LOOP (uint32_t, uint64_t)
RUN_LOOP (int64_t, uint64_t)
RUN_LOOP (uint64_t, uint64_t)
RUN_LOOP (_Float16, uint64_t)
RUN_LOOP (float, uint64_t)
RUN_LOOP (double, uint64_t)
return 0;
}

@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
#include "mask_gather_load-2.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128] = {0}; \
DATA_TYPE dest2_##DATA_TYPE[128] = {0}; \
DATA_TYPE src_##DATA_TYPE[128] = {0}; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0}; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]])); \
else \
assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]); \
}
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}

@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
#include "mask_gather_load-3.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128] = {0}; \
DATA_TYPE dest2_##DATA_TYPE[128] = {0}; \
DATA_TYPE src_##DATA_TYPE[128] = {0}; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0}; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]])); \
else \
assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]); \
}
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 8)
RUN_LOOP (uint16_t, 8)
RUN_LOOP (_Float16, 8)
RUN_LOOP (int32_t, 8)
RUN_LOOP (uint32_t, 8)
RUN_LOOP (float, 8)
RUN_LOOP (int64_t, 8)
RUN_LOOP (uint64_t, 8)
RUN_LOOP (double, 8)
return 0;
}

@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
#include "mask_gather_load-4.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128] = {0}; \
DATA_TYPE dest2_##DATA_TYPE[128] = {0}; \
DATA_TYPE src_##DATA_TYPE[128] = {0}; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0}; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]])); \
else \
assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]); \
}
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 8)
RUN_LOOP (uint16_t, 8)
RUN_LOOP (_Float16, 8)
RUN_LOOP (int32_t, 8)
RUN_LOOP (uint32_t, 8)
RUN_LOOP (float, 8)
RUN_LOOP (int64_t, 8)
RUN_LOOP (uint64_t, 8)
RUN_LOOP (double, 8)
return 0;
}

@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
#include "mask_gather_load-5.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128] = {0}; \
DATA_TYPE dest2_##DATA_TYPE[128] = {0}; \
DATA_TYPE src_##DATA_TYPE[128] = {0}; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0}; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]])); \
else \
assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]); \
}
RUN_LOOP (int8_t, 16)
RUN_LOOP (uint8_t, 16)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 16)
RUN_LOOP (uint32_t, 16)
RUN_LOOP (float, 16)
RUN_LOOP (int64_t, 16)
RUN_LOOP (uint64_t, 16)
RUN_LOOP (double, 16)
return 0;
}

@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
#include "mask_gather_load-6.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128] = {0}; \
DATA_TYPE dest2_##DATA_TYPE[128] = {0}; \
DATA_TYPE src_##DATA_TYPE[128] = {0}; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0}; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]])); \
else \
assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]); \
}
RUN_LOOP (int8_t, 16)
RUN_LOOP (uint8_t, 16)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 16)
RUN_LOOP (uint32_t, 16)
RUN_LOOP (float, 16)
RUN_LOOP (int64_t, 16)
RUN_LOOP (uint64_t, 16)
RUN_LOOP (double, 16)
return 0;
}

@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
/* { dg-additional-options "-mcmodel=medany" } */
#include "mask_gather_load-7.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128] = {0}; \
DATA_TYPE dest2_##DATA_TYPE[128] = {0}; \
DATA_TYPE src_##DATA_TYPE[128] = {0}; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0}; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]])); \
else \
assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]); \
}
RUN_LOOP (int8_t, 32)
RUN_LOOP (uint8_t, 32)
RUN_LOOP (int16_t, 32)
RUN_LOOP (uint16_t, 32)
RUN_LOOP (_Float16, 32)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 32)
RUN_LOOP (uint64_t, 32)
RUN_LOOP (double, 32)
return 0;
}

@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
/* { dg-additional-options "-mcmodel=medany" } */
#include "mask_gather_load-8.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128] = {0}; \
DATA_TYPE dest2_##DATA_TYPE[128] = {0}; \
DATA_TYPE src_##DATA_TYPE[128] = {0}; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0}; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]])); \
else \
assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]); \
}
RUN_LOOP (int8_t, 32)
RUN_LOOP (uint8_t, 32)
RUN_LOOP (int16_t, 32)
RUN_LOOP (uint16_t, 32)
RUN_LOOP (_Float16, 32)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 32)
RUN_LOOP (uint64_t, 32)
RUN_LOOP (double, 32)
return 0;
}

@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
#include "mask_gather_load-9.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128] = {0}; \
DATA_TYPE dest2_##DATA_TYPE[128] = {0}; \
DATA_TYPE src_##DATA_TYPE[128] = {0}; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0}; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[i] \
== (dest2_##DATA_TYPE[i] \
+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]])); \
else \
assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]); \
}
RUN_LOOP (int8_t, 64)
RUN_LOOP (uint8_t, 64)
RUN_LOOP (int16_t, 64)
RUN_LOOP (uint16_t, 64)
RUN_LOOP (_Float16, 64)
RUN_LOOP (int32_t, 64)
RUN_LOOP (uint32_t, 64)
RUN_LOOP (float, 64)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}

@@ -0,0 +1,39 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX8 uint8_t
#define INDEX16 uint16_t
#define INDEX32 uint32_t
#define INDEX64 uint64_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 8) \
T (uint8_t, 8) \
T (int16_t, 16) \
T (uint16_t, 16) \
T (_Float16, 16) \
T (int32_t, 32) \
T (uint32_t, 32) \
T (float, 32) \
T (int64_t, 64) \
T (uint64_t, 64) \
T (double, 64)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */

@@ -0,0 +1,36 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX64 int64_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 64) \
T (uint8_t, 64) \
T (int16_t, 64) \
T (uint16_t, 64) \
T (_Float16, 64) \
T (int32_t, 64) \
T (uint32_t, 64) \
T (float, 64) \
T (int64_t, 64) \
T (uint64_t, 64) \
T (double, 64)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */

@@ -0,0 +1,39 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX8 int8_t
#define INDEX16 int16_t
#define INDEX32 int32_t
#define INDEX64 int64_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 8) \
T (uint8_t, 8) \
T (int16_t, 16) \
T (uint16_t, 16) \
T (_Float16, 16) \
T (int32_t, 32) \
T (uint32_t, 32) \
T (float, 32) \
T (int64_t, 64) \
T (uint64_t, 64) \
T (double, 64)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */

@@ -0,0 +1,36 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX8 uint8_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 8) \
T (uint8_t, 8) \
T (int16_t, 8) \
T (uint16_t, 8) \
T (_Float16, 8) \
T (int32_t, 8) \
T (uint32_t, 8) \
T (float, 8) \
T (int64_t, 8) \
T (uint64_t, 8) \
T (double, 8)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */

@@ -0,0 +1,36 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX8 int8_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 8) \
T (uint8_t, 8) \
T (int16_t, 8) \
T (uint16_t, 8) \
T (_Float16, 8) \
T (int32_t, 8) \
T (uint32_t, 8) \
T (float, 8) \
T (int64_t, 8) \
T (uint64_t, 8) \
T (double, 8)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */

@@ -0,0 +1,36 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX16 uint16_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 16) \
T (uint8_t, 16) \
T (int16_t, 16) \
T (uint16_t, 16) \
T (_Float16, 16) \
T (int32_t, 16) \
T (uint32_t, 16) \
T (float, 16) \
T (int64_t, 16) \
T (uint64_t, 16) \
T (double, 16)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */

@@ -0,0 +1,36 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX16 int16_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 16) \
T (uint8_t, 16) \
T (int16_t, 16) \
T (uint16_t, 16) \
T (_Float16, 16) \
T (int32_t, 16) \
T (uint32_t, 16) \
T (float, 16) \
T (int64_t, 16) \
T (uint64_t, 16) \
T (double, 16)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */

@@ -0,0 +1,36 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX32 uint32_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 32) \
T (uint8_t, 32) \
T (int16_t, 32) \
T (uint16_t, 32) \
T (_Float16, 32) \
T (int32_t, 32) \
T (uint32_t, 32) \
T (float, 32) \
T (int64_t, 32) \
T (uint64_t, 32) \
T (double, 32)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */

@@ -0,0 +1,36 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX32 int32_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 32) \
T (uint8_t, 32) \
T (int16_t, 32) \
T (uint16_t, 32) \
T (_Float16, 32) \
T (int32_t, 32) \
T (uint32_t, 32) \
T (float, 32) \
T (int64_t, 32) \
T (uint64_t, 32) \
T (double, 32)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */

@@ -0,0 +1,36 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX64 uint64_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices, INDEX##BITS *restrict cond) \
{ \
for (int i = 0; i < 128; ++i) \
if (cond[i]) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 64) \
T (uint8_t, 64) \
T (int16_t, 64) \
T (uint16_t, 64) \
T (_Float16, 64) \
T (int32_t, 64) \
T (uint32_t, 64) \
T (float, 64) \
T (int64_t, 64) \
T (uint64_t, 64) \
T (double, 64)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */

@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
#include "mask_scatter_store-1.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1)); \
else \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]); \
}
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}

@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
#include "mask_scatter_store-10.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1)); \
else \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]); \
}
RUN_LOOP (int8_t, 64)
RUN_LOOP (uint8_t, 64)
RUN_LOOP (int16_t, 64)
RUN_LOOP (uint16_t, 64)
RUN_LOOP (_Float16, 64)
RUN_LOOP (int32_t, 64)
RUN_LOOP (uint32_t, 64)
RUN_LOOP (float, 64)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}

@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
#include "mask_scatter_store-2.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1)); \
else \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]); \
}
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}

@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
#include "mask_scatter_store-3.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1)); \
else \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]); \
}
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 8)
RUN_LOOP (uint16_t, 8)
RUN_LOOP (_Float16, 8)
RUN_LOOP (int32_t, 8)
RUN_LOOP (uint32_t, 8)
RUN_LOOP (float, 8)
RUN_LOOP (int64_t, 8)
RUN_LOOP (uint64_t, 8)
RUN_LOOP (double, 8)
return 0;
}

@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
#include "mask_scatter_store-4.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1)); \
else \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]); \
}
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 8)
RUN_LOOP (uint16_t, 8)
RUN_LOOP (_Float16, 8)
RUN_LOOP (int32_t, 8)
RUN_LOOP (uint32_t, 8)
RUN_LOOP (float, 8)
RUN_LOOP (int64_t, 8)
RUN_LOOP (uint64_t, 8)
RUN_LOOP (double, 8)
return 0;
}

@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
#include "mask_scatter_store-5.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1)); \
else \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]); \
}
RUN_LOOP (int8_t, 16)
RUN_LOOP (uint8_t, 16)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 16)
RUN_LOOP (uint32_t, 16)
RUN_LOOP (float, 16)
RUN_LOOP (int64_t, 16)
RUN_LOOP (uint64_t, 16)
RUN_LOOP (double, 16)
return 0;
}


@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
#include "mask_scatter_store-6.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1)); \
else \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]); \
}
RUN_LOOP (int8_t, 16)
RUN_LOOP (uint8_t, 16)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 16)
RUN_LOOP (uint32_t, 16)
RUN_LOOP (float, 16)
RUN_LOOP (int64_t, 16)
RUN_LOOP (uint64_t, 16)
RUN_LOOP (double, 16)
return 0;
}


@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
/* { dg-additional-options "-mcmodel=medany" } */
#include "mask_scatter_store-7.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1)); \
else \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]); \
}
RUN_LOOP (int8_t, 32)
RUN_LOOP (uint8_t, 32)
RUN_LOOP (int16_t, 32)
RUN_LOOP (uint16_t, 32)
RUN_LOOP (_Float16, 32)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 32)
RUN_LOOP (uint64_t, 32)
RUN_LOOP (double, 32)
return 0;
}


@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
#include "mask_scatter_store-8.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1)); \
else \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]); \
}
RUN_LOOP (int8_t, 32)
RUN_LOOP (uint8_t, 32)
RUN_LOOP (int16_t, 32)
RUN_LOOP (uint16_t, 32)
RUN_LOOP (_Float16, 32)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 32)
RUN_LOOP (uint64_t, 32)
RUN_LOOP (double, 32)
return 0;
}


@@ -0,0 +1,48 @@
/* { dg-do run { target { riscv_vector } } } */
#include "mask_scatter_store-9.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0}; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
{ \
if (cond_##DATA_TYPE##_##BITS[i]) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1)); \
else \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]); \
}
RUN_LOOP (int8_t, 64)
RUN_LOOP (uint8_t, 64)
RUN_LOOP (int16_t, 64)
RUN_LOOP (uint16_t, 64)
RUN_LOOP (_Float16, 64)
RUN_LOOP (int32_t, 64)
RUN_LOOP (uint32_t, 64)
RUN_LOOP (float, 64)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}


@@ -0,0 +1,38 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX8 uint8_t
#define INDEX16 uint16_t
#define INDEX32 uint32_t
#define INDEX64 uint64_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 8) \
T (uint8_t, 8) \
T (int16_t, 16) \
T (uint16_t, 16) \
T (_Float16, 16) \
T (int32_t, 32) \
T (uint32_t, 32) \
T (float, 32) \
T (int64_t, 64) \
T (uint64_t, 64) \
T (double, 64)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
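As a reference for what the `TEST_LOOP` above asks the vectorizer to do, here is the same masked scatter-store pattern in plain scalar C. This is an illustration of the semantics that get normalized into `.LEN_MASK_SCATTER_STORE`, not code from the patch; the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar form of the masked scatter the tests exercise: for every
   active lane (cond[i] nonzero), store src[i] + 1 at the indexed
   position dest[indices[i]].  Inactive lanes leave dest untouched,
   which is exactly what the run tests assert against dest2.  */
static void
scalar_mask_scatter_store (int32_t *dest, const int32_t *src,
                           const uint8_t *indices, const uint8_t *cond,
                           size_t n)
{
  for (size_t i = 0; i < n; ++i)
    if (cond[i])
      dest[indices[i]] = src[i] + 1;
}
```

When vectorized, the condition becomes the mask operand and the loop trip count becomes the length operand of the combined len+mask internal function.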


@@ -0,0 +1,35 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX64 int64_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 64) \
T (uint8_t, 64) \
T (int16_t, 64) \
T (uint16_t, 64) \
T (_Float16, 64) \
T (int32_t, 64) \
T (uint32_t, 64) \
T (float, 64) \
T (int64_t, 64) \
T (uint64_t, 64) \
T (double, 64)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */


@@ -0,0 +1,38 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX8 int8_t
#define INDEX16 int16_t
#define INDEX32 int32_t
#define INDEX64 int64_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 8) \
T (uint8_t, 8) \
T (int16_t, 16) \
T (uint16_t, 16) \
T (_Float16, 16) \
T (int32_t, 32) \
T (uint32_t, 32) \
T (float, 32) \
T (int64_t, 64) \
T (uint64_t, 64) \
T (double, 64)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */


@@ -0,0 +1,35 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX8 uint8_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 8) \
T (uint8_t, 8) \
T (int16_t, 8) \
T (uint16_t, 8) \
T (_Float16, 8) \
T (int32_t, 8) \
T (uint32_t, 8) \
T (float, 8) \
T (int64_t, 8) \
T (uint64_t, 8) \
T (double, 8)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */


@@ -0,0 +1,35 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX8 int8_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 8) \
T (uint8_t, 8) \
T (int16_t, 8) \
T (uint16_t, 8) \
T (_Float16, 8) \
T (int32_t, 8) \
T (uint32_t, 8) \
T (float, 8) \
T (int64_t, 8) \
T (uint64_t, 8) \
T (double, 8)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */


@@ -0,0 +1,35 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX16 uint16_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 16) \
T (uint8_t, 16) \
T (int16_t, 16) \
T (uint16_t, 16) \
T (_Float16, 16) \
T (int32_t, 16) \
T (uint32_t, 16) \
T (float, 16) \
T (int64_t, 16) \
T (uint64_t, 16) \
T (double, 16)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */


@@ -0,0 +1,35 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX16 int16_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 16) \
T (uint8_t, 16) \
T (int16_t, 16) \
T (uint16_t, 16) \
T (_Float16, 16) \
T (int32_t, 16) \
T (uint32_t, 16) \
T (float, 16) \
T (int64_t, 16) \
T (uint64_t, 16) \
T (double, 16)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */


@@ -0,0 +1,35 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX32 uint32_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 32) \
T (uint8_t, 32) \
T (int16_t, 32) \
T (uint16_t, 32) \
T (_Float16, 32) \
T (int32_t, 32) \
T (uint32_t, 32) \
T (float, 32) \
T (int64_t, 32) \
T (uint64_t, 32) \
T (double, 32)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */


@@ -0,0 +1,35 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX32 int32_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 32) \
T (uint8_t, 32) \
T (int16_t, 32) \
T (uint16_t, 32) \
T (_Float16, 32) \
T (int32_t, 32) \
T (uint32_t, 32) \
T (float, 32) \
T (int64_t, 32) \
T (uint64_t, 32) \
T (double, 32)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */


@@ -0,0 +1,35 @@
/* { dg-do compile } */
/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
#include <stdint-gcc.h>
#define INDEX64 uint64_t
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS *restrict indices) \
{ \
for (int i = 0; i < 128; ++i) \
dest[indices[i]] = src[i] + 1; \
}
#define TEST_ALL(T) \
T (int8_t, 64) \
T (uint8_t, 64) \
T (int16_t, 64) \
T (uint16_t, 64) \
T (_Float16, 64) \
T (int32_t, 64) \
T (uint32_t, 64) \
T (float, 64) \
T (int64_t, 64) \
T (uint64_t, 64) \
T (double, 64)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */


@@ -0,0 +1,40 @@
/* { dg-do run { target { riscv_vector } } } */
#include "scatter_store-1.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1));
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}


@@ -0,0 +1,40 @@
/* { dg-do run { target { riscv_vector } } } */
#include "scatter_store-10.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1));
RUN_LOOP (int8_t, 64)
RUN_LOOP (uint8_t, 64)
RUN_LOOP (int16_t, 64)
RUN_LOOP (uint16_t, 64)
RUN_LOOP (_Float16, 64)
RUN_LOOP (int32_t, 64)
RUN_LOOP (uint32_t, 64)
RUN_LOOP (float, 64)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}


@@ -0,0 +1,40 @@
/* { dg-do run { target { riscv_vector } } } */
/* { dg-additional-options "-mcmodel=medany" } */
#include "scatter_store-2.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1));
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}


@@ -0,0 +1,40 @@
/* { dg-do run { target { riscv_vector } } } */
#include "scatter_store-3.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1));
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 8)
RUN_LOOP (uint16_t, 8)
RUN_LOOP (_Float16, 8)
RUN_LOOP (int32_t, 8)
RUN_LOOP (uint32_t, 8)
RUN_LOOP (float, 8)
RUN_LOOP (int64_t, 8)
RUN_LOOP (uint64_t, 8)
RUN_LOOP (double, 8)
return 0;
}


@@ -0,0 +1,40 @@
/* { dg-do run { target { riscv_vector } } } */
#include "scatter_store-4.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1));
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 8)
RUN_LOOP (uint16_t, 8)
RUN_LOOP (_Float16, 8)
RUN_LOOP (int32_t, 8)
RUN_LOOP (uint32_t, 8)
RUN_LOOP (float, 8)
RUN_LOOP (int64_t, 8)
RUN_LOOP (uint64_t, 8)
RUN_LOOP (double, 8)
return 0;
}


@@ -0,0 +1,40 @@
/* { dg-do run { target { riscv_vector } } } */
#include "scatter_store-5.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1));
RUN_LOOP (int8_t, 16)
RUN_LOOP (uint8_t, 16)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 16)
RUN_LOOP (uint32_t, 16)
RUN_LOOP (float, 16)
RUN_LOOP (int64_t, 16)
RUN_LOOP (uint64_t, 16)
RUN_LOOP (double, 16)
return 0;
}


@@ -0,0 +1,40 @@
/* { dg-do run { target { riscv_vector } } } */
#include "scatter_store-6.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1));
RUN_LOOP (int8_t, 16)
RUN_LOOP (uint8_t, 16)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 16)
RUN_LOOP (uint32_t, 16)
RUN_LOOP (float, 16)
RUN_LOOP (int64_t, 16)
RUN_LOOP (uint64_t, 16)
RUN_LOOP (double, 16)
return 0;
}


@@ -0,0 +1,40 @@
/* { dg-do run { target { riscv_vector } } } */
#include "scatter_store-7.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1));
RUN_LOOP (int8_t, 32)
RUN_LOOP (uint8_t, 32)
RUN_LOOP (int16_t, 32)
RUN_LOOP (uint16_t, 32)
RUN_LOOP (_Float16, 32)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 32)
RUN_LOOP (uint64_t, 32)
RUN_LOOP (double, 32)
return 0;
}


@@ -0,0 +1,40 @@
/* { dg-do run { target { riscv_vector } } } */
#include "scatter_store-8.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1));
RUN_LOOP (int8_t, 32)
RUN_LOOP (uint8_t, 32)
RUN_LOOP (int16_t, 32)
RUN_LOOP (uint16_t, 32)
RUN_LOOP (_Float16, 32)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 32)
RUN_LOOP (uint64_t, 32)
RUN_LOOP (double, 32)
return 0;
}


@@ -0,0 +1,40 @@
/* { dg-do run { target { riscv_vector } } } */
/* { dg-additional-options "-mcmodel=medany" } */
#include "scatter_store-9.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE[128]; \
DATA_TYPE dest2_##DATA_TYPE[128]; \
DATA_TYPE src_##DATA_TYPE[128]; \
INDEX##BITS indices_##DATA_TYPE##_##BITS[128]; \
for (int i = 0; i < 128; i++) \
{ \
dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128); \
} \
f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE, \
indices_##DATA_TYPE##_##BITS); \
for (int i = 0; i < 128; i++) \
assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]] \
== (src_##DATA_TYPE[i] + 1));
RUN_LOOP (int8_t, 64)
RUN_LOOP (uint8_t, 64)
RUN_LOOP (int16_t, 64)
RUN_LOOP (uint16_t, 64)
RUN_LOOP (_Float16, 64)
RUN_LOOP (int32_t, 64)
RUN_LOOP (uint32_t, 64)
RUN_LOOP (float, 64)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}


@@ -0,0 +1,45 @@
/* { dg-do compile } */
/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
#include <stdint-gcc.h>
#ifndef INDEX8
#define INDEX8 int8_t
#define INDEX16 int16_t
#define INDEX32 int32_t
#define INDEX64 int64_t
#endif
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS stride, INDEX##BITS n) \
{ \
for (INDEX##BITS i = 0; i < n; ++i) \
dest[i] += src[i * stride]; \
}
#define TEST_TYPE(T, DATA_TYPE) \
T (DATA_TYPE, 8) \
T (DATA_TYPE, 16) \
T (DATA_TYPE, 32) \
T (DATA_TYPE, 64)
#define TEST_ALL(T) \
TEST_TYPE (T, int8_t) \
TEST_TYPE (T, uint8_t) \
TEST_TYPE (T, int16_t) \
TEST_TYPE (T, uint16_t) \
TEST_TYPE (T, _Float16) \
TEST_TYPE (T, int32_t) \
TEST_TYPE (T, uint32_t) \
TEST_TYPE (T, float) \
TEST_TYPE (T, int64_t) \
TEST_TYPE (T, uint64_t) \
TEST_TYPE (T, double)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times " \.LEN_MASK_GATHER_LOAD" 66 "optimized" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "optimized" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "optimized" } } */


@@ -0,0 +1,45 @@
/* { dg-do compile } */
/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
#include <stdint-gcc.h>
#ifndef INDEX8
#define INDEX8 int8_t
#define INDEX16 int16_t
#define INDEX32 int32_t
#define INDEX64 int64_t
#endif
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS stride, INDEX##BITS n) \
{ \
for (INDEX##BITS i = 0; i < (BITS + 13); ++i) \
dest[i] += src[i * (BITS - 3)]; \
}
#define TEST_TYPE(T, DATA_TYPE) \
T (DATA_TYPE, 8) \
T (DATA_TYPE, 16) \
T (DATA_TYPE, 32) \
T (DATA_TYPE, 64)
#define TEST_ALL(T) \
TEST_TYPE (T, int8_t) \
TEST_TYPE (T, uint8_t) \
TEST_TYPE (T, int16_t) \
TEST_TYPE (T, uint16_t) \
TEST_TYPE (T, _Float16) \
TEST_TYPE (T, int32_t) \
TEST_TYPE (T, uint32_t) \
TEST_TYPE (T, float) \
TEST_TYPE (T, int64_t) \
TEST_TYPE (T, uint64_t) \
TEST_TYPE (T, double)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times " \.LEN_MASK_GATHER_LOAD" 46 "optimized" } } */
/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "optimized" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "optimized" } } */
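For reference, the strided access in `TEST_LOOP` above (`dest[i] += src[i * stride]`) has the following scalar semantics. Per the cover letter, the backend does not lower this to vlse/vsse; the middle end instead recognizes it as a `.LEN_MASK_GATHER_LOAD`. This sketch is illustrative only; the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar form of the strided-load loop: each iteration i reads src at
   a constant element stride, so the loads form a gather with indices
   0, stride, 2*stride, ... which the vectorizer expresses as a
   len+mask gather rather than a target-specific strided load.  */
static void
scalar_strided_add (int32_t *dest, const int32_t *src,
                    ptrdiff_t stride, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dest[i] += src[i * stride];
}
```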


@@ -0,0 +1,84 @@
/* { dg-do run { target { riscv_vector } } } */
#include "strided_load-1.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3); \
INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13); \
for (INDEX##BITS i = 0; \
i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++) \
{ \
dest_##DATA_TYPE##_##BITS[i] \
= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE##_##BITS[i] \
= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE##_##BITS[i] \
= (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
} \
f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
stride_##DATA_TYPE##_##BITS, \
n_##DATA_TYPE##_##BITS); \
for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++) \
{ \
assert ( \
dest_##DATA_TYPE##_##BITS[i] \
== (dest2_##DATA_TYPE##_##BITS[i] \
+ src_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS])); \
}
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 8)
RUN_LOOP (uint16_t, 8)
RUN_LOOP (_Float16, 8)
RUN_LOOP (int32_t, 8)
RUN_LOOP (uint32_t, 8)
RUN_LOOP (float, 8)
RUN_LOOP (int64_t, 8)
RUN_LOOP (uint64_t, 8)
RUN_LOOP (double, 8)
RUN_LOOP (int8_t, 16)
RUN_LOOP (uint8_t, 16)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 16)
RUN_LOOP (uint32_t, 16)
RUN_LOOP (float, 16)
RUN_LOOP (int64_t, 16)
RUN_LOOP (uint64_t, 16)
RUN_LOOP (double, 16)
RUN_LOOP (int8_t, 32)
RUN_LOOP (uint8_t, 32)
RUN_LOOP (int16_t, 32)
RUN_LOOP (uint16_t, 32)
RUN_LOOP (_Float16, 32)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 32)
RUN_LOOP (uint64_t, 32)
RUN_LOOP (double, 32)
RUN_LOOP (int8_t, 64)
RUN_LOOP (uint8_t, 64)
RUN_LOOP (int16_t, 64)
RUN_LOOP (uint16_t, 64)
RUN_LOOP (_Float16, 64)
RUN_LOOP (int32_t, 64)
RUN_LOOP (uint32_t, 64)
RUN_LOOP (float, 64)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}

@@ -0,0 +1,84 @@
/* { dg-do run { target { riscv_vector } } } */
#include "strided_load-2.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3); \
INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13); \
for (INDEX##BITS i = 0; \
i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++) \
{ \
dest_##DATA_TYPE##_##BITS[i] \
= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE##_##BITS[i] \
= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE##_##BITS[i] \
= (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
} \
f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
stride_##DATA_TYPE##_##BITS, \
n_##DATA_TYPE##_##BITS); \
for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++) \
{ \
assert ( \
dest_##DATA_TYPE##_##BITS[i] \
== (dest2_##DATA_TYPE##_##BITS[i] \
+ src_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS])); \
}
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 8)
RUN_LOOP (uint16_t, 8)
RUN_LOOP (_Float16, 8)
RUN_LOOP (int32_t, 8)
RUN_LOOP (uint32_t, 8)
RUN_LOOP (float, 8)
RUN_LOOP (int64_t, 8)
RUN_LOOP (uint64_t, 8)
RUN_LOOP (double, 8)
RUN_LOOP (int8_t, 16)
RUN_LOOP (uint8_t, 16)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 16)
RUN_LOOP (uint32_t, 16)
RUN_LOOP (float, 16)
RUN_LOOP (int64_t, 16)
RUN_LOOP (uint64_t, 16)
RUN_LOOP (double, 16)
RUN_LOOP (int8_t, 32)
RUN_LOOP (uint8_t, 32)
RUN_LOOP (int16_t, 32)
RUN_LOOP (uint16_t, 32)
RUN_LOOP (_Float16, 32)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 32)
RUN_LOOP (uint64_t, 32)
RUN_LOOP (double, 32)
RUN_LOOP (int8_t, 64)
RUN_LOOP (uint8_t, 64)
RUN_LOOP (int16_t, 64)
RUN_LOOP (uint16_t, 64)
RUN_LOOP (_Float16, 64)
RUN_LOOP (int32_t, 64)
RUN_LOOP (uint32_t, 64)
RUN_LOOP (float, 64)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}

@@ -0,0 +1,45 @@
/* { dg-do compile } */
/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
#include <stdint-gcc.h>
#ifndef INDEX8
#define INDEX8 int8_t
#define INDEX16 int16_t
#define INDEX32 int32_t
#define INDEX64 int64_t
#endif
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS stride, INDEX##BITS n) \
{ \
for (INDEX##BITS i = 0; i < n; ++i) \
dest[i * stride] = src[i] + BITS; \
}
#define TEST_TYPE(T, DATA_TYPE) \
T (DATA_TYPE, 8) \
T (DATA_TYPE, 16) \
T (DATA_TYPE, 32) \
T (DATA_TYPE, 64)
#define TEST_ALL(T) \
TEST_TYPE (T, int8_t) \
TEST_TYPE (T, uint8_t) \
TEST_TYPE (T, int16_t) \
TEST_TYPE (T, uint16_t) \
TEST_TYPE (T, _Float16) \
TEST_TYPE (T, int32_t) \
TEST_TYPE (T, uint32_t) \
TEST_TYPE (T, float) \
TEST_TYPE (T, int64_t) \
TEST_TYPE (T, uint64_t) \
TEST_TYPE (T, double)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times " \.LEN_MASK_SCATTER_STORE" 66 "optimized" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "optimized" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "optimized" } } */

@@ -0,0 +1,45 @@
/* { dg-do compile } */
/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
#include <stdint-gcc.h>
#ifndef INDEX8
#define INDEX8 int8_t
#define INDEX16 int16_t
#define INDEX32 int32_t
#define INDEX64 int64_t
#endif
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone)) \
f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
INDEX##BITS stride, INDEX##BITS n) \
{ \
for (INDEX##BITS i = 0; i < n; ++i) \
dest[i * (BITS - 3)] = src[i] + BITS; \
}
#define TEST_TYPE(T, DATA_TYPE) \
T (DATA_TYPE, 8) \
T (DATA_TYPE, 16) \
T (DATA_TYPE, 32) \
T (DATA_TYPE, 64)
#define TEST_ALL(T) \
TEST_TYPE (T, int8_t) \
TEST_TYPE (T, uint8_t) \
TEST_TYPE (T, int16_t) \
TEST_TYPE (T, uint16_t) \
TEST_TYPE (T, _Float16) \
TEST_TYPE (T, int32_t) \
TEST_TYPE (T, uint32_t) \
TEST_TYPE (T, float) \
TEST_TYPE (T, int64_t) \
TEST_TYPE (T, uint64_t) \
TEST_TYPE (T, double)
TEST_ALL (TEST_LOOP)
/* { dg-final { scan-tree-dump-times " \.LEN_MASK_SCATTER_STORE" 44 "optimized" } } */
/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "optimized" } } */
/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "optimized" } } */

@@ -0,0 +1,82 @@
/* { dg-do run { target { riscv_vector } } } */
#include "strided_store-1.c"
#include <assert.h>
int
main (void)
{
#define RUN_LOOP(DATA_TYPE, BITS) \
DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3); \
INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13); \
for (INDEX##BITS i = 0; \
i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++) \
{ \
dest_##DATA_TYPE##_##BITS[i] \
= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
dest2_##DATA_TYPE##_##BITS[i] \
= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
src_##DATA_TYPE##_##BITS[i] \
= (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
} \
f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
stride_##DATA_TYPE##_##BITS, \
n_##DATA_TYPE##_##BITS); \
for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++) \
{ \
assert (dest_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS] \
== (src_##DATA_TYPE##_##BITS[i] + BITS)); \
}
RUN_LOOP (int8_t, 8)
RUN_LOOP (uint8_t, 8)
RUN_LOOP (int16_t, 8)
RUN_LOOP (uint16_t, 8)
RUN_LOOP (_Float16, 8)
RUN_LOOP (int32_t, 8)
RUN_LOOP (uint32_t, 8)
RUN_LOOP (float, 8)
RUN_LOOP (int64_t, 8)
RUN_LOOP (uint64_t, 8)
RUN_LOOP (double, 8)
RUN_LOOP (int8_t, 16)
RUN_LOOP (uint8_t, 16)
RUN_LOOP (int16_t, 16)
RUN_LOOP (uint16_t, 16)
RUN_LOOP (_Float16, 16)
RUN_LOOP (int32_t, 16)
RUN_LOOP (uint32_t, 16)
RUN_LOOP (float, 16)
RUN_LOOP (int64_t, 16)
RUN_LOOP (uint64_t, 16)
RUN_LOOP (double, 16)
RUN_LOOP (int8_t, 32)
RUN_LOOP (uint8_t, 32)
RUN_LOOP (int16_t, 32)
RUN_LOOP (uint16_t, 32)
RUN_LOOP (_Float16, 32)
RUN_LOOP (int32_t, 32)
RUN_LOOP (uint32_t, 32)
RUN_LOOP (float, 32)
RUN_LOOP (int64_t, 32)
RUN_LOOP (uint64_t, 32)
RUN_LOOP (double, 32)
RUN_LOOP (int8_t, 64)
RUN_LOOP (uint8_t, 64)
RUN_LOOP (int16_t, 64)
RUN_LOOP (uint16_t, 64)
RUN_LOOP (_Float16, 64)
RUN_LOOP (int32_t, 64)
RUN_LOOP (uint32_t, 64)
RUN_LOOP (float, 64)
RUN_LOOP (int64_t, 64)
RUN_LOOP (uint64_t, 64)
RUN_LOOP (double, 64)
return 0;
}
