diff mbox series

RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

Message ID 20230821015951.399184-1-juzhe.zhong@rivai.ai
State New
Headers show
Series RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS | expand

Commit Message

juzhe.zhong@rivai.ai Aug. 21, 2023, 1:59 a.m. UTC
This patch refactors the Phase 3 (Demand fusion) and rename it into Earliest fusion.
I do the refactor for the following reasons:
  
  1. Current implementation of phase 3 is doing too many things which makes the code quality
     quite messy and not easy to maintain.
  2. The demand fusion I do previously is we explicitly make the fusion including how to fuse
     VSETVLs, where to make the VSETVL fusion happens, check the VSETVL fusion point (location)
     whether it is correct and optimal...etc.

     We are dong these things too much so I added these following functions:

        enum fusion_type get_backward_fusion_type (const bb_info *,
					     const vector_insn_info &);
        bool hard_empty_block_p (const bb_info *, const vector_insn_info &) const;
        bool backward_demand_fusion (void);
        bool forward_demand_fusion (void);
        bool cleanup_illegal_dirty_blocks (void);

     to make sure the VSETV fusion is optimal and correct. I found in may downstream testing it is
     not the reliable and optimal approach.

     Instead, this patch is to use 'compute_earliest' which is the function of LCM to fuse multiple
     'compatible' VSETVL demand info if they are having same earliest edge.  We let LCM decide almost
     everything of demand fusion for us. The only thing we do (Not the LCM do) is just checking the
     VSETVLs demand info are compatible or not. That's all we need to do.
     I belive such approach is much more reliable and optimal than before (We have many testcases already to check this refactor patch).
  3. Using LCM approach to do the demand fusion is more reliable and better CFG than before.
  ...

Here is the basics of this patch approach:

Consider this following case:

for
  for 
    for
      ...
         for
	   if (...)
	     VSETVL 1 demand: RATIO = 32 and TU policy.
	   else if (...)
	     VSETVL 2 demand: SEW = 16.
	   else
	     VSETVL 3 demand: MU policy.

   - 'compute_earliest' which output the earliest edge of VSETVL 1, VSETVL 2 and VSETVL 3.
     They are having same earliest edge which is outside the 1th inner-most loop.
   
   - Then, we check these 3 VSETVL demand info are compatible so fuse them into a single VSETVL info:
     demand SEW = 16, LMUL = MF2, TU, MU.
   
   - Then the later phase (phase 4) LCM PRE (partial reduandancy elimination) will hoist such VSETVL
     to the outer-most loop. So that we can get optimal codegen.

This patch is depending on: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627948.html

gcc/ChangeLog:

	* config/riscv/riscv-vsetvl.cc (vsetvl_vtype_change_only_p): New function.
	(find_reg_killed_by): Delete.
	(after_or_same_p): New function.
	(has_vsetvl_killed_avl_p):Delete.
	(anticipatable_occurrence_p): Adapt function.
	(get_same_bb_set): Delete.
	(any_set_in_bb_p): Ditto.
	(change_insn): Format.
	(ge_sew_ratio_unavailable_p): Fix bug.
	(backward_propagate_worthwhile_p): Delete.
	(vector_insn_info::parse_insn): Adapt function.
	(vector_insn_info::merge): Ditto.
	(vector_insn_info::dump): Ditto.
	(vector_infos_manager::vector_infos_manager): Refactor Phase 3.
	(vector_infos_manager::all_empty_predecessor_p): Delete.
	(vector_infos_manager::all_same_ratio_p): Refactor Phase 3.
	(vector_infos_manager::all_same_avl_p): Ditto.
	(vector_infos_manager::create_bitmap_vectors): Ditto.
	(vector_infos_manager::free_bitmap_vectors): Ditto.
	(vector_infos_manager::dump): Ditto.
	(pass_vsetvl::update_block_info): New function.
	(enum fusion_type): Refactor Phase 3.
	(pass_vsetvl::get_backward_fusion_type): Delete.
	(demands_can_be_fused_p): New function.
	(pass_vsetvl::hard_empty_block_p): Delete.
	(earliest_pred_can_be_fused_p): New function.
	(pass_vsetvl::backward_demand_fusion): Delete.
	(pass_vsetvl::earliest_fusion): New function.
	(pass_vsetvl::forward_demand_fusion): Delete.
	(pass_vsetvl::demand_fusion): Ditto.
	(pass_vsetvl::cleanup_illegal_dirty_blocks): Ditto.
	(pass_vsetvl::compute_local_properties): Adapt function.
	(pass_vsetvl::refine_vsetvls): Ditto.
	(pass_vsetvl::cleanup_vsetvls): Ditto.
	(pass_vsetvl::commit_vsetvls): Ditto.
	(pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
	(get_first_vsetvl_before_rvv_insns): Ditto.
	(pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
	(pass_vsetvl::cleanup_earliest_vsetvls): New function.
	(pass_vsetvl::df_post_optimization): Adapt function.
	(pass_vsetvl::compute_probabilities): Ditto.
	(pass_vsetvl::lazy_vsetvl): Ditto.
	* config/riscv/riscv-vsetvl.def (DEF_SEW_LMUL_FUSE_RULE): Fix bug.
	* config/riscv/riscv-vsetvl.h: Refactor Phase 3.
	* config/riscv/t-riscv: Add def into makefile list.
	* config/riscv/vector.md: Add attributes.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c: Adapt test.
	* gcc.target/riscv/rvv/base/vxrm-8.c: Ditto.
	* gcc.target/riscv/rvv/base/vxrm-9.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-102.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-14.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-15.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-27.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-28.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-29.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-30.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-35.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-36.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-46.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-48.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-50.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-51.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-6.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-66.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-67.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-68.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-69.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-70.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-71.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-72.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-76.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-77.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-82.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-83.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-84.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-89.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-93.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-94.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-95.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-96.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/ffload-5.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-3.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-4.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-9.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/imm_switch-7.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/imm_switch-9.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-45.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-1.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-4.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-1.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/vsetvlmax-4.c: Ditto.
	* gcc.target/riscv/rvv/vsetvl/avl_single-103.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc              | 1395 ++++++-----------
 gcc/config/riscv/riscv-vsetvl.def             |    2 +-
 gcc/config/riscv/riscv-vsetvl.h               |   69 +-
 gcc/config/riscv/t-riscv                      |    3 +-
 gcc/config/riscv/vector.md                    |    6 +-
 .../gather-scatter/gather_load_run-12.c       |    6 +
 .../gcc.target/riscv/rvv/base/vxrm-8.c        |    2 +-
 .../gcc.target/riscv/rvv/base/vxrm-9.c        |    2 +-
 .../riscv/rvv/vsetvl/avl_multiple-7.c         |    2 +-
 .../riscv/rvv/vsetvl/avl_multiple-8.c         |    2 +-
 .../riscv/rvv/vsetvl/avl_single-102.c         |    1 +
 .../riscv/rvv/vsetvl/avl_single-103.c         |   27 +
 .../riscv/rvv/vsetvl/avl_single-14.c          |    2 +-
 .../riscv/rvv/vsetvl/avl_single-15.c          |    2 +-
 .../riscv/rvv/vsetvl/avl_single-27.c          |    2 +-
 .../riscv/rvv/vsetvl/avl_single-28.c          |    2 +-
 .../riscv/rvv/vsetvl/avl_single-29.c          |    2 +-
 .../riscv/rvv/vsetvl/avl_single-30.c          |    2 +-
 .../riscv/rvv/vsetvl/avl_single-35.c          |    1 +
 .../riscv/rvv/vsetvl/avl_single-36.c          |   14 +-
 .../riscv/rvv/vsetvl/avl_single-46.c          |    4 +-
 .../riscv/rvv/vsetvl/avl_single-48.c          |    2 +-
 .../riscv/rvv/vsetvl/avl_single-50.c          |    5 +-
 .../riscv/rvv/vsetvl/avl_single-51.c          |    5 +-
 .../riscv/rvv/vsetvl/avl_single-6.c           |    4 +-
 .../riscv/rvv/vsetvl/avl_single-66.c          |    2 +-
 .../riscv/rvv/vsetvl/avl_single-67.c          |    4 +-
 .../riscv/rvv/vsetvl/avl_single-68.c          |    4 +-
 .../riscv/rvv/vsetvl/avl_single-69.c          |    2 +-
 .../riscv/rvv/vsetvl/avl_single-70.c          |    2 +-
 .../riscv/rvv/vsetvl/avl_single-71.c          |    6 +-
 .../riscv/rvv/vsetvl/avl_single-72.c          |    4 +-
 .../riscv/rvv/vsetvl/avl_single-76.c          |    2 +-
 .../riscv/rvv/vsetvl/avl_single-77.c          |    2 +-
 .../riscv/rvv/vsetvl/avl_single-82.c          |    4 +-
 .../riscv/rvv/vsetvl/avl_single-83.c          |    4 +-
 .../riscv/rvv/vsetvl/avl_single-84.c          |    2 +-
 .../riscv/rvv/vsetvl/avl_single-89.c          |    4 +-
 .../riscv/rvv/vsetvl/avl_single-93.c          |    4 +-
 .../riscv/rvv/vsetvl/avl_single-94.c          |    4 +-
 .../riscv/rvv/vsetvl/avl_single-95.c          |    4 +-
 .../riscv/rvv/vsetvl/avl_single-96.c          |    4 +-
 .../gcc.target/riscv/rvv/vsetvl/ffload-5.c    |    2 +-
 .../riscv/rvv/vsetvl/imm_bb_prop-3.c          |    1 +
 .../riscv/rvv/vsetvl/imm_bb_prop-4.c          |    1 +
 .../riscv/rvv/vsetvl/imm_bb_prop-9.c          |    1 +
 .../riscv/rvv/vsetvl/imm_switch-7.c           |    2 +-
 .../riscv/rvv/vsetvl/imm_switch-9.c           |    4 +-
 .../riscv/rvv/vsetvl/vlmax_back_prop-45.c     |    1 -
 .../riscv/rvv/vsetvl/vlmax_bb_prop-1.c        |   18 +-
 .../riscv/rvv/vsetvl/vlmax_bb_prop-11.c       |    4 +-
 .../riscv/rvv/vsetvl/vlmax_bb_prop-3.c        |    2 +-
 .../riscv/rvv/vsetvl/vlmax_bb_prop-4.c        |   18 +-
 .../riscv/rvv/vsetvl/vlmax_conflict-7.c       |    2 +-
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-1.c   |    2 +-
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-16.c  |    2 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-11.c   |    2 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c |    4 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-4.c |    4 +-
 59 files changed, 635 insertions(+), 1057 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-103.c

Comments

Robin Dapp Aug. 21, 2023, 3:23 p.m. UTC | #1
Hi Juzhe,

thanks, this is a reasonable approach and improves readability noticeably.
LGTM but I'd like to wait for other opinions (e.g. by Kito) as I haven't
looked closely into the vsetvl pass before and cannot entirely review it
quickly.  As we already have good test coverage there is not much that
can go wrong IMHO.

Regards
 Robin
Kito Cheng Aug. 21, 2023, 4:06 p.m. UTC | #2
I think I could do some details review tomorrow on the plane, I am free
from the meeting hell tomorrow :p



Robin Dapp via Gcc-patches <gcc-patches@gcc.gnu.org> 於 2023年8月21日 週一 23:24
寫道:

> Hi Juzhe,
>
> thanks, this is a reasonable approach and improves readability noticeably.
> LGTM but I'd like to wait for other opinions (e.g. by Kito) as I haven't
> looked closely into the vsetvl pass before and cannot entirely review it
> quickly.  As we already have good test coverage there is not much that
> can go wrong IMHO.
>
> Regards
>  Robin
>
Kito Cheng Aug. 22, 2023, 3:35 p.m. UTC | #3
It's really great improvement, it's drop some state like HARD_EMPTY
and DIRTY_WITH_KILLED_AVL which make this algorithm more easy to
understand!
also this also fundamentally improved the phase 3, although one
concern is the time complexity might be come more higher order,
(and it's already high enough in fact.)
but mostly those vectorized code are only appeard within the inner
most loop, so that is acceptable in generally

So I will try my best to review this closely to make it more close to
the perfect :)

I saw you has update serveral testcase, why update instead of add new testcase??
could you say more about why some testcase added __riscv_vadd_vv_i8mf8
or add some more dependency of vl variable?



> @@ -1423,8 +1409,13 @@ static bool
>  ge_sew_ratio_unavailable_p (const vector_insn_info &info1,
>                             const vector_insn_info &info2)
>  {
> -  if (!info2.demand_p (DEMAND_LMUL) && info2.demand_p (DEMAND_GE_SEW))
> -    return info1.get_sew () < info2.get_sew ();
> +  if (!info2.demand_p (DEMAND_LMUL))
> +    {
> +      if (info2.demand_p (DEMAND_GE_SEW))
> +       return info1.get_sew () < info2.get_sew ();
> +      else if (!info2.demand_p (DEMAND_SEW))
> +       return false;
> +    }

This seems relax the compatiblitly check to allow optimize more case,
if so this should be a sperated patch.

>    return true;
>  }


> @@ -1815,7 +1737,7 @@ vector_insn_info::parse_insn (rtx_insn *rinsn)
>      return;
>    if (optimize == 0 && !has_vtype_op (rinsn))
>      return;
> -  if (optimize > 0 && !vsetvl_insn_p (rinsn))
> +  if (optimize > 0 && vsetvl_discard_result_insn_p (rinsn))

I didn't get this change, could you explan few more about that? it was
early exit for non vsetvl insn, but now it allowed that now?

>      return;
>    m_state = VALID;
>    extract_insn_cached (rinsn);

> @@ -2206,9 +2128,9 @@ vector_insn_info::fuse_mask_policy (const vector_insn_info &info1,
>
>  vector_insn_info
>  vector_insn_info::merge (const vector_insn_info &merge_info,
> -                        enum merge_type type) const
> +                        enum merge_type type, unsigned bb_index) const
>  {
> -  if (!vsetvl_insn_p (get_insn ()->rtl ()))
> +  if (!vsetvl_insn_p (get_insn ()->rtl ()) && *this != merge_info)

Why need this exception?

>      gcc_assert (this->compatible_p (merge_info)
>                 && "Can't merge incompatible demanded infos");

> @@ -2403,18 +2348,22 @@ vector_infos_manager::get_all_available_exprs (
>  }
>
>  bool
> -vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) const
> +vector_infos_manager::earliest_fusion_worthwhile_p (
> +  const basic_block cfg_bb) const
>  {
> -  hash_set<basic_block> pred_cfg_bbs = get_all_predecessors (cfg_bb);
> -  for (const basic_block pred_cfg_bb : pred_cfg_bbs)
> +  edge e;
> +  edge_iterator ei;
> +  profile_probability prob = profile_probability::uninitialized ();
> +  FOR_EACH_EDGE (e, ei, cfg_bb->succs)
>      {
> -      const auto &pred_block_info = vector_block_infos[pred_cfg_bb->index];
> -      if (!pred_block_info.local_dem.valid_or_dirty_p ()
> -         && !pred_block_info.reaching_out.valid_or_dirty_p ())
> +      if (prob == profile_probability::uninitialized ())
> +       prob = vector_block_infos[e->dest->index].probability;
> +      else if (prob == vector_block_infos[e->dest->index].probability)
>         continue;
> -      return false;
> +      else
> +       return true;

Make sure I understand this correctly: it's worth if thoe edges has
different probability?

>      }
> -  return true;
> +  return false;

If all probability is same, then it's not worth?

Plz add few comments no matter my understand is right or not :)

>  }
>
>  bool

> @@ -2428,12 +2377,12 @@ vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const
>    sbitmap_iterator sbi;
>
>    EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi)
> -  {
> -    if (ratio == -1)
> -      ratio = vector_exprs[bb_index]->get_ratio ();
> -    else if (vector_exprs[bb_index]->get_ratio () != ratio)
> -      return false;
> -  }
> +    {
> +      if (ratio == -1)
> +       ratio = vector_exprs[bb_index]->get_ratio ();
> +      else if (vector_exprs[bb_index]->get_ratio () != ratio)
> +       return false;
> +    }
>    return true;
>  }

Split this into a NFC patch, you can commit that without asking review.

> @@ -907,8 +893,8 @@ change_insn (function_info *ssa, insn_change change, insn_info *insn,
>                 ] UNSPEC_VPREDICATE)
>             (plus:RVVM4DI (reg/v:RVVM4DI 104 v8 [orig:137 op1 ] [137])
>                 (sign_extend:RVVM4DI (vec_duplicate:RVVM4SI (reg:SI 15 a5
> -    [140])))) (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF))) "rvv.c":8:12
> -    2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV
> +    [140])))) (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF)))
> +    "rvv.c":8:12 2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV
>      (mem/c:RVVM4DI (reg:DI 10 a0 [142]) [1 <retval>+0 S[64, 64] A128])
>         (expr_list:REG_EQUAL (if_then_else:RVVM4DI (unspec:RVVMF8BI [
>                         (const_vector:RVVMF8BI repeat [

Split this into a NFC patch, you can commit that without asking review.


> @@ -2777,6 +2770,17 @@ pass_vsetvl::update_vector_info (const insn_info *i,
>    m_vector_manager->vector_insn_infos[i->uid ()] = new_info;
>  }
>
> +void
> +pass_vsetvl::update_block_info (int index, profile_probability prob,
> +                               vector_insn_info new_info)
const vector_insn_info &new_info


> +{
> +  m_vector_manager->vector_block_infos[index].probability = prob;
> +  if (m_vector_manager->vector_block_infos[index].local_dem
> +      == m_vector_manager->vector_block_infos[index].reaching_out)
> +    m_vector_manager->vector_block_infos[index].local_dem = new_info;
> +  m_vector_manager->vector_block_infos[index].reaching_out = new_info;
> +}
> +

{
  auto &block_info = m_vector_manager->vector_block_infos[index];
  block_info.probability = prob;
  if (block_info.local_dem == block_info.reaching_out)
    block_info.local_dem = new_info;
  block_info.reaching_out = new_info;
}

>  /* Simple m_vsetvl_insert vsetvl for optimize == 0.  */
>  void
>  pass_vsetvl::simple_vsetvl (void) const


> +      for (insn_info *i = earliest_pred->end_insn ()->prev_nondebug_insn ();
> +          real_insn_and_same_bb_p (i, earliest_pred)
> +          && after_or_same_p (i, last_insn);
> +          i = i->prev_nondebug_insn ())
>         {
> +         if (!vl && find_access (i->defs (), REGNO (avl)))
> +           return false;
> +         if (vl && find_access (i->defs (), REGNO (vl)))
> +           return false;
> +         if (vl && find_access (i->uses (), REGNO (vl)))
> +           return false;

should we check `i->is_call () || i->is_asm ()`?


> @@ -3892,7 +3408,7 @@ pass_vsetvl::refine_vsetvls (void) const
>    basic_block cfg_bb;
>    FOR_EACH_BB_FN (cfg_bb, cfun)
>      {
> -      auto info = get_block_info(cfg_bb).local_dem;
> +      auto info = get_block_info (cfg_bb).local_dem;
>        insn_info *insn = info.get_insn ();
>        if (!info.valid_p ())
>         continue;

Split this into a NFC patch, you can commit that without asking review.

> @@ -3938,8 +3454,7 @@ pass_vsetvl::cleanup_vsetvls ()
>    basic_block cfg_bb;
>    FOR_EACH_BB_FN (cfg_bb, cfun)
>      {
> -      auto &info
> -       = get_block_info(cfg_bb).reaching_out;
> +      auto &info = get_block_info (cfg_bb).reaching_out;
>        gcc_assert (m_vector_manager->expr_set_num (
>                     m_vector_manager->vector_del[cfg_bb->index])
>                   <= 1);

Split this into a NFC patch, you can commit that without asking review.

> @@ -3951,9 +3466,7 @@ pass_vsetvl::cleanup_vsetvls ()
>                 info.set_unknown ();
>               else
>                 {
> -                 const auto dem
> -                   = get_block_info(cfg_bb)
> -                       .local_dem;
> +                 const auto dem = get_block_info (cfg_bb).local_dem;
>                   gcc_assert (dem == *m_vector_manager->vector_exprs[i]);
>                   insn_info *insn = dem.get_insn ();
>                   gcc_assert (insn && insn->rtl ());

Split this into a NFC patch, you can commit that without asking review.

> @@ -4020,33 +3543,10 @@ pass_vsetvl::commit_vsetvls (void)
>    for (const bb_info *bb : crtl->ssa->bbs ())
>      {
>        basic_block cfg_bb = bb->cfg_bb ();
> -      const auto reaching_out
> -       = get_block_info(cfg_bb).reaching_out;
> +      const auto reaching_out = get_block_info (cfg_bb).reaching_out;

Split this into a NFC patch, you can commit that without asking review.

> @@ -4263,7 +3783,8 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info *bb) const
>
>        /* Local AVL compatibility checking is simpler than global, we only
>          need to check the REGNO is same.  */
> -      if (prev_dem.valid_or_dirty_p () && prev_dem.skip_avl_compatible_p (curr_dem)
> +      if (prev_dem.valid_or_dirty_p ()
> +         && prev_dem.skip_avl_compatible_p (curr_dem)
>           && local_avl_compatible_p (prev_avl, curr_avl))
>         {
>           /* curr_dem and prev_dem is compatible!  */

Split this into a NFC patch, you can commit that without asking review.

>@@ -4655,8 +4240,7 @@ pass_vsetvl::compute_probabilities (void)
>   for (const bb_info *bb : crtl->ssa->bbs ())
>     {
>       basic_block cfg_bb = bb->cfg_bb ();
>-      auto &curr_prob
>-       = get_block_info(cfg_bb).probability;
>+      auto &curr_prob = get_block_info (cfg_bb).probability;
>
>       /* GCC assume entry block (bb 0) are always so
>         executed so set its probability as "always".  */

Split this into a NFC patch, you can commit that without asking review.

> @@ -4669,8 +4253,7 @@ pass_vsetvl::compute_probabilities (void)
>        gcc_assert (curr_prob.initialized_p ());
>        FOR_EACH_EDGE (e, ei, cfg_bb->succs)
>         {
> -         auto &new_prob
> -           = get_block_info(e->dest).probability;
> +         auto &new_prob = get_block_info (e->dest).probability;
>           if (!new_prob.initialized_p ())
>             new_prob = curr_prob * e->probability;
>           else if (new_prob == profile_probability::always ())


Split this into a NFC patch, you can commit that without asking review.


> @@ -4298,7 +3819,8 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info *bb) const
>     none exists or if a user RVV instruction is enountered
>     prior to any vsetvl.  */
>  static rtx_insn *
> -get_first_vsetvl_before_rvv_insns (basic_block cfg_bb)
> +get_first_vsetvl_before_rvv_insns (basic_block cfg_bb,
> +                                  enum vsetvl_type insn_type)
>  {

add gcc_assert (insn_type == VSETVL_DISCARD_RESULT || insn_type ==
VSETVL_VTYPE_CHANGE_ONLY).

>    rtx_insn *rinsn;
>    FOR_BB_INSNS (cfg_bb, rinsn)
> @@ -4310,7 +3832,11 @@ get_first_vsetvl_before_rvv_insns (basic_block cfg_bb)
>        if (has_vtype_op (rinsn) || vsetvl_insn_p (rinsn))
>         return nullptr;
>
> -      if (vsetvl_discard_result_insn_p (rinsn))
> +      if (insn_type == VSETVL_DISCARD_RESULT
> +         && vsetvl_discard_result_insn_p (rinsn))
> +       return rinsn;
> +      if (insn_type == VSETVL_VTYPE_CHANGE_ONLY
> +         && vsetvl_vtype_change_only_p (rinsn))
>         return rinsn;
>      }
>    return nullptr;


> diff --git a/gcc/config/riscv/riscv-vsetvl.def b/gcc/config/riscv/riscv-vsetvl.def
> index 7a73149f1da..7289c01efcf 100644
> --- a/gcc/config/riscv/riscv-vsetvl.def
> +++ b/gcc/config/riscv/riscv-vsetvl.def
> @@ -319,7 +319,7 @@ DEF_SEW_LMUL_FUSE_RULE (/*SEW*/ DEMAND_TRUE, /*LMUL*/ DEMAND_FALSE,
>                         /*RATIO*/ DEMAND_TRUE, /*GE_SEW*/ DEMAND_FALSE,
>                         /*NEW_DEMAND_SEW*/ true,
>                         /*NEW_DEMAND_LMUL*/ false,
> -                       /*NEW_DEMAND_RATIO*/ false,
> +                       /*NEW_DEMAND_RATIO*/ true,

This seems relax the compatiblitly check to allow optimize more case,
if so this should be a sperated patch.

>                         /*NEW_DEMAND_GE_SEW*/ true, first_sew,
>                         vlmul_for_first_sew_second_ratio, second_ratio)
>  DEF_SEW_LMUL_FUSE_RULE (/*SEW*/ DEMAND_TRUE, /*LMUL*/ DEMAND_FALSE,


> @@ -386,7 +337,8 @@ public:
>    bool compatible_avl_p (const avl_info &) const;
>    bool compatible_vtype_p (const vl_vtype_info &) const;
>    bool compatible_p (const vl_vtype_info &) const;
> -  vector_insn_info merge (const vector_insn_info &, enum merge_type) const;
> +  vector_insn_info merge (const vector_insn_info &, enum merge_type,
> +                         unsigned = 0) const;

it seems weired to set bb_index as 0 by default?

>
>    rtl_ssa::insn_info *get_insn () const { return m_insn; }
>    const bool *get_demands (void) const { return m_demands; }


> diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
> index 1252d6f851a..f3ce66ccdd4 100644
> --- a/gcc/config/riscv/t-riscv
> +++ b/gcc/config/riscv/t-riscv
> @@ -62,7 +62,8 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
>    $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
>    $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-config.h \
>    insn-attr.h insn-opinit.h tm-constrs.h cfgrtl.h cfganal.h lcm.h \
> -  predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h
> +  predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h \
> +  $(srcdir)/config/riscv/riscv-vsetvl.def

This should be a seperate fix and backport to GCC 13 as well,
pre-approve for both master and GCC-13 branch for this fix.

>         $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>                 $(srcdir)/config/riscv/riscv-vsetvl.cc
juzhe.zhong@rivai.ai Aug. 22, 2023, 11:02 p.m. UTC | #4
>> I saw you has update serveral testcase, why update instead of add new testcase??
Since original testcase failed after this patch.

>> could you say more about why some testcase added __riscv_vadd_vv_i8mf8
>> or add some more dependency of vl variable?
These are 2 separate questions.

1. Why some testcase added __riscv_vadd_vv_i8mf8.
    This is because the original testcase is too fragile and easily fail.
     Consider this following case:

      for (...)
         if (cond)
           vsetvl e8mf8
           load
           store
         else
           vsetvl e16mf4
           load
           store
This example, we know that both "e8mf8" and "e16mf4" are compatible, so we can either put a vsevl e8mf8 or vsetvli e16mf4 before the
for...loop and elide all vsetvlis inside the loop. 
Before this patch, the codegen result is vsetvli e8mf8, after this patch, the codegen result is vsetvli e16mf4.
They are both legal and optimal codegen.

To avoid future potential unnecessary test report failure, I added "vadd" which demand both SEW and LMUL and only allow e8mf8.
Such testcase doesn't change our testing goal, since our goal of this testcase is to test LCM ability of fusing VSETVL and compute the 
optimal location of vsetvl.

2. Why add some more dependency of vl variable ?
Well, as I told you previously.
HARD_EMPTY and DIRTY_WITH_KILLED_AVL is supposed to optimize this following
case:

li a6, 101.
vsetvli e8mf8
for ...
li a5,101
vsetvli e16mf4
for ...

This case happens since we set "li" cost too low that previous pass failed to optimized them.
I don't think we should optimize such corner case in VSETVL PASS which complicates the implementation seriously and 
mess up the code quality.

So after I remove them, the codegen for such case will generate one more "vsetvli" (only one more dynamic run-time instruction count).
I note if we make all "li" inside a loop, the issue will be gone and VSETVL PASS can achieve optimal codegen.

To fix this failure of such testcases, instead of "vl= 101", I make them "vl = a + 101", then the assembly check remain and pass.

Thanks.



juzhe.zhong@rivai.ai
 
From: Kito Cheng
Date: 2023-08-22 23:35
To: Kito Cheng
CC: Robin Dapp; Juzhe-Zhong; GCC Patches; Jeff Law
Subject: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
It's really great improvement, it's drop some state like HARD_EMPTY
and DIRTY_WITH_KILLED_AVL which make this algorithm more easy to
understand!
also this also fundamentally improved the phase 3, although one
concern is the time complexity might be come more higher order,
(and it's already high enough in fact.)
but mostly those vectorized code are only appeard within the inner
most loop, so that is acceptable in generally
 
So I will try my best to review this closely to make it more close to
the perfect :)
 
I saw you has update serveral testcase, why update instead of add new testcase??
could you say more about why some testcase added __riscv_vadd_vv_i8mf8
or add some more dependency of vl variable?
 
 
 
> @@ -1423,8 +1409,13 @@ static bool
>  ge_sew_ratio_unavailable_p (const vector_insn_info &info1,
>                             const vector_insn_info &info2)
>  {
> -  if (!info2.demand_p (DEMAND_LMUL) && info2.demand_p (DEMAND_GE_SEW))
> -    return info1.get_sew () < info2.get_sew ();
> +  if (!info2.demand_p (DEMAND_LMUL))
> +    {
> +      if (info2.demand_p (DEMAND_GE_SEW))
> +       return info1.get_sew () < info2.get_sew ();
> +      else if (!info2.demand_p (DEMAND_SEW))
> +       return false;
> +    }
 
This seems relax the compatiblitly check to allow optimize more case,
if so this should be a sperated patch.
 
>    return true;
>  }
 
 
> @@ -1815,7 +1737,7 @@ vector_insn_info::parse_insn (rtx_insn *rinsn)
>      return;
>    if (optimize == 0 && !has_vtype_op (rinsn))
>      return;
> -  if (optimize > 0 && !vsetvl_insn_p (rinsn))
> +  if (optimize > 0 && vsetvl_discard_result_insn_p (rinsn))
 
I didn't get this change, could you explan few more about that? it was
early exit for non vsetvl insn, but now it allowed that now?
 
>      return;
>    m_state = VALID;
>    extract_insn_cached (rinsn);
 
> @@ -2206,9 +2128,9 @@ vector_insn_info::fuse_mask_policy (const vector_insn_info &info1,
>
>  vector_insn_info
>  vector_insn_info::merge (const vector_insn_info &merge_info,
> -                        enum merge_type type) const
> +                        enum merge_type type, unsigned bb_index) const
>  {
> -  if (!vsetvl_insn_p (get_insn ()->rtl ()))
> +  if (!vsetvl_insn_p (get_insn ()->rtl ()) && *this != merge_info)
 
Why need this exception?
 
>      gcc_assert (this->compatible_p (merge_info)
>                 && "Can't merge incompatible demanded infos");
 
> @@ -2403,18 +2348,22 @@ vector_infos_manager::get_all_available_exprs (
>  }
>
>  bool
> -vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) const
> +vector_infos_manager::earliest_fusion_worthwhile_p (
> +  const basic_block cfg_bb) const
>  {
> -  hash_set<basic_block> pred_cfg_bbs = get_all_predecessors (cfg_bb);
> -  for (const basic_block pred_cfg_bb : pred_cfg_bbs)
> +  edge e;
> +  edge_iterator ei;
> +  profile_probability prob = profile_probability::uninitialized ();
> +  FOR_EACH_EDGE (e, ei, cfg_bb->succs)
>      {
> -      const auto &pred_block_info = vector_block_infos[pred_cfg_bb->index];
> -      if (!pred_block_info.local_dem.valid_or_dirty_p ()
> -         && !pred_block_info.reaching_out.valid_or_dirty_p ())
> +      if (prob == profile_probability::uninitialized ())
> +       prob = vector_block_infos[e->dest->index].probability;
> +      else if (prob == vector_block_infos[e->dest->index].probability)
>         continue;
> -      return false;
> +      else
> +       return true;
 
Make sure I understand this correctly: it's worth if thoe edges has
different probability?
 
>      }
> -  return true;
> +  return false;
 
If all probability is same, then it's not worth?
 
Plz add few comments no matter my understand is right or not :)
 
>  }
>
>  bool
 
> @@ -2428,12 +2377,12 @@ vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const
>    sbitmap_iterator sbi;
>
>    EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi)
> -  {
> -    if (ratio == -1)
> -      ratio = vector_exprs[bb_index]->get_ratio ();
> -    else if (vector_exprs[bb_index]->get_ratio () != ratio)
> -      return false;
> -  }
> +    {
> +      if (ratio == -1)
> +       ratio = vector_exprs[bb_index]->get_ratio ();
> +      else if (vector_exprs[bb_index]->get_ratio () != ratio)
> +       return false;
> +    }
>    return true;
>  }
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -907,8 +893,8 @@ change_insn (function_info *ssa, insn_change change, insn_info *insn,
>                 ] UNSPEC_VPREDICATE)
>             (plus:RVVM4DI (reg/v:RVVM4DI 104 v8 [orig:137 op1 ] [137])
>                 (sign_extend:RVVM4DI (vec_duplicate:RVVM4SI (reg:SI 15 a5
> -    [140])))) (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF))) "rvv.c":8:12
> -    2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV
> +    [140])))) (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF)))
> +    "rvv.c":8:12 2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV
>      (mem/c:RVVM4DI (reg:DI 10 a0 [142]) [1 <retval>+0 S[64, 64] A128])
>         (expr_list:REG_EQUAL (if_then_else:RVVM4DI (unspec:RVVMF8BI [
>                         (const_vector:RVVMF8BI repeat [
 
Split this into a NFC patch, you can commit that without asking review.
 
 
> @@ -2777,6 +2770,17 @@ pass_vsetvl::update_vector_info (const insn_info *i,
>    m_vector_manager->vector_insn_infos[i->uid ()] = new_info;
>  }
>
> +void
> +pass_vsetvl::update_block_info (int index, profile_probability prob,
> +                               vector_insn_info new_info)
const vector_insn_info &new_info
 
 
> +{
> +  m_vector_manager->vector_block_infos[index].probability = prob;
> +  if (m_vector_manager->vector_block_infos[index].local_dem
> +      == m_vector_manager->vector_block_infos[index].reaching_out)
> +    m_vector_manager->vector_block_infos[index].local_dem = new_info;
> +  m_vector_manager->vector_block_infos[index].reaching_out = new_info;
> +}
> +
 
{
  auto &block_info = m_vector_manager->vector_block_infos[index];
  block_info.probability = prob;
  if (block_info.local_dem == block_info.reaching_out)
    block_info.local_dem = new_info;
  block_info.reaching_out = new_info;
}
 
>  /* Simple m_vsetvl_insert vsetvl for optimize == 0.  */
>  void
>  pass_vsetvl::simple_vsetvl (void) const
 
 
> +      for (insn_info *i = earliest_pred->end_insn ()->prev_nondebug_insn ();
> +          real_insn_and_same_bb_p (i, earliest_pred)
> +          && after_or_same_p (i, last_insn);
> +          i = i->prev_nondebug_insn ())
>         {
> +         if (!vl && find_access (i->defs (), REGNO (avl)))
> +           return false;
> +         if (vl && find_access (i->defs (), REGNO (vl)))
> +           return false;
> +         if (vl && find_access (i->uses (), REGNO (vl)))
> +           return false;
 
should we check `i->is_call () || i->is_asm ()`?
 
 
> @@ -3892,7 +3408,7 @@ pass_vsetvl::refine_vsetvls (void) const
>    basic_block cfg_bb;
>    FOR_EACH_BB_FN (cfg_bb, cfun)
>      {
> -      auto info = get_block_info(cfg_bb).local_dem;
> +      auto info = get_block_info (cfg_bb).local_dem;
>        insn_info *insn = info.get_insn ();
>        if (!info.valid_p ())
>         continue;
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -3938,8 +3454,7 @@ pass_vsetvl::cleanup_vsetvls ()
>    basic_block cfg_bb;
>    FOR_EACH_BB_FN (cfg_bb, cfun)
>      {
> -      auto &info
> -       = get_block_info(cfg_bb).reaching_out;
> +      auto &info = get_block_info (cfg_bb).reaching_out;
>        gcc_assert (m_vector_manager->expr_set_num (
>                     m_vector_manager->vector_del[cfg_bb->index])
>                   <= 1);
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -3951,9 +3466,7 @@ pass_vsetvl::cleanup_vsetvls ()
>                 info.set_unknown ();
>               else
>                 {
> -                 const auto dem
> -                   = get_block_info(cfg_bb)
> -                       .local_dem;
> +                 const auto dem = get_block_info (cfg_bb).local_dem;
>                   gcc_assert (dem == *m_vector_manager->vector_exprs[i]);
>                   insn_info *insn = dem.get_insn ();
>                   gcc_assert (insn && insn->rtl ());
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -4020,33 +3543,10 @@ pass_vsetvl::commit_vsetvls (void)
>    for (const bb_info *bb : crtl->ssa->bbs ())
>      {
>        basic_block cfg_bb = bb->cfg_bb ();
> -      const auto reaching_out
> -       = get_block_info(cfg_bb).reaching_out;
> +      const auto reaching_out = get_block_info (cfg_bb).reaching_out;
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -4263,7 +3783,8 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info *bb) const
>
>        /* Local AVL compatibility checking is simpler than global, we only
>          need to check the REGNO is same.  */
> -      if (prev_dem.valid_or_dirty_p () && prev_dem.skip_avl_compatible_p (curr_dem)
> +      if (prev_dem.valid_or_dirty_p ()
> +         && prev_dem.skip_avl_compatible_p (curr_dem)
>           && local_avl_compatible_p (prev_avl, curr_avl))
>         {
>           /* curr_dem and prev_dem is compatible!  */
 
Split this into a NFC patch, you can commit that without asking review.
 
>@@ -4655,8 +4240,7 @@ pass_vsetvl::compute_probabilities (void)
>   for (const bb_info *bb : crtl->ssa->bbs ())
>     {
>       basic_block cfg_bb = bb->cfg_bb ();
>-      auto &curr_prob
>-       = get_block_info(cfg_bb).probability;
>+      auto &curr_prob = get_block_info (cfg_bb).probability;
>
>       /* GCC assume entry block (bb 0) are always so
>         executed so set its probability as "always".  */
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -4669,8 +4253,7 @@ pass_vsetvl::compute_probabilities (void)
>        gcc_assert (curr_prob.initialized_p ());
>        FOR_EACH_EDGE (e, ei, cfg_bb->succs)
>         {
> -         auto &new_prob
> -           = get_block_info(e->dest).probability;
> +         auto &new_prob = get_block_info (e->dest).probability;
>           if (!new_prob.initialized_p ())
>             new_prob = curr_prob * e->probability;
>           else if (new_prob == profile_probability::always ())
 
 
Split this into a NFC patch, you can commit that without asking review.
 
 
> @@ -4298,7 +3819,8 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info *bb) const
>     none exists or if a user RVV instruction is enountered
>     prior to any vsetvl.  */
>  static rtx_insn *
> -get_first_vsetvl_before_rvv_insns (basic_block cfg_bb)
> +get_first_vsetvl_before_rvv_insns (basic_block cfg_bb,
> +                                  enum vsetvl_type insn_type)
>  {
 
add gcc_assert (insn_type == VSETVL_DISCARD_RESULT || insn_type ==
VSETVL_VTYPE_CHANGE_ONLY).
 
>    rtx_insn *rinsn;
>    FOR_BB_INSNS (cfg_bb, rinsn)
> @@ -4310,7 +3832,11 @@ get_first_vsetvl_before_rvv_insns (basic_block cfg_bb)
>        if (has_vtype_op (rinsn) || vsetvl_insn_p (rinsn))
>         return nullptr;
>
> -      if (vsetvl_discard_result_insn_p (rinsn))
> +      if (insn_type == VSETVL_DISCARD_RESULT
> +         && vsetvl_discard_result_insn_p (rinsn))
> +       return rinsn;
> +      if (insn_type == VSETVL_VTYPE_CHANGE_ONLY
> +         && vsetvl_vtype_change_only_p (rinsn))
>         return rinsn;
>      }
>    return nullptr;
 
 
> diff --git a/gcc/config/riscv/riscv-vsetvl.def b/gcc/config/riscv/riscv-vsetvl.def
> index 7a73149f1da..7289c01efcf 100644
> --- a/gcc/config/riscv/riscv-vsetvl.def
> +++ b/gcc/config/riscv/riscv-vsetvl.def
> @@ -319,7 +319,7 @@ DEF_SEW_LMUL_FUSE_RULE (/*SEW*/ DEMAND_TRUE, /*LMUL*/ DEMAND_FALSE,
>                         /*RATIO*/ DEMAND_TRUE, /*GE_SEW*/ DEMAND_FALSE,
>                         /*NEW_DEMAND_SEW*/ true,
>                         /*NEW_DEMAND_LMUL*/ false,
> -                       /*NEW_DEMAND_RATIO*/ false,
> +                       /*NEW_DEMAND_RATIO*/ true,
 
This seems relax the compatiblitly check to allow optimize more case,
if so this should be a sperated patch.
 
>                         /*NEW_DEMAND_GE_SEW*/ true, first_sew,
>                         vlmul_for_first_sew_second_ratio, second_ratio)
>  DEF_SEW_LMUL_FUSE_RULE (/*SEW*/ DEMAND_TRUE, /*LMUL*/ DEMAND_FALSE,
 
 
> @@ -386,7 +337,8 @@ public:
>    bool compatible_avl_p (const avl_info &) const;
>    bool compatible_vtype_p (const vl_vtype_info &) const;
>    bool compatible_p (const vl_vtype_info &) const;
> -  vector_insn_info merge (const vector_insn_info &, enum merge_type) const;
> +  vector_insn_info merge (const vector_insn_info &, enum merge_type,
> +                         unsigned = 0) const;
 
it seems weired to set bb_index as 0 by default?
 
>
>    rtl_ssa::insn_info *get_insn () const { return m_insn; }
>    const bool *get_demands (void) const { return m_demands; }
 
 
> diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
> index 1252d6f851a..f3ce66ccdd4 100644
> --- a/gcc/config/riscv/t-riscv
> +++ b/gcc/config/riscv/t-riscv
> @@ -62,7 +62,8 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
>    $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
>    $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-config.h \
>    insn-attr.h insn-opinit.h tm-constrs.h cfgrtl.h cfganal.h lcm.h \
> -  predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h
> +  predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h \
> +  $(srcdir)/config/riscv/riscv-vsetvl.def
 
This should be a seperate fix and backport to GCC 13 as well,
pre-approve for both master and GCC-13 branch for this fix.
 
>         $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>                 $(srcdir)/config/riscv/riscv-vsetvl.cc
juzhe.zhong@rivai.ai Aug. 23, 2023, 1:27 a.m. UTC | #5
>> This seems relax the compatiblitly check to allow optimize more case,
>> if so this should be a sperated patch.
This is not a optimization fix, It's an bug fix.

Since fusion for these 2 demands:
1. demand SEW and GE_SEW (meaning demand a SEW larger than a specific SEW).
2. demand SEW and GE_SEW (meaning demand a SEW larger than a specific SEW) and demand RATIO.

The new fusion demand should include RATIO demand but it didn't before. It's an bug.
It's lucky that previous tests didn't expose such bug before refactor.
But such bug is exposed after refactor.

I committed it with a separate patch.
Thanks.


juzhe.zhong@rivai.ai
 
From: Kito Cheng
Date: 2023-08-22 23:35
To: Kito Cheng
CC: Robin Dapp; Juzhe-Zhong; GCC Patches; Jeff Law
Subject: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
It's really great improvement, it's drop some state like HARD_EMPTY
and DIRTY_WITH_KILLED_AVL which make this algorithm more easy to
understand!
also this also fundamentally improved the phase 3, although one
concern is the time complexity might be come more higher order,
(and it's already high enough in fact.)
but mostly those vectorized code are only appeard within the inner
most loop, so that is acceptable in generally
 
So I will try my best to review this closely to make it more close to
the perfect :)
 
I saw you has update serveral testcase, why update instead of add new testcase??
could you say more about why some testcase added __riscv_vadd_vv_i8mf8
or add some more dependency of vl variable?
 
 
 
> @@ -1423,8 +1409,13 @@ static bool
>  ge_sew_ratio_unavailable_p (const vector_insn_info &info1,
>                             const vector_insn_info &info2)
>  {
> -  if (!info2.demand_p (DEMAND_LMUL) && info2.demand_p (DEMAND_GE_SEW))
> -    return info1.get_sew () < info2.get_sew ();
> +  if (!info2.demand_p (DEMAND_LMUL))
> +    {
> +      if (info2.demand_p (DEMAND_GE_SEW))
> +       return info1.get_sew () < info2.get_sew ();
> +      else if (!info2.demand_p (DEMAND_SEW))
> +       return false;
> +    }
 
This seems relax the compatiblitly check to allow optimize more case,
if so this should be a sperated patch.
 
>    return true;
>  }
 
 
> @@ -1815,7 +1737,7 @@ vector_insn_info::parse_insn (rtx_insn *rinsn)
>      return;
>    if (optimize == 0 && !has_vtype_op (rinsn))
>      return;
> -  if (optimize > 0 && !vsetvl_insn_p (rinsn))
> +  if (optimize > 0 && vsetvl_discard_result_insn_p (rinsn))
 
I didn't get this change, could you explan few more about that? it was
early exit for non vsetvl insn, but now it allowed that now?
 
>      return;
>    m_state = VALID;
>    extract_insn_cached (rinsn);
 
> @@ -2206,9 +2128,9 @@ vector_insn_info::fuse_mask_policy (const vector_insn_info &info1,
>
>  vector_insn_info
>  vector_insn_info::merge (const vector_insn_info &merge_info,
> -                        enum merge_type type) const
> +                        enum merge_type type, unsigned bb_index) const
>  {
> -  if (!vsetvl_insn_p (get_insn ()->rtl ()))
> +  if (!vsetvl_insn_p (get_insn ()->rtl ()) && *this != merge_info)
 
Why need this exception?
 
>      gcc_assert (this->compatible_p (merge_info)
>                 && "Can't merge incompatible demanded infos");
 
> @@ -2403,18 +2348,22 @@ vector_infos_manager::get_all_available_exprs (
>  }
>
>  bool
> -vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) const
> +vector_infos_manager::earliest_fusion_worthwhile_p (
> +  const basic_block cfg_bb) const
>  {
> -  hash_set<basic_block> pred_cfg_bbs = get_all_predecessors (cfg_bb);
> -  for (const basic_block pred_cfg_bb : pred_cfg_bbs)
> +  edge e;
> +  edge_iterator ei;
> +  profile_probability prob = profile_probability::uninitialized ();
> +  FOR_EACH_EDGE (e, ei, cfg_bb->succs)
>      {
> -      const auto &pred_block_info = vector_block_infos[pred_cfg_bb->index];
> -      if (!pred_block_info.local_dem.valid_or_dirty_p ()
> -         && !pred_block_info.reaching_out.valid_or_dirty_p ())
> +      if (prob == profile_probability::uninitialized ())
> +       prob = vector_block_infos[e->dest->index].probability;
> +      else if (prob == vector_block_infos[e->dest->index].probability)
>         continue;
> -      return false;
> +      else
> +       return true;
 
Make sure I understand this correctly: it's worth if thoe edges has
different probability?
 
>      }
> -  return true;
> +  return false;
 
If all probability is same, then it's not worth?
 
Plz add few comments no matter my understand is right or not :)
 
>  }
>
>  bool
 
> @@ -2428,12 +2377,12 @@ vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const
>    sbitmap_iterator sbi;
>
>    EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi)
> -  {
> -    if (ratio == -1)
> -      ratio = vector_exprs[bb_index]->get_ratio ();
> -    else if (vector_exprs[bb_index]->get_ratio () != ratio)
> -      return false;
> -  }
> +    {
> +      if (ratio == -1)
> +       ratio = vector_exprs[bb_index]->get_ratio ();
> +      else if (vector_exprs[bb_index]->get_ratio () != ratio)
> +       return false;
> +    }
>    return true;
>  }
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -907,8 +893,8 @@ change_insn (function_info *ssa, insn_change change, insn_info *insn,
>                 ] UNSPEC_VPREDICATE)
>             (plus:RVVM4DI (reg/v:RVVM4DI 104 v8 [orig:137 op1 ] [137])
>                 (sign_extend:RVVM4DI (vec_duplicate:RVVM4SI (reg:SI 15 a5
> -    [140])))) (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF))) "rvv.c":8:12
> -    2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV
> +    [140])))) (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF)))
> +    "rvv.c":8:12 2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV
>      (mem/c:RVVM4DI (reg:DI 10 a0 [142]) [1 <retval>+0 S[64, 64] A128])
>         (expr_list:REG_EQUAL (if_then_else:RVVM4DI (unspec:RVVMF8BI [
>                         (const_vector:RVVMF8BI repeat [
 
Split this into a NFC patch, you can commit that without asking review.
 
 
> @@ -2777,6 +2770,17 @@ pass_vsetvl::update_vector_info (const insn_info *i,
>    m_vector_manager->vector_insn_infos[i->uid ()] = new_info;
>  }
>
> +void
> +pass_vsetvl::update_block_info (int index, profile_probability prob,
> +                               vector_insn_info new_info)
const vector_insn_info &new_info
 
 
> +{
> +  m_vector_manager->vector_block_infos[index].probability = prob;
> +  if (m_vector_manager->vector_block_infos[index].local_dem
> +      == m_vector_manager->vector_block_infos[index].reaching_out)
> +    m_vector_manager->vector_block_infos[index].local_dem = new_info;
> +  m_vector_manager->vector_block_infos[index].reaching_out = new_info;
> +}
> +
 
{
  auto &block_info = m_vector_manager->vector_block_infos[index];
  block_info.probability = prob;
  if (block_info.local_dem == block_info.reaching_out)
    block_info.local_dem = new_info;
  block_info.reaching_out = new_info;
}
 
>  /* Simple m_vsetvl_insert vsetvl for optimize == 0.  */
>  void
>  pass_vsetvl::simple_vsetvl (void) const
 
 
> +      for (insn_info *i = earliest_pred->end_insn ()->prev_nondebug_insn ();
> +          real_insn_and_same_bb_p (i, earliest_pred)
> +          && after_or_same_p (i, last_insn);
> +          i = i->prev_nondebug_insn ())
>         {
> +         if (!vl && find_access (i->defs (), REGNO (avl)))
> +           return false;
> +         if (vl && find_access (i->defs (), REGNO (vl)))
> +           return false;
> +         if (vl && find_access (i->uses (), REGNO (vl)))
> +           return false;
 
should we check `i->is_call () || i->is_asm ()`?
 
 
> @@ -3892,7 +3408,7 @@ pass_vsetvl::refine_vsetvls (void) const
>    basic_block cfg_bb;
>    FOR_EACH_BB_FN (cfg_bb, cfun)
>      {
> -      auto info = get_block_info(cfg_bb).local_dem;
> +      auto info = get_block_info (cfg_bb).local_dem;
>        insn_info *insn = info.get_insn ();
>        if (!info.valid_p ())
>         continue;
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -3938,8 +3454,7 @@ pass_vsetvl::cleanup_vsetvls ()
>    basic_block cfg_bb;
>    FOR_EACH_BB_FN (cfg_bb, cfun)
>      {
> -      auto &info
> -       = get_block_info(cfg_bb).reaching_out;
> +      auto &info = get_block_info (cfg_bb).reaching_out;
>        gcc_assert (m_vector_manager->expr_set_num (
>                     m_vector_manager->vector_del[cfg_bb->index])
>                   <= 1);
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -3951,9 +3466,7 @@ pass_vsetvl::cleanup_vsetvls ()
>                 info.set_unknown ();
>               else
>                 {
> -                 const auto dem
> -                   = get_block_info(cfg_bb)
> -                       .local_dem;
> +                 const auto dem = get_block_info (cfg_bb).local_dem;
>                   gcc_assert (dem == *m_vector_manager->vector_exprs[i]);
>                   insn_info *insn = dem.get_insn ();
>                   gcc_assert (insn && insn->rtl ());
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -4020,33 +3543,10 @@ pass_vsetvl::commit_vsetvls (void)
>    for (const bb_info *bb : crtl->ssa->bbs ())
>      {
>        basic_block cfg_bb = bb->cfg_bb ();
> -      const auto reaching_out
> -       = get_block_info(cfg_bb).reaching_out;
> +      const auto reaching_out = get_block_info (cfg_bb).reaching_out;
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -4263,7 +3783,8 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info *bb) const
>
>        /* Local AVL compatibility checking is simpler than global, we only
>          need to check the REGNO is same.  */
> -      if (prev_dem.valid_or_dirty_p () && prev_dem.skip_avl_compatible_p (curr_dem)
> +      if (prev_dem.valid_or_dirty_p ()
> +         && prev_dem.skip_avl_compatible_p (curr_dem)
>           && local_avl_compatible_p (prev_avl, curr_avl))
>         {
>           /* curr_dem and prev_dem is compatible!  */
 
Split this into a NFC patch, you can commit that without asking review.
 
>@@ -4655,8 +4240,7 @@ pass_vsetvl::compute_probabilities (void)
>   for (const bb_info *bb : crtl->ssa->bbs ())
>     {
>       basic_block cfg_bb = bb->cfg_bb ();
>-      auto &curr_prob
>-       = get_block_info(cfg_bb).probability;
>+      auto &curr_prob = get_block_info (cfg_bb).probability;
>
>       /* GCC assume entry block (bb 0) are always so
>         executed so set its probability as "always".  */
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -4669,8 +4253,7 @@ pass_vsetvl::compute_probabilities (void)
>        gcc_assert (curr_prob.initialized_p ());
>        FOR_EACH_EDGE (e, ei, cfg_bb->succs)
>         {
> -         auto &new_prob
> -           = get_block_info(e->dest).probability;
> +         auto &new_prob = get_block_info (e->dest).probability;
>           if (!new_prob.initialized_p ())
>             new_prob = curr_prob * e->probability;
>           else if (new_prob == profile_probability::always ())
 
 
Split this into a NFC patch, you can commit that without asking review.
 
 
> @@ -4298,7 +3819,8 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info *bb) const
>     none exists or if a user RVV instruction is enountered
>     prior to any vsetvl.  */
>  static rtx_insn *
> -get_first_vsetvl_before_rvv_insns (basic_block cfg_bb)
> +get_first_vsetvl_before_rvv_insns (basic_block cfg_bb,
> +                                  enum vsetvl_type insn_type)
>  {
 
add gcc_assert (insn_type == VSETVL_DISCARD_RESULT || insn_type ==
VSETVL_VTYPE_CHANGE_ONLY).
 
>    rtx_insn *rinsn;
>    FOR_BB_INSNS (cfg_bb, rinsn)
> @@ -4310,7 +3832,11 @@ get_first_vsetvl_before_rvv_insns (basic_block cfg_bb)
>        if (has_vtype_op (rinsn) || vsetvl_insn_p (rinsn))
>         return nullptr;
>
> -      if (vsetvl_discard_result_insn_p (rinsn))
> +      if (insn_type == VSETVL_DISCARD_RESULT
> +         && vsetvl_discard_result_insn_p (rinsn))
> +       return rinsn;
> +      if (insn_type == VSETVL_VTYPE_CHANGE_ONLY
> +         && vsetvl_vtype_change_only_p (rinsn))
>         return rinsn;
>      }
>    return nullptr;
 
 
> diff --git a/gcc/config/riscv/riscv-vsetvl.def b/gcc/config/riscv/riscv-vsetvl.def
> index 7a73149f1da..7289c01efcf 100644
> --- a/gcc/config/riscv/riscv-vsetvl.def
> +++ b/gcc/config/riscv/riscv-vsetvl.def
> @@ -319,7 +319,7 @@ DEF_SEW_LMUL_FUSE_RULE (/*SEW*/ DEMAND_TRUE, /*LMUL*/ DEMAND_FALSE,
>                         /*RATIO*/ DEMAND_TRUE, /*GE_SEW*/ DEMAND_FALSE,
>                         /*NEW_DEMAND_SEW*/ true,
>                         /*NEW_DEMAND_LMUL*/ false,
> -                       /*NEW_DEMAND_RATIO*/ false,
> +                       /*NEW_DEMAND_RATIO*/ true,
 
This seems relax the compatiblitly check to allow optimize more case,
if so this should be a sperated patch.
 
>                         /*NEW_DEMAND_GE_SEW*/ true, first_sew,
>                         vlmul_for_first_sew_second_ratio, second_ratio)
>  DEF_SEW_LMUL_FUSE_RULE (/*SEW*/ DEMAND_TRUE, /*LMUL*/ DEMAND_FALSE,
 
 
> @@ -386,7 +337,8 @@ public:
>    bool compatible_avl_p (const avl_info &) const;
>    bool compatible_vtype_p (const vl_vtype_info &) const;
>    bool compatible_p (const vl_vtype_info &) const;
> -  vector_insn_info merge (const vector_insn_info &, enum merge_type) const;
> +  vector_insn_info merge (const vector_insn_info &, enum merge_type,
> +                         unsigned = 0) const;
 
it seems weired to set bb_index as 0 by default?
 
>
>    rtl_ssa::insn_info *get_insn () const { return m_insn; }
>    const bool *get_demands (void) const { return m_demands; }
 
 
> diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
> index 1252d6f851a..f3ce66ccdd4 100644
> --- a/gcc/config/riscv/t-riscv
> +++ b/gcc/config/riscv/t-riscv
> @@ -62,7 +62,8 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
>    $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
>    $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-config.h \
>    insn-attr.h insn-opinit.h tm-constrs.h cfgrtl.h cfganal.h lcm.h \
> -  predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h
> +  predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h \
> +  $(srcdir)/config/riscv/riscv-vsetvl.def
 
This should be a seperate fix and backport to GCC 13 as well,
pre-approve for both master and GCC-13 branch for this fix.
 
>         $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>                 $(srcdir)/config/riscv/riscv-vsetvl.cc
juzhe.zhong@rivai.ai Aug. 23, 2023, 12:34 p.m. UTC | #6
I have reorder the functions so that we won't mess up deleted functions and new functions.
V2 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628237.html 

>> Why need this exception?

Because we have this piece code here for fusion in "EMPTY" block:
new_info = expr.merge (expr, GLOBAL_MERGE, eg->src->index);
The expr may not have a reall avl source which is considered as incompatible.
However, in this case, we should skip the compatible check, just use merge to compute demand info.

>>Make sure I understand this correctly: it's worth if thoe edges has
>>different probability?
>>If all probability is same, then it's not worth?
The probability is supposed to help for picking the optimal VSETVL info for incompatible demand infos.

Consider this following case:

void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond, size_t cond2)
{
  for (size_t i = 0; i < n; i++)
    {
      if (i== cond) {
        vint8mf8_t v = *(vint8mf8_t*)(in + i + 100);
        *(vint8mf8_t*)(out + i + 100) = v;
      } else {
        vbool1_t v = *(vbool1_t*)(in + i + 400);
        *(vbool1_t*)(out + i + 400) = v;
      }
    }
}

Both VSETVLs are incompatible since one want e8mf8, the other wants e8m8.
For if (i == cond) is very low probability (It could only be accessed 0 times or once)
We want to hoist the e8m8 to get optimal codegen like this:
f:
beq a2,zero,.L10
addi a0,a0,1600
addi a1,a1,1600
li a5,0
vsetvli a4,zero,e8,m8,ta,ma
.L5:
beq a3,a5,.L12
vlm.v v1,0(a0)
vsm.v v1,0(a1)
.L4:
addi a5,a5,1
addi a0,a0,4
addi a1,a1,4
bne a2,a5,.L5
.L10:
ret
.L12:
vsetvli a7,zero,e8,mf8,ta,ma
addi a6,a1,-1200
addi t1,a0,-1200
vle8.v v1,0(t1)
vse8.v v1,0(a6)
vsetvli a4,zero,e8,m8,ta,ma
j .L4


Wheras the other case is like this:
void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond, size_t cond2)
{
  for (size_t i = 0; i < n; i++)
    {
      if (i > cond) {
        vint8mf8_t v = *(vint8mf8_t*)(in + i + 100);
        *(vint8mf8_t*)(out + i + 100) = v;
      } else {
        vbool1_t v = *(vbool1_t*)(in + i + 400);
        *(vbool1_t*)(out + i + 400) = v;
      }
    }
}

Both condition probabilities are equal, so we don't want to take any of them as higher priority,
so the codegen should be:
f:
beq a2,zero,.L10
addi a0,a0,1600
addi a1,a1,1600
li a5,0
j .L5
.L12:
vsetvli a7,zero,e8,mf8,ta,ma
addi a5,a5,1
vle8.v v1,0(a6)
vse8.v v1,0(a4)
addi a0,a0,4
addi a1,a1,4
beq a2,a5,.L10
.L5:
addi a4,a1,-1200
addi a6,a0,-1200
bltu a3,a5,.L12
vsetvli t1,zero,e8,m8,ta,ma
addi a5,a5,1
vlm.v v1,0(a0)
vsm.v v1,0(a1)
addi a0,a0,4
addi a1,a1,4
bne a2,a5,.L5
.L10:
ret



juzhe.zhong@rivai.ai
 
From: Kito Cheng
Date: 2023-08-22 23:35
To: Kito Cheng
CC: Robin Dapp; Juzhe-Zhong; GCC Patches; Jeff Law
Subject: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
It's really great improvement, it's drop some state like HARD_EMPTY
and DIRTY_WITH_KILLED_AVL which make this algorithm more easy to
understand!
also this also fundamentally improved the phase 3, although one
concern is the time complexity might be come more higher order,
(and it's already high enough in fact.)
but mostly those vectorized code are only appeard within the inner
most loop, so that is acceptable in generally
 
So I will try my best to review this closely to make it more close to
the perfect :)
 
I saw you has update serveral testcase, why update instead of add new testcase??
could you say more about why some testcase added __riscv_vadd_vv_i8mf8
or add some more dependency of vl variable?
 
 
 
> @@ -1423,8 +1409,13 @@ static bool
>  ge_sew_ratio_unavailable_p (const vector_insn_info &info1,
>                             const vector_insn_info &info2)
>  {
> -  if (!info2.demand_p (DEMAND_LMUL) && info2.demand_p (DEMAND_GE_SEW))
> -    return info1.get_sew () < info2.get_sew ();
> +  if (!info2.demand_p (DEMAND_LMUL))
> +    {
> +      if (info2.demand_p (DEMAND_GE_SEW))
> +       return info1.get_sew () < info2.get_sew ();
> +      else if (!info2.demand_p (DEMAND_SEW))
> +       return false;
> +    }
 
This seems relax the compatiblitly check to allow optimize more case,
if so this should be a sperated patch.
 
>    return true;
>  }
 
 
> @@ -1815,7 +1737,7 @@ vector_insn_info::parse_insn (rtx_insn *rinsn)
>      return;
>    if (optimize == 0 && !has_vtype_op (rinsn))
>      return;
> -  if (optimize > 0 && !vsetvl_insn_p (rinsn))
> +  if (optimize > 0 && vsetvl_discard_result_insn_p (rinsn))
 
I didn't get this change, could you explan few more about that? it was
early exit for non vsetvl insn, but now it allowed that now?
 
>      return;
>    m_state = VALID;
>    extract_insn_cached (rinsn);
 
> @@ -2206,9 +2128,9 @@ vector_insn_info::fuse_mask_policy (const vector_insn_info &info1,
>
>  vector_insn_info
>  vector_insn_info::merge (const vector_insn_info &merge_info,
> -                        enum merge_type type) const
> +                        enum merge_type type, unsigned bb_index) const
>  {
> -  if (!vsetvl_insn_p (get_insn ()->rtl ()))
> +  if (!vsetvl_insn_p (get_insn ()->rtl ()) && *this != merge_info)
 
Why need this exception?
 
>      gcc_assert (this->compatible_p (merge_info)
>                 && "Can't merge incompatible demanded infos");
 
> @@ -2403,18 +2348,22 @@ vector_infos_manager::get_all_available_exprs (
>  }
>
>  bool
> -vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) const
> +vector_infos_manager::earliest_fusion_worthwhile_p (
> +  const basic_block cfg_bb) const
>  {
> -  hash_set<basic_block> pred_cfg_bbs = get_all_predecessors (cfg_bb);
> -  for (const basic_block pred_cfg_bb : pred_cfg_bbs)
> +  edge e;
> +  edge_iterator ei;
> +  profile_probability prob = profile_probability::uninitialized ();
> +  FOR_EACH_EDGE (e, ei, cfg_bb->succs)
>      {
> -      const auto &pred_block_info = vector_block_infos[pred_cfg_bb->index];
> -      if (!pred_block_info.local_dem.valid_or_dirty_p ()
> -         && !pred_block_info.reaching_out.valid_or_dirty_p ())
> +      if (prob == profile_probability::uninitialized ())
> +       prob = vector_block_infos[e->dest->index].probability;
> +      else if (prob == vector_block_infos[e->dest->index].probability)
>         continue;
> -      return false;
> +      else
> +       return true;
 
Make sure I understand this correctly: it's worth if thoe edges has
different probability?
 
>      }
> -  return true;
> +  return false;
 
If all probability is same, then it's not worth?
 
Plz add few comments no matter my understand is right or not :)
 
>  }
>
>  bool
 
> @@ -2428,12 +2377,12 @@ vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const
>    sbitmap_iterator sbi;
>
>    EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi)
> -  {
> -    if (ratio == -1)
> -      ratio = vector_exprs[bb_index]->get_ratio ();
> -    else if (vector_exprs[bb_index]->get_ratio () != ratio)
> -      return false;
> -  }
> +    {
> +      if (ratio == -1)
> +       ratio = vector_exprs[bb_index]->get_ratio ();
> +      else if (vector_exprs[bb_index]->get_ratio () != ratio)
> +       return false;
> +    }
>    return true;
>  }
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -907,8 +893,8 @@ change_insn (function_info *ssa, insn_change change, insn_info *insn,
>                 ] UNSPEC_VPREDICATE)
>             (plus:RVVM4DI (reg/v:RVVM4DI 104 v8 [orig:137 op1 ] [137])
>                 (sign_extend:RVVM4DI (vec_duplicate:RVVM4SI (reg:SI 15 a5
> -    [140])))) (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF))) "rvv.c":8:12
> -    2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV
> +    [140])))) (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF)))
> +    "rvv.c":8:12 2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV
>      (mem/c:RVVM4DI (reg:DI 10 a0 [142]) [1 <retval>+0 S[64, 64] A128])
>         (expr_list:REG_EQUAL (if_then_else:RVVM4DI (unspec:RVVMF8BI [
>                         (const_vector:RVVMF8BI repeat [
 
Split this into a NFC patch, you can commit that without asking review.
 
 
> @@ -2777,6 +2770,17 @@ pass_vsetvl::update_vector_info (const insn_info *i,
>    m_vector_manager->vector_insn_infos[i->uid ()] = new_info;
>  }
>
> +void
> +pass_vsetvl::update_block_info (int index, profile_probability prob,
> +                               vector_insn_info new_info)
const vector_insn_info &new_info
 
 
> +{
> +  m_vector_manager->vector_block_infos[index].probability = prob;
> +  if (m_vector_manager->vector_block_infos[index].local_dem
> +      == m_vector_manager->vector_block_infos[index].reaching_out)
> +    m_vector_manager->vector_block_infos[index].local_dem = new_info;
> +  m_vector_manager->vector_block_infos[index].reaching_out = new_info;
> +}
> +
 
{
  auto &block_info = m_vector_manager->vector_block_infos[index];
  block_info.probability = prob;
  if (block_info.local_dem == block_info.reaching_out)
    block_info.local_dem = new_info;
  block_info.reaching_out = new_info;
}
 
>  /* Simple m_vsetvl_insert vsetvl for optimize == 0.  */
>  void
>  pass_vsetvl::simple_vsetvl (void) const
 
 
> +      for (insn_info *i = earliest_pred->end_insn ()->prev_nondebug_insn ();
> +          real_insn_and_same_bb_p (i, earliest_pred)
> +          && after_or_same_p (i, last_insn);
> +          i = i->prev_nondebug_insn ())
>         {
> +         if (!vl && find_access (i->defs (), REGNO (avl)))
> +           return false;
> +         if (vl && find_access (i->defs (), REGNO (vl)))
> +           return false;
> +         if (vl && find_access (i->uses (), REGNO (vl)))
> +           return false;
 
should we check `i->is_call () || i->is_asm ()`?
 
 
> @@ -3892,7 +3408,7 @@ pass_vsetvl::refine_vsetvls (void) const
>    basic_block cfg_bb;
>    FOR_EACH_BB_FN (cfg_bb, cfun)
>      {
> -      auto info = get_block_info(cfg_bb).local_dem;
> +      auto info = get_block_info (cfg_bb).local_dem;
>        insn_info *insn = info.get_insn ();
>        if (!info.valid_p ())
>         continue;
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -3938,8 +3454,7 @@ pass_vsetvl::cleanup_vsetvls ()
>    basic_block cfg_bb;
>    FOR_EACH_BB_FN (cfg_bb, cfun)
>      {
> -      auto &info
> -       = get_block_info(cfg_bb).reaching_out;
> +      auto &info = get_block_info (cfg_bb).reaching_out;
>        gcc_assert (m_vector_manager->expr_set_num (
>                     m_vector_manager->vector_del[cfg_bb->index])
>                   <= 1);
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -3951,9 +3466,7 @@ pass_vsetvl::cleanup_vsetvls ()
>                 info.set_unknown ();
>               else
>                 {
> -                 const auto dem
> -                   = get_block_info(cfg_bb)
> -                       .local_dem;
> +                 const auto dem = get_block_info (cfg_bb).local_dem;
>                   gcc_assert (dem == *m_vector_manager->vector_exprs[i]);
>                   insn_info *insn = dem.get_insn ();
>                   gcc_assert (insn && insn->rtl ());
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -4020,33 +3543,10 @@ pass_vsetvl::commit_vsetvls (void)
>    for (const bb_info *bb : crtl->ssa->bbs ())
>      {
>        basic_block cfg_bb = bb->cfg_bb ();
> -      const auto reaching_out
> -       = get_block_info(cfg_bb).reaching_out;
> +      const auto reaching_out = get_block_info (cfg_bb).reaching_out;
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -4263,7 +3783,8 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info *bb) const
>
>        /* Local AVL compatibility checking is simpler than global, we only
>          need to check the REGNO is same.  */
> -      if (prev_dem.valid_or_dirty_p () && prev_dem.skip_avl_compatible_p (curr_dem)
> +      if (prev_dem.valid_or_dirty_p ()
> +         && prev_dem.skip_avl_compatible_p (curr_dem)
>           && local_avl_compatible_p (prev_avl, curr_avl))
>         {
>           /* curr_dem and prev_dem is compatible!  */
 
Split this into a NFC patch, you can commit that without asking review.
 
>@@ -4655,8 +4240,7 @@ pass_vsetvl::compute_probabilities (void)
>   for (const bb_info *bb : crtl->ssa->bbs ())
>     {
>       basic_block cfg_bb = bb->cfg_bb ();
>-      auto &curr_prob
>-       = get_block_info(cfg_bb).probability;
>+      auto &curr_prob = get_block_info (cfg_bb).probability;
>
>       /* GCC assume entry block (bb 0) are always so
>         executed so set its probability as "always".  */
 
Split this into a NFC patch, you can commit that without asking review.
 
> @@ -4669,8 +4253,7 @@ pass_vsetvl::compute_probabilities (void)
>        gcc_assert (curr_prob.initialized_p ());
>        FOR_EACH_EDGE (e, ei, cfg_bb->succs)
>         {
> -         auto &new_prob
> -           = get_block_info(e->dest).probability;
> +         auto &new_prob = get_block_info (e->dest).probability;
>           if (!new_prob.initialized_p ())
>             new_prob = curr_prob * e->probability;
>           else if (new_prob == profile_probability::always ())
 
 
Split this into a NFC patch, you can commit that without asking review.
 
 
> @@ -4298,7 +3819,8 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info *bb) const
>     none exists or if a user RVV instruction is enountered
>     prior to any vsetvl.  */
>  static rtx_insn *
> -get_first_vsetvl_before_rvv_insns (basic_block cfg_bb)
> +get_first_vsetvl_before_rvv_insns (basic_block cfg_bb,
> +                                  enum vsetvl_type insn_type)
>  {
 
add gcc_assert (insn_type == VSETVL_DISCARD_RESULT || insn_type ==
VSETVL_VTYPE_CHANGE_ONLY).
 
>    rtx_insn *rinsn;
>    FOR_BB_INSNS (cfg_bb, rinsn)
> @@ -4310,7 +3832,11 @@ get_first_vsetvl_before_rvv_insns (basic_block cfg_bb)
>        if (has_vtype_op (rinsn) || vsetvl_insn_p (rinsn))
>         return nullptr;
>
> -      if (vsetvl_discard_result_insn_p (rinsn))
> +      if (insn_type == VSETVL_DISCARD_RESULT
> +         && vsetvl_discard_result_insn_p (rinsn))
> +       return rinsn;
> +      if (insn_type == VSETVL_VTYPE_CHANGE_ONLY
> +         && vsetvl_vtype_change_only_p (rinsn))
>         return rinsn;
>      }
>    return nullptr;
 
 
> diff --git a/gcc/config/riscv/riscv-vsetvl.def b/gcc/config/riscv/riscv-vsetvl.def
> index 7a73149f1da..7289c01efcf 100644
> --- a/gcc/config/riscv/riscv-vsetvl.def
> +++ b/gcc/config/riscv/riscv-vsetvl.def
> @@ -319,7 +319,7 @@ DEF_SEW_LMUL_FUSE_RULE (/*SEW*/ DEMAND_TRUE, /*LMUL*/ DEMAND_FALSE,
>                         /*RATIO*/ DEMAND_TRUE, /*GE_SEW*/ DEMAND_FALSE,
>                         /*NEW_DEMAND_SEW*/ true,
>                         /*NEW_DEMAND_LMUL*/ false,
> -                       /*NEW_DEMAND_RATIO*/ false,
> +                       /*NEW_DEMAND_RATIO*/ true,
 
This seems relax the compatiblitly check to allow optimize more case,
if so this should be a sperated patch.
 
>                         /*NEW_DEMAND_GE_SEW*/ true, first_sew,
>                         vlmul_for_first_sew_second_ratio, second_ratio)
>  DEF_SEW_LMUL_FUSE_RULE (/*SEW*/ DEMAND_TRUE, /*LMUL*/ DEMAND_FALSE,
 
 
> @@ -386,7 +337,8 @@ public:
>    bool compatible_avl_p (const avl_info &) const;
>    bool compatible_vtype_p (const vl_vtype_info &) const;
>    bool compatible_p (const vl_vtype_info &) const;
> -  vector_insn_info merge (const vector_insn_info &, enum merge_type) const;
> +  vector_insn_info merge (const vector_insn_info &, enum merge_type,
> +                         unsigned = 0) const;
 
it seems weired to set bb_index as 0 by default?
 
>
>    rtl_ssa::insn_info *get_insn () const { return m_insn; }
>    const bool *get_demands (void) const { return m_demands; }
 
 
> diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
> index 1252d6f851a..f3ce66ccdd4 100644
> --- a/gcc/config/riscv/t-riscv
> +++ b/gcc/config/riscv/t-riscv
> @@ -62,7 +62,8 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
>    $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
>    $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-config.h \
>    insn-attr.h insn-opinit.h tm-constrs.h cfgrtl.h cfganal.h lcm.h \
> -  predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h
> +  predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h \
> +  $(srcdir)/config/riscv/riscv-vsetvl.def
 
This should be a seperate fix and backport to GCC 13 as well,
pre-approve for both master and GCC-13 branch for this fix.
 
>         $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>                 $(srcdir)/config/riscv/riscv-vsetvl.cc
diff mbox series

Patch

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 2d8fa754ea0..42153eee0ba 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -240,6 +240,15 @@  vsetvl_discard_result_insn_p (rtx_insn *rinsn)
 	  || INSN_CODE (rinsn) == CODE_FOR_vsetvl_discard_resultsi);
 }
 
+/* Return true if it is vsetvl zero, rs1.  */
+static bool
+vsetvl_vtype_change_only_p (rtx_insn *rinsn)
+{
+  if (!vector_config_insn_p (rinsn))
+    return false;
+  return (INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only);
+}
+
 static bool
 real_insn_and_same_bb_p (const insn_info *insn, const bb_info *bb)
 {
@@ -252,15 +261,10 @@  before_p (const insn_info *insn1, const insn_info *insn2)
   return insn1->compare_with (insn2) < 0;
 }
 
-static insn_info *
-find_reg_killed_by (const bb_info *bb, rtx x)
+static bool
+after_or_same_p (const insn_info *insn1, const insn_info *insn2)
 {
-  if (!x || vlmax_avl_p (x) || !REG_P (x))
-    return nullptr;
-  for (insn_info *insn : bb->reverse_real_nondebug_insns ())
-    if (find_access (insn->defs (), REGNO (x)))
-      return insn;
-  return nullptr;
+  return insn1->compare_with (insn2) >= 0;
 }
 
 /* Helper function to get VL operand.  */
@@ -275,35 +279,6 @@  get_vl (rtx_insn *rinsn)
   return SET_DEST (XVECEXP (PATTERN (rinsn), 0, 0));
 }
 
-static bool
-has_vsetvl_killed_avl_p (const bb_info *bb, const vector_insn_info &info)
-{
-  if (info.dirty_with_killed_avl_p ())
-    {
-      rtx avl = info.get_avl ();
-      if (vlmax_avl_p (avl))
-	return find_reg_killed_by (bb, info.get_avl_reg_rtx ()) != nullptr;
-      for (const insn_info *insn : bb->reverse_real_nondebug_insns ())
-	{
-	  def_info *def = find_access (insn->defs (), REGNO (avl));
-	  if (def)
-	    {
-	      set_info *set = safe_dyn_cast<set_info *> (def);
-	      if (!set)
-		return false;
-
-	      rtx new_avl = gen_rtx_REG (GET_MODE (avl), REGNO (avl));
-	      gcc_assert (new_avl != avl);
-	      if (!info.compatible_avl_p (avl_info (new_avl, set)))
-		return false;
-
-	      return true;
-	    }
-	}
-    }
-  return false;
-}
-
 /* An "anticipatable occurrence" is one that is the first occurrence in the
    basic block, the operands are not modified in the basic block prior
    to the occurrence and the output is not used between the start of
@@ -335,7 +310,29 @@  anticipatable_occurrence_p (const bb_info *bb, const vector_insn_info dem)
       /* rs1 (avl) are not modified in the basic block prior to the VSETVL.  */
       rtx avl
 	= has_vl_op (insn->rtl ()) ? get_vl (insn->rtl ()) : dem.get_avl ();
-      if (!vlmax_avl_p (avl))
+      if (dem.dirty_p ())
+	{
+	  gcc_assert (!vsetvl_insn_p (insn->rtl ()));
+
+	  /* Earliest VSETVL will be inserted at the end of the block.  */
+	  for (const insn_info *i : bb->real_nondebug_insns ())
+	    {
+	      /* rs1 (avl) are not modified in the basic block prior to the
+		 VSETVL.  */
+	      if (find_access (i->defs (), REGNO (avl)))
+		return false;
+	      if (vlmax_avl_p (dem.get_avl ()))
+		{
+		  /* rd (avl) is not used between the start of the block and
+		     the occurrence. Note: Only for Dirty and VLMAX-avl.  */
+		  if (find_access (i->uses (), REGNO (avl)))
+		    return false;
+		}
+	    }
+
+	  return true;
+	}
+      else if (!vlmax_avl_p (avl))
 	{
 	  set_info *set = dem.get_avl_source ();
 	  /* If it's undefined, it's not anticipatable conservatively.  */
@@ -344,6 +341,14 @@  anticipatable_occurrence_p (const bb_info *bb, const vector_insn_info dem)
 	  if (real_insn_and_same_bb_p (set->insn (), bb)
 	      && before_p (set->insn (), insn))
 	    return false;
+	  for (insn_info *i = insn->prev_nondebug_insn ();
+	       real_insn_and_same_bb_p (i, bb); i = i->prev_nondebug_insn ())
+	    {
+	      /* rs1 (avl) are not modified in the basic block prior to the
+		 VSETVL.  */
+	      if (find_access (i->defs (), REGNO (avl)))
+		return false;
+	    }
 	}
     }
 
@@ -508,15 +513,6 @@  get_avl (rtx_insn *rinsn)
   return recog_data.operand[get_attr_vl_op_idx (rinsn)];
 }
 
-static set_info *
-get_same_bb_set (hash_set<set_info *> &sets, const basic_block cfg_bb)
-{
-  for (set_info *set : sets)
-    if (set->bb ()->cfg_bb () == cfg_bb)
-      return set;
-  return nullptr;
-}
-
 /* Recursively find all predecessor blocks for cfg_bb. */
 static hash_set<basic_block>
 get_all_predecessors (basic_block cfg_bb)
@@ -542,16 +538,6 @@  get_all_predecessors (basic_block cfg_bb)
   return blocks;
 }
 
-/* Return true if there is an INSN in insns staying in the block BB.  */
-static bool
-any_set_in_bb_p (hash_set<set_info *> sets, const bb_info *bb)
-{
-  for (const set_info *set : sets)
-    if (set->bb ()->index () == bb->index ())
-      return true;
-  return false;
-}
-
 /* Helper function to get SEW operand. We always have SEW value for
    all RVV instructions that have VTYPE OP.  */
 static uint8_t
@@ -907,8 +893,8 @@  change_insn (function_info *ssa, insn_change change, insn_info *insn,
 		] UNSPEC_VPREDICATE)
 	    (plus:RVVM4DI (reg/v:RVVM4DI 104 v8 [orig:137 op1 ] [137])
 		(sign_extend:RVVM4DI (vec_duplicate:RVVM4SI (reg:SI 15 a5
-    [140])))) (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF))) "rvv.c":8:12
-    2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV
+    [140])))) (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF)))
+    "rvv.c":8:12 2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV
     (mem/c:RVVM4DI (reg:DI 10 a0 [142]) [1 <retval>+0 S[64, 64] A128])
 	(expr_list:REG_EQUAL (if_then_else:RVVM4DI (unspec:RVVMF8BI [
 			(const_vector:RVVMF8BI repeat [
@@ -1423,8 +1409,13 @@  static bool
 ge_sew_ratio_unavailable_p (const vector_insn_info &info1,
 			    const vector_insn_info &info2)
 {
-  if (!info2.demand_p (DEMAND_LMUL) && info2.demand_p (DEMAND_GE_SEW))
-    return info1.get_sew () < info2.get_sew ();
+  if (!info2.demand_p (DEMAND_LMUL))
+    {
+      if (info2.demand_p (DEMAND_GE_SEW))
+	return info1.get_sew () < info2.get_sew ();
+      else if (!info2.demand_p (DEMAND_SEW))
+	return false;
+    }
   return true;
 }
 
@@ -1503,75 +1494,6 @@  support_relaxed_compatible_p (const vector_insn_info &info1,
   return false;
 }
 
-/* Return true if the block is worthwhile backward propagation.  */
-static bool
-backward_propagate_worthwhile_p (const basic_block cfg_bb,
-				 const vector_block_info block_info)
-{
-  if (loop_basic_block_p (cfg_bb))
-    {
-      if (block_info.reaching_out.valid_or_dirty_p ())
-	{
-	  if (block_info.local_dem.compatible_p (block_info.reaching_out))
-	    {
-	      /* Case 1 (Can backward propagate):
-		 ....
-		 bb0:
-		 ...
-		 for (int i = 0; i < n; i++)
-		   {
-		     vint16mf4_t v = __riscv_vle16_v_i16mf4 (in + i + 5, 7);
-		     __riscv_vse16_v_i16mf4 (out + i + 5, v, 7);
-		   }
-		 The local_dem is compatible with reaching_out. Such case is
-		 worthwhile backward propagation.  */
-	      return true;
-	    }
-	  else
-	    {
-	      if (support_relaxed_compatible_p (block_info.reaching_out,
-						block_info.local_dem))
-		return true;
-	      /* Case 2 (Don't backward propagate):
-		    ....
-		    bb0:
-		    ...
-		    for (int i = 0; i < n; i++)
-		      {
-			vint16mf4_t v = __riscv_vle16_v_i16mf4 (in + i + 5, 7);
-			__riscv_vse16_v_i16mf4 (out + i + 5, v, 7);
-			vint16mf2_t v2 = __riscv_vle16_v_i16mf2 (in + i + 6, 8);
-			__riscv_vse16_v_i16mf2 (out + i + 6, v, 8);
-		      }
-		 The local_dem is incompatible with reaching_out.
-		 It makes no sense to backward propagate the local_dem since we
-		 can't avoid VSETVL inside the loop.  */
-	      return false;
-	    }
-	}
-      else
-	{
-	  gcc_assert (block_info.reaching_out.unknown_p ());
-	  /* Case 3 (Don't backward propagate):
-		....
-		bb0:
-		...
-		for (int i = 0; i < n; i++)
-		  {
-		    vint16mf4_t v = __riscv_vle16_v_i16mf4 (in + i + 5, 7);
-		    __riscv_vse16_v_i16mf4 (out + i + 5, v, 7);
-		    fn3 ();
-		  }
-	    The local_dem is VALID, but the reaching_out is UNKNOWN.
-	    It makes no sense to backward propagate the local_dem since we
-	    can't avoid VSETVL inside the loop.  */
-	  return false;
-	}
-    }
-
-  return true;
-}
-
 /* Count the number of REGNO in RINSN.  */
 static int
 count_regno_occurrences (rtx_insn *rinsn, unsigned int regno)
@@ -1815,7 +1737,7 @@  vector_insn_info::parse_insn (rtx_insn *rinsn)
     return;
   if (optimize == 0 && !has_vtype_op (rinsn))
     return;
-  if (optimize > 0 && !vsetvl_insn_p (rinsn))
+  if (optimize > 0 && vsetvl_discard_result_insn_p (rinsn))
     return;
   m_state = VALID;
   extract_insn_cached (rinsn);
@@ -2206,9 +2128,9 @@  vector_insn_info::fuse_mask_policy (const vector_insn_info &info1,
 
 vector_insn_info
 vector_insn_info::merge (const vector_insn_info &merge_info,
-			 enum merge_type type) const
+			 enum merge_type type, unsigned bb_index) const
 {
-  if (!vsetvl_insn_p (get_insn ()->rtl ()))
+  if (!vsetvl_insn_p (get_insn ()->rtl ()) && *this != merge_info)
     gcc_assert (this->compatible_p (merge_info)
 		&& "Can't merge incompatible demanded infos");
 
@@ -2223,13 +2145,37 @@  vector_insn_info::merge (const vector_insn_info &merge_info,
   else
     {
       /* For global data flow, we should keep original INSN and AVL if they
-      valid since we should keep the life information of each block.
+	 valid since we should keep the life information of each block.
 
-      For example:
-	bb 0 -> bb 1.
-      We should keep INSN && AVL of bb 1 since we will eventually emit
-      vsetvl instruction according to INSN and AVL of bb 1.  */
+	 For example:
+	   bb 0 -> bb 1.
+	 We should keep INSN && AVL of bb 1 since we will eventually emit
+	 vsetvl instruction according to INSN and AVL of bb 1.  */
       new_info.fuse_avl (*this, merge_info);
+      if (new_info.get_avl_source ()
+	  && new_info.get_avl_source ()->insn ()->is_phi ()
+	  && new_info.get_avl_source ()->bb ()->index () != bb_index)
+	{
+	  hash_set<set_info *> sets
+	    = get_all_sets (new_info.get_avl_source (), true, true, true);
+	  new_info.set_avl_source (nullptr);
+	  bool can_find_set_p = false;
+	  set_info *first_set = nullptr;
+	  for (set_info *set : sets)
+	    {
+	      if (!first_set)
+		first_set = set;
+	      if (set->bb ()->index () == bb_index)
+		{
+		  gcc_assert (!can_find_set_p);
+		  new_info.set_avl_source (set);
+		  can_find_set_p = true;
+		}
+	    }
+	  if (!can_find_set_p && sets.elements () == 1
+	      && first_set->insn ()->is_real ())
+	    new_info.set_avl_source (first_set);
+	}
     }
 
   new_info.fuse_sew_lmul (*this, merge_info);
@@ -2300,10 +2246,6 @@  vector_insn_info::dump (FILE *file) const
     fprintf (file, "UNKNOWN,");
   else if (empty_p ())
     fprintf (file, "EMPTY,");
-  else if (hard_empty_p ())
-    fprintf (file, "HARD_EMPTY,");
-  else if (dirty_with_killed_avl_p ())
-    fprintf (file, "DIRTY_WITH_KILLED_AVL,");
   else
     fprintf (file, "DIRTY,");
 
@@ -2346,6 +2288,9 @@  vector_infos_manager::vector_infos_manager ()
   vector_comp = nullptr;
   vector_avin = nullptr;
   vector_avout = nullptr;
+  vector_antin = nullptr;
+  vector_antout = nullptr;
+  vector_earliest = nullptr;
   vector_insn_infos.safe_grow (get_max_uid ());
   vector_block_infos.safe_grow (last_basic_block_for_fn (cfun));
   if (!optimize)
@@ -2403,18 +2348,22 @@  vector_infos_manager::get_all_available_exprs (
 }
 
 bool
-vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) const
+vector_infos_manager::earliest_fusion_worthwhile_p (
+  const basic_block cfg_bb) const
 {
-  hash_set<basic_block> pred_cfg_bbs = get_all_predecessors (cfg_bb);
-  for (const basic_block pred_cfg_bb : pred_cfg_bbs)
+  edge e;
+  edge_iterator ei;
+  profile_probability prob = profile_probability::uninitialized ();
+  FOR_EACH_EDGE (e, ei, cfg_bb->succs)
     {
-      const auto &pred_block_info = vector_block_infos[pred_cfg_bb->index];
-      if (!pred_block_info.local_dem.valid_or_dirty_p ()
-	  && !pred_block_info.reaching_out.valid_or_dirty_p ())
+      if (prob == profile_probability::uninitialized ())
+	prob = vector_block_infos[e->dest->index].probability;
+      else if (prob == vector_block_infos[e->dest->index].probability)
 	continue;
-      return false;
+      else
+	return true;
     }
-  return true;
+  return false;
 }
 
 bool
@@ -2428,12 +2377,12 @@  vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const
   sbitmap_iterator sbi;
 
   EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi)
-  {
-    if (ratio == -1)
-      ratio = vector_exprs[bb_index]->get_ratio ();
-    else if (vector_exprs[bb_index]->get_ratio () != ratio)
-      return false;
-  }
+    {
+      if (ratio == -1)
+	ratio = vector_exprs[bb_index]->get_ratio ();
+      else if (vector_exprs[bb_index]->get_ratio () != ratio)
+	return false;
+    }
   return true;
 }
 
@@ -2473,10 +2422,10 @@  vector_infos_manager::all_same_avl_p (const basic_block cfg_bb,
   sbitmap_iterator sbi;
 
   EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi)
-  {
-    if (vector_exprs[bb_index]->get_avl_info () != avl)
-      return false;
-  }
+    {
+      if (vector_exprs[bb_index]->get_avl_info () != avl)
+	return false;
+    }
   return true;
 }
 
@@ -2522,10 +2471,17 @@  vector_infos_manager::create_bitmap_vectors (void)
 				       vector_exprs.length ());
   vector_kill = sbitmap_vector_alloc (last_basic_block_for_fn (cfun),
 				      vector_exprs.length ());
+  vector_antin = sbitmap_vector_alloc (last_basic_block_for_fn (cfun),
+				       vector_exprs.length ());
+  vector_antout = sbitmap_vector_alloc (last_basic_block_for_fn (cfun),
+					vector_exprs.length ());
 
   bitmap_vector_ones (vector_transp, last_basic_block_for_fn (cfun));
   bitmap_vector_clear (vector_antic, last_basic_block_for_fn (cfun));
   bitmap_vector_clear (vector_comp, last_basic_block_for_fn (cfun));
+  vector_edge_list = create_edge_list ();
+  vector_earliest = sbitmap_vector_alloc (NUM_EDGES (vector_edge_list),
+					  vector_exprs.length ());
 }
 
 void
@@ -2549,6 +2505,12 @@  vector_infos_manager::free_bitmap_vectors (void)
     sbitmap_vector_free (vector_avin);
   if (vector_avout)
     sbitmap_vector_free (vector_avout);
+  if (vector_antin)
+    sbitmap_vector_free (vector_antin);
+  if (vector_antout)
+    sbitmap_vector_free (vector_antout);
+  if (vector_earliest)
+    sbitmap_vector_free (vector_earliest);
 
   vector_edge_list = nullptr;
   vector_kill = nullptr;
@@ -2559,6 +2521,9 @@  vector_infos_manager::free_bitmap_vectors (void)
   vector_comp = nullptr;
   vector_avin = nullptr;
   vector_avout = nullptr;
+  vector_antin = nullptr;
+  vector_antout = nullptr;
+  vector_earliest = nullptr;
 }
 
 void
@@ -2616,6 +2581,18 @@  vector_infos_manager::dump (FILE *file) const
 	fprintf (file, "(nil)\n");
       else
 	dump_bitmap_file (file, vector_kill[cfg_bb->index]);
+
+      fprintf (file, "<ANTIN>=");
+      if (vector_antin == nullptr)
+	fprintf (file, "(nil)\n");
+      else
+	dump_bitmap_file (file, vector_antin[cfg_bb->index]);
+
+      fprintf (file, "<ANTOUT>=");
+      if (vector_antout == nullptr)
+	fprintf (file, "(nil)\n");
+      else
+	dump_bitmap_file (file, vector_antout[cfg_bb->index]);
     }
 
   fprintf (file, "\n");
@@ -2643,17 +2620,36 @@  vector_infos_manager::dump (FILE *file) const
 	dump_bitmap_file (file, vector_del[cfg_bb->index]);
     }
 
-  fprintf (file, "\nGlobal LCM (Lazy code motion) INSERT info:\n");
   for (size_t i = 0; i < vector_exprs.length (); i++)
     {
       for (int ed = 0; ed < NUM_EDGES (vector_edge_list); ed++)
 	{
 	  edge eg = INDEX_EDGE (vector_edge_list, ed);
-	  if (bitmap_bit_p (vector_insert[ed], i))
-	    fprintf (dump_file,
-		     "INSERT edge %d from bb %d to bb %d for VSETVL "
-		     "expr[%ld]\n",
-		     ed, eg->src->index, eg->dest->index, i);
+	  if (vector_insert)
+	    {
+	      if (bitmap_bit_p (vector_insert[ed], i))
+		{
+		  fprintf (file,
+			   "\nGlobal LCM (Lazy code motion) INSERT info:\n");
+		  fprintf (file,
+			   "INSERT edge %d from <bb %d> to <bb %d> for VSETVL "
+			   "expr[%ld]\n",
+			   ed, eg->src->index, eg->dest->index, i);
+		}
+	    }
+	  else
+	    {
+	      if (bitmap_bit_p (vector_earliest[ed], i))
+		{
+		  fprintf (file,
+			   "\nGlobal LCM (Lazy code motion) EARLIEST info:\n");
+		  fprintf (
+		    file,
+		    "EARLIEST edge %d from <bb %d> to <bb %d> for VSETVL "
+		    "expr[%ld]\n",
+		    ed, eg->src->index, eg->dest->index, i);
+		}
+	    }
 	}
     }
 }
@@ -2682,6 +2678,7 @@  private:
   vector_block_info &get_block_info (const basic_block);
   vector_block_info &get_block_info (const bb_info *);
   void update_vector_info (const insn_info *, const vector_insn_info &);
+  void update_block_info (int, profile_probability, vector_insn_info);
 
   void simple_vsetvl (void) const;
   void lazy_vsetvl (void);
@@ -2696,12 +2693,7 @@  private:
   void emit_local_forward_vsetvls (const bb_info *);
 
   /* Phase 3.  */
-  enum fusion_type get_backward_fusion_type (const bb_info *,
-					     const vector_insn_info &);
-  bool hard_empty_block_p (const bb_info *, const vector_insn_info &) const;
-  bool backward_demand_fusion (void);
-  bool forward_demand_fusion (void);
-  bool cleanup_illegal_dirty_blocks (void);
+  bool earliest_fusion (void);
   void demand_fusion (void);
 
   /* Phase 4.  */
@@ -2720,6 +2712,7 @@  private:
   void ssa_post_optimization (void) const;
 
   /* Phase 6.  */
+  bool cleanup_earliest_vsetvls (const basic_block) const;
   void df_post_optimization (void) const;
 
   void init (void);
@@ -2777,6 +2770,17 @@  pass_vsetvl::update_vector_info (const insn_info *i,
   m_vector_manager->vector_insn_infos[i->uid ()] = new_info;
 }
 
+void
+pass_vsetvl::update_block_info (int index, profile_probability prob,
+				vector_insn_info new_info)
+{
+  m_vector_manager->vector_block_infos[index].probability = prob;
+  if (m_vector_manager->vector_block_infos[index].local_dem
+      == m_vector_manager->vector_block_infos[index].reaching_out)
+    m_vector_manager->vector_block_infos[index].local_dem = new_info;
+  m_vector_manager->vector_block_infos[index].reaching_out = new_info;
+}
+
 /* Simple m_vsetvl_insert vsetvl for optimize == 0.  */
 void
 pass_vsetvl::simple_vsetvl (void) const
@@ -2951,697 +2955,225 @@  pass_vsetvl::emit_local_forward_vsetvls (const bb_info *bb)
   block_info.reaching_out = curr_info;
 }
 
-enum fusion_type
-pass_vsetvl::get_backward_fusion_type (const bb_info *bb,
-				       const vector_insn_info &prop)
+/* Return TRUE if the demands can be fused.  */
+static bool
+demands_can_be_fused_p (const vector_insn_info &be_fused,
+			const vector_insn_info &to_fuse)
 {
-  insn_info *insn = prop.get_insn ();
-
-  /* TODO: We don't backward propagate the explict VSETVL here
-     since we will change vsetvl and vsetvlmax intrinsics into
-     no side effects which can be optimized into optimal location
-     by GCC internal passes. We only need to support these backward
-     propagation if vsetvl intrinsics have side effects.  */
-  if (vsetvl_insn_p (insn->rtl ()))
-    return INVALID_FUSION;
-
-  gcc_assert (has_vtype_op (insn->rtl ()));
-  rtx reg = NULL_RTX;
-
-  /* Case 1: Don't need VL. Just let it backward propagate.  */
-  if (!prop.demand_p (DEMAND_AVL))
-    return VALID_AVL_FUSION;
-  else
-    {
-      /* Case 2: CONST_INT AVL, we don't need to check def.  */
-      if (prop.has_avl_imm ())
-	return VALID_AVL_FUSION;
-      else
-	{
-	  /* Case 3: REG AVL, we need to check the distance of def to make
-	     sure we won't backward propagate over the def.  */
-	  gcc_assert (prop.has_avl_reg ());
-	  if (vlmax_avl_p (prop.get_avl ()))
-	    /* Check VL operand for vsetvl vl,zero.  */
-	    reg = prop.get_avl_reg_rtx ();
-	  else
-	    /* Check AVL operand for vsetvl zero,avl.  */
-	    reg = prop.get_avl ();
-	}
-    }
-
-  gcc_assert (reg);
-  if (!prop.get_avl_source ()->insn ()->is_phi ()
-      && prop.get_avl_source ()->insn ()->bb () == insn->bb ())
-    return INVALID_FUSION;
-  hash_set<set_info *> sets
-    = get_all_sets (prop.get_avl_source (), true, true, true);
-  if (any_set_in_bb_p (sets, insn->bb ()))
-    return INVALID_FUSION;
-
-  if (vlmax_avl_p (prop.get_avl ()))
-    {
-      if (find_reg_killed_by (bb, reg))
-	return INVALID_FUSION;
-      else
-	return VALID_AVL_FUSION;
-    }
-
-  /* By default, we always enable backward fusion so that we can
-     gain more optimizations.  */
-  if (!find_reg_killed_by (bb, reg))
-    return VALID_AVL_FUSION;
-  return KILLED_AVL_FUSION;
+  return be_fused.compatible_p (to_fuse) && !be_fused.available_p (to_fuse);
 }
 
-/* We almost enable all cases in get_backward_fusion_type, this function
-   disable the backward fusion by changing dirty blocks into hard empty
-   blocks in forward dataflow. We can have more accurate optimization by
-   this method.  */
-bool
-pass_vsetvl::hard_empty_block_p (const bb_info *bb,
-				 const vector_insn_info &info) const
-{
-  if (!info.dirty_p () || !info.has_avl_reg ())
-    return false;
-
-  basic_block cfg_bb = bb->cfg_bb ();
-  sbitmap avin = m_vector_manager->vector_avin[cfg_bb->index];
-  set_info *set = info.get_avl_source ();
-  rtx avl = gen_rtx_REG (Pmode, set->regno ());
-  hash_set<set_info *> sets = get_all_sets (set, true, false, false);
-  hash_set<basic_block> pred_cfg_bbs = get_all_predecessors (cfg_bb);
-
-  if (find_reg_killed_by (bb, avl))
-    {
-      /* Condition 1:
-	 Dirty block with killed AVL means that the empty block (no RVV
-	 instructions) are polluted as Dirty blocks with the value of current
-	 AVL is killed. For example:
-	      bb 0:
-		...
-	      bb 1:
-		def a5
-	      bb 2:
-		RVV (use a5)
-	 In backward dataflow, we will polluted BB0 and BB1 as Dirt with AVL
-	 killed. since a5 is killed in BB1.
-	 In this case, let's take a look at this example:
-
-	      bb 3:        bb 4:
-		def3 a5       def4 a5
-	      bb 5:        bb 6:
-		def1 a5       def2 a5
-		    \         /
-		     \       /
-		      \     /
-		       \   /
-			bb 7:
-		    RVV (use a5)
-	 In thi case, we can polluted BB5 and BB6 as dirty if get-def
-	 of a5 from RVV instruction in BB7 is the def1 in BB5 and
-	 def2 BB6 so we can return false early here for HARD_EMPTY_BLOCK_P.
-	 However, we are not sure whether BB3 and BB4 can be
-	 polluted as Dirty with AVL killed so we can't return false
-	 for HARD_EMPTY_BLOCK_P here since it's too early which will
-	 potentially produce issues.  */
-      gcc_assert (info.dirty_with_killed_avl_p ());
-      if (info.get_avl_source ()
-	  && get_same_bb_set (sets, bb->cfg_bb ()) == info.get_avl_source ())
-	return false;
-    }
-
-  /* Condition 2:
-     Suppress the VL/VTYPE info backward propagation too early:
-			 ________
-			|   BB0  |
-			|________|
-			    |
-			____|____
-			|   BB1  |
-			|________|
-     In this case, suppose BB 1 has multiple predecessors, BB 0 is one
-     of them. BB1 has VL/VTYPE info (may be VALID or DIRTY) to backward
-     propagate.
-     The AVIN (available in) which is calculated by LCM is empty only
-     in these 2 circumstances:
-       1. all predecessors of BB1 are empty (not VALID
-	  and can not be polluted in backward fusion flow)
-       2. VL/VTYPE info of BB1 predecessors are conflict.
-
-     We keep it as dirty in 2nd circumstance and set it as HARD_EMPTY
-     (can not be polluted as DIRTY any more) in 1st circumstance.
-     We don't backward propagate in 1st circumstance since there is
-     no VALID RVV instruction and no polluted blocks (dirty blocks)
-     by backward propagation from other following blocks.
-     It's meaningless to keep it as Dirty anymore.
-
-     However, since we keep it as dirty in 2nd since there are VALID or
-     Dirty blocks in predecessors, we can still gain the benefits and
-     optimization opportunities. For example, in this case:
-	for (size_t i = 0; i < n; i++)
-	 {
-	   if (i != cond) {
-	     vint8mf8_t v = *(vint8mf8_t*)(in + i + 100);
-	     *(vint8mf8_t*)(out + i + 100) = v;
-	   } else {
-	     vbool1_t v = *(vbool1_t*)(in + i + 400);
-	     *(vbool1_t*)(out + i + 400) = v;
-	   }
-	 }
-     VL/VTYPE in if-else are conflict which will produce empty AVIN LCM result
-     but we can still keep dirty blocks if *(i != cond)* is very unlikely then
-     we can preset vsetvl (VL/VTYPE) info from else (static propability model).
-
-     We don't want to backward propagate VL/VTYPE information too early
-     which is not the optimal and may potentially produce issues.  */
-  if (bitmap_empty_p (avin))
-    {
-      bool hard_empty_p = true;
-      for (const basic_block pred_cfg_bb : pred_cfg_bbs)
-	{
-	  if (pred_cfg_bb == ENTRY_BLOCK_PTR_FOR_FN (cfun))
-	    continue;
-	  sbitmap avout = m_vector_manager->vector_avout[pred_cfg_bb->index];
-	  if (!bitmap_empty_p (avout))
-	    {
-	      hard_empty_p = false;
-	      break;
-	    }
-	}
-      if (hard_empty_p)
-	return true;
-    }
-
-  edge e;
-  edge_iterator ei;
-  bool has_avl_killed_insn_p = false;
-  FOR_EACH_EDGE (e, ei, cfg_bb->succs)
-    {
-      const auto block_info
-	= m_vector_manager->vector_block_infos[e->dest->index];
-      if (block_info.local_dem.dirty_with_killed_avl_p ())
-	{
-	  has_avl_killed_insn_p = true;
-	  break;
-	}
-    }
-  if (!has_avl_killed_insn_p)
-    return false;
-
-  bool any_set_in_bbs_p = false;
-  for (const basic_block pred_cfg_bb : pred_cfg_bbs)
+/* Return true if we can fuse VSETVL demand info into predecessor of earliest
+ * edge.  */
+static bool
+earliest_pred_can_be_fused_p (const bb_info *earliest_pred,
+			      const vector_insn_info &earliest_info,
+			      const vector_insn_info &expr, rtx *vlmax_vl)
+{
+  rtx vl = NULL_RTX;
+  /* Backward VLMAX VL:
+       bb 3:
+	 vsetivli zero, 1 ... -> vsetvli t1, zero
+	 vmv.s.x
+       bb 5:
+	 vsetvli t1, zero ... -> to be elided.
+	 vlse16.v
+
+       We should forward "t1".  */
+  if (!earliest_info.has_avl_reg () && expr.has_avl_reg ())
     {
-      insn_info *def_insn = extract_single_source (set);
-      if (def_insn)
-	{
-	  /* Condition 3:
-
-	    Case 1:                               Case 2:
-		bb 0:                                 bb 0:
-		  def a5 101                             ...
-		bb 1:                                 bb 1:
-		  ...                                    ...
-		bb 2:                                 bb 2:
-		  RVV 1 (use a5 with TAIL ANY)           ...
-		bb 3:                                 bb 3:
-		  def a5 101                             def a5 101
-		bb 4:                                 bb 4:
-		  ...                                    ...
-		bb 5:                                 bb 5:
-		  RVV 2 (use a5 with TU)                 RVV 1 (use a5)
-
-	    Case 1: We can pollute BB3,BB2,BB1,BB0 are all Dirt blocks
-	    with killed AVL so that we can merge TU demand info from RVV 2
-	    into RVV 1 and elide the vsevl instruction in BB5.
-
-	    TODO: We only optimize for single source def since multiple source
-	    def is quite complicated.
-
-	    Case 2: We only can pollute bb 3 as dirty and it has been accepted
-	    in Condition 2 and we can't pollute BB3,BB2,BB1,BB0 like case 1. */
-	  insn_info *last_killed_insn
-	    = find_reg_killed_by (crtl->ssa->bb (pred_cfg_bb), avl);
-	  if (!last_killed_insn || pred_cfg_bb == def_insn->bb ()->cfg_bb ())
-	    continue;
-	  if (source_equal_p (last_killed_insn, def_insn))
-	    {
-	      any_set_in_bbs_p = true;
-	      break;
-	    }
-	}
-      else
+      rtx avl = expr.get_avl ();
+      const insn_info *last_insn = earliest_info.get_insn ();
+      if (vlmax_avl_p (avl))
+	vl = get_vl (expr.get_insn ()->rtl ());
+      /* To fuse demand on earlest edge, we make sure AVL/VL
+	 didn't change from the consume insn to the predecessor
+	 of the edge.  */
+      for (insn_info *i = earliest_pred->end_insn ()->prev_nondebug_insn ();
+	   real_insn_and_same_bb_p (i, earliest_pred)
+	   && after_or_same_p (i, last_insn);
+	   i = i->prev_nondebug_insn ())
 	{
-	  /* Condition 4:
-
-	      bb 0:        bb 1:         bb 3:
-		def1 a5       def2 a5     ...
-		    \         /            /
-		     \       /            /
-		      \     /            /
-		       \   /            /
-			bb 4:          /
-			 |            /
-			 |           /
-			bb 5:       /
-			 |         /
-			 |        /
-			bb 6:    /
-			 |      /
-			 |     /
-			  bb 8:
-			RVV 1 (use a5)
-	  If we get-def (REAL) of a5 from RVV 1 instruction, we will get
-	  def1 from BB0 and def2 from BB1. So we will pollute BB6,BB5,BB4,
-	  BB0,BB1 with DIRTY and set BB3 as HARD_EMPTY so that we won't
-	  propagate AVL to BB3.  */
-	  if (any_set_in_bb_p (sets, crtl->ssa->bb (pred_cfg_bb)))
-	    {
-	      any_set_in_bbs_p = true;
-	      break;
-	    }
+	  if (!vl && find_access (i->defs (), REGNO (avl)))
+	    return false;
+	  if (vl && find_access (i->defs (), REGNO (vl)))
+	    return false;
+	  if (vl && find_access (i->uses (), REGNO (vl)))
+	    return false;
 	}
     }
-  if (!any_set_in_bbs_p)
-    return true;
-  return false;
+  if (vlmax_vl)
+    *vlmax_vl = vl;
+  return true;
 }
 
-/* Compute global backward demanded info.  */
+/* Fuse demand info for earliest edge.  */
 bool
-pass_vsetvl::backward_demand_fusion (void)
+pass_vsetvl::earliest_fusion (void)
 {
-  /* We compute global infos by backward propagation.
-     We want to have better performance in these following cases:
-
-	1. for (size_t i = 0; i < n; i++) {
-	     if (i != cond) {
-	       vint8mf8_t v = *(vint8mf8_t*)(in + i + 100);
-	       *(vint8mf8_t*)(out + i + 100) = v;
-	     } else {
-	       vbool1_t v = *(vbool1_t*)(in + i + 400);
-	       *(vbool1_t*)(out + i + 400) = v;
-	     }
-	   }
-
-	   Since we don't have any RVV instruction in the BEFORE blocks,
-	   LCM fails to optimize such case. We want to backward propagate
-	   them into empty blocks so that we could have better performance
-	   in LCM.
-
-	2. bb 0:
-	     vsetvl e8,mf8 (demand RATIO)
-	   bb 1:
-	     vsetvl e32,mf2 (demand SEW and LMUL)
-	   We backward propagate the first VSETVL into e32,mf2 so that we
-	   could be able to eliminate the second VSETVL in LCM.  */
-
   bool changed_p = false;
-  for (const bb_info *bb : crtl->ssa->reverse_bbs ())
+  for (int ed = 0; ed < NUM_EDGES (m_vector_manager->vector_edge_list); ed++)
     {
-      basic_block cfg_bb = bb->cfg_bb ();
-      const auto &curr_block_info
-	= m_vector_manager->vector_block_infos[cfg_bb->index];
-      const auto &prop = curr_block_info.local_dem;
-
-      /* If there is nothing to propagate, just skip it.  */
-      if (!prop.valid_or_dirty_p ())
-	continue;
-
-      if (!backward_propagate_worthwhile_p (cfg_bb, curr_block_info))
-	continue;
-
-      /* Fix PR108270:
-
-		bb 0 -> bb 1
-	 We don't need to backward fuse VL/VTYPE info from bb 1 to bb 0
-	 if bb 1 is not inside a loop and all predecessors of bb 0 are empty. */
-      if (m_vector_manager->all_empty_predecessor_p (cfg_bb))
-	continue;
-
-      edge e;
-      edge_iterator ei;
-      /* Backward propagate to each predecessor.  */
-      FOR_EACH_EDGE (e, ei, cfg_bb->preds)
+      for (size_t i = 0; i < m_vector_manager->vector_exprs.length (); i++)
 	{
-	  auto &block_info
-	    = m_vector_manager->vector_block_infos[e->src->index];
-
-	  /* We don't propagate through critical edges.  */
-	  if (e->flags & EDGE_COMPLEX)
-	    continue;
-	  if (e->src->index == ENTRY_BLOCK_PTR_FOR_FN (cfun)->index)
-	    continue;
-	  /* If prop is demand of vsetvl instruction and reaching doesn't demand
-	     AVL. We don't backward propagate since vsetvl instruction has no
-	     side effects.  */
-	  if (vsetvl_insn_p (prop.get_insn ()->rtl ())
-	      && propagate_avl_across_demands_p (prop, block_info.reaching_out))
+	  auto &expr = *m_vector_manager->vector_exprs[i];
+	  if (expr.empty_p ())
 	    continue;
+	  edge eg = INDEX_EDGE (m_vector_manager->vector_edge_list, ed);
+	  if (eg->src == ENTRY_BLOCK_PTR_FOR_FN (cfun)
+	      || eg->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
+	    break;
 
-	  if (block_info.reaching_out.unknown_p ())
-	    continue;
-	  else if (block_info.reaching_out.hard_empty_p ())
-	    continue;
-	  else if (block_info.reaching_out.empty_p ())
+	  if (bitmap_bit_p (m_vector_manager->vector_earliest[ed], i))
 	    {
-	      enum fusion_type type
-		= get_backward_fusion_type (crtl->ssa->bb (e->src), prop);
-	      if (type == INVALID_FUSION)
-		continue;
-
-	      block_info.reaching_out = prop;
-	      block_info.reaching_out.set_dirty (type);
-
-	      if (prop.has_avl_reg () && !vlmax_avl_p (prop.get_avl ()))
-		{
-		  hash_set<set_info *> sets
-		    = get_all_sets (prop.get_avl_source (), true, true, true);
-		  set_info *set = get_same_bb_set (sets, e->src);
-		  if (set)
-		    block_info.reaching_out.set_avl_info (
-		      avl_info (prop.get_avl (), set));
-		}
+	      auto &src_block_info = get_block_info (eg->src);
+	      auto &dest_block_info = get_block_info (eg->dest);
+	      if (src_block_info.reaching_out.unknown_p ())
+		break;
 
-	      block_info.local_dem = block_info.reaching_out;
-	      block_info.probability = curr_block_info.probability;
-	      changed_p = true;
-	    }
-	  else if (block_info.reaching_out.dirty_p ())
-	    {
-	      /* DIRTY -> DIRTY or VALID -> DIRTY.  */
-
-	      /* Forbidden this case fuse because it change the value of a5.
-		   bb 1: vsetvl zero, no_zero_avl
-			 ...
-			 use a5
-			 ...
-		   bb 2: vsetvl a5, zero
-		 =>
-		   bb 1: vsetvl a5, zero
-			 ...
-			 use a5
-			 ...
-		   bb 2:
-	      */
-	      if (block_info.reaching_out.demand_p (DEMAND_NONZERO_AVL)
-		  && vlmax_avl_p (prop.get_avl ()))
-		continue;
-	      vector_insn_info new_info;
+	      gcc_assert (!(eg->flags & EDGE_ABNORMAL));
+	      vector_insn_info new_info = vector_insn_info ();
+	      profile_probability prob = src_block_info.probability;
 
-	      if (block_info.reaching_out.compatible_p (prop))
+	      if (src_block_info.reaching_out.empty_p ())
 		{
-		  if (block_info.reaching_out.available_p (prop))
+		  if (src_block_info.probability
+			== profile_probability::uninitialized ()
+		      || vsetvl_insn_p (expr.get_insn ()->rtl ()))
 		    continue;
-		  new_info = block_info.reaching_out.merge (prop, GLOBAL_MERGE);
-		  new_info.set_dirty (
-		    block_info.reaching_out.dirty_with_killed_avl_p ());
-		  block_info.probability += curr_block_info.probability;
+		  new_info = expr.merge (expr, GLOBAL_MERGE, eg->src->index);
+		  new_info.set_dirty ();
+		  prob = dest_block_info.probability;
+		  update_block_info (eg->src->index, prob, new_info);
+		  changed_p = true;
 		}
-	      else
+	      else if (src_block_info.reaching_out.dirty_p ())
 		{
-		  if (curr_block_info.probability > block_info.probability)
+		  /* DIRTY -> DIRTY or VALID -> DIRTY.  */
+		  if (demands_can_be_fused_p (src_block_info.reaching_out,
+					      expr))
 		    {
-		      enum fusion_type type
-			= get_backward_fusion_type (crtl->ssa->bb (e->src),
-						    prop);
-		      if (type == INVALID_FUSION)
-			continue;
-		      new_info = prop;
-		      new_info.set_dirty (type);
-		      block_info.probability = curr_block_info.probability;
+		      new_info
+			= src_block_info.reaching_out.merge (expr, GLOBAL_MERGE,
+							     eg->src->index);
+		      new_info.set_dirty ();
+		      prob += dest_block_info.probability;
+		    }
+		  else if (!src_block_info.reaching_out.compatible_p (expr)
+			   && !m_vector_manager->earliest_fusion_worthwhile_p (
+			     eg->src))
+		    {
+		      new_info.set_empty ();
+		      prob = profile_probability::uninitialized ();
+		    }
+		  else if (!src_block_info.reaching_out.compatible_p (expr)
+			   && dest_block_info.probability
+				> src_block_info.probability)
+		    {
+		      new_info = expr;
+		      new_info.set_dirty ();
+		      prob = dest_block_info.probability;
 		    }
 		  else
 		    continue;
-		}
 
-	      if (propagate_avl_across_demands_p (prop,
-						  block_info.reaching_out))
-		{
-		  rtx reg = new_info.get_avl_reg_rtx ();
-		  if (find_reg_killed_by (crtl->ssa->bb (e->src), reg))
-		    new_info.set_dirty (true);
+		  update_block_info (eg->src->index, prob, new_info);
+		  changed_p = true;
 		}
-
-	      block_info.local_dem = new_info;
-	      block_info.reaching_out = new_info;
-	      changed_p = true;
-	    }
-	  else
-	    {
-	      /* We not only change the info during backward propagation,
-		 but also change the VSETVL instruction.  */
-	      gcc_assert (block_info.reaching_out.valid_p ());
-	      hash_set<set_info *> sets
-		= get_all_sets (prop.get_avl_source (), true, false, false);
-	      set_info *set = get_same_bb_set (sets, e->src);
-	      if (vsetvl_insn_p (block_info.reaching_out.get_insn ()->rtl ())
-		  && prop.has_avl_reg () && !vlmax_avl_p (prop.get_avl ()))
+	      else
 		{
-		  if (!block_info.reaching_out.same_vlmax_p (prop))
-		    continue;
-		  if (block_info.reaching_out.same_vtype_p (prop))
+		  if (!demands_can_be_fused_p (src_block_info.reaching_out,
+					       expr))
 		    continue;
-		  if (!set)
-		    continue;
-		  if (set->insn () != block_info.reaching_out.get_insn ())
-		    continue;
-		}
-
-	      if (!block_info.reaching_out.compatible_p (prop))
-		continue;
-	      if (block_info.reaching_out.available_p (prop))
-		continue;
-
-	      vector_insn_info be_merged = block_info.reaching_out;
-	      if (block_info.local_dem == block_info.reaching_out)
-		be_merged = block_info.local_dem;
-	      vector_insn_info new_info = be_merged.merge (prop, GLOBAL_MERGE);
-
-	      if (curr_block_info.probability > block_info.probability)
-		block_info.probability = curr_block_info.probability;
-
-	      if (propagate_avl_across_demands_p (prop, block_info.reaching_out)
-		  && !reg_available_p (crtl->ssa->bb (e->src)->end_insn (),
-				       new_info))
-		continue;
-
-	      rtx vl = NULL_RTX;
-	      /* Backward VLMAX VL:
-		   bb 3:
-		     vsetivli zero, 1 ... -> vsetvli t1, zero
-		     vmv.s.x
-		   bb 5:
-		     vsetvli t1, zero ... -> to be elided.
-		     vlse16.v
-
-		   We should forward "t1".  */
-	      if (!block_info.reaching_out.has_avl_reg ()
-		  && vlmax_avl_p (new_info.get_avl ()))
-		vl = get_vl (prop.get_insn ()->rtl ());
-	      change_vsetvl_insn (new_info.get_insn (), new_info, vl);
-	      if (block_info.local_dem == block_info.reaching_out)
-		block_info.local_dem = new_info;
-	      block_info.reaching_out = new_info;
-	      changed_p = true;
-	    }
-	}
-    }
-  return changed_p;
-}
-
-/* Compute global forward demanded info.  */
-bool
-pass_vsetvl::forward_demand_fusion (void)
-{
-  /* Enhance the global information propagation especially
-     backward propagation miss the propagation.
-     Consider such case:
-
-			bb0
-			(TU)
-		       /   \
-		     bb1   bb2
-		     (TU)  (ANY)
-  existing edge -----> \    / (TU) <----- LCM create this edge.
-			bb3
-			(TU)
-
-     Base on the situation, LCM fails to eliminate the VSETVL instruction and
-     insert an edge from bb2 to bb3 since we can't backward propagate bb3 into
-     bb2. To avoid this confusing LCM result and non-optimal codegen, we should
-     forward propagate information from bb0 to bb2 which is friendly to LCM.  */
-  bool changed_p = false;
-  for (const bb_info *bb : crtl->ssa->bbs ())
-    {
-      basic_block cfg_bb = bb->cfg_bb ();
-      const auto &prop
-	= m_vector_manager->vector_block_infos[cfg_bb->index].reaching_out;
-
-      /* If there is nothing to propagate, just skip it.  */
-      if (!prop.valid_or_dirty_p ())
-	continue;
-
-      if (cfg_bb == ENTRY_BLOCK_PTR_FOR_FN (cfun))
-	continue;
 
-      if (vsetvl_insn_p (prop.get_insn ()->rtl ()))
-	continue;
+		  rtx vl = NULL_RTX;
+		  if (!earliest_pred_can_be_fused_p (
+			crtl->ssa->bb (eg->src), src_block_info.reaching_out,
+			expr, &vl))
+		    continue;
 
-      edge e;
-      edge_iterator ei;
-      /* Forward propagate to each successor.  */
-      FOR_EACH_EDGE (e, ei, cfg_bb->succs)
-	{
-	  auto &local_dem
-	    = m_vector_manager->vector_block_infos[e->dest->index].local_dem;
-	  auto &reaching_out
-	    = m_vector_manager->vector_block_infos[e->dest->index].reaching_out;
+		  vector_insn_info new_info
+		    = src_block_info.reaching_out.merge (expr, GLOBAL_MERGE,
+							 eg->src->index);
 
-	  /* It's quite obvious, we don't need to propagate itself.  */
-	  if (e->dest->index == cfg_bb->index)
-	    continue;
-	  /* We don't propagate through critical edges.  */
-	  if (e->flags & EDGE_COMPLEX)
-	    continue;
-	  if (e->dest->index == EXIT_BLOCK_PTR_FOR_FN (cfun)->index)
-	    continue;
+		  if (dest_block_info.probability > src_block_info.probability)
+		    prob = dest_block_info.probability;
 
-	  /* If there is nothing to propagate, just skip it.  */
-	  if (!local_dem.valid_or_dirty_p ())
-	    continue;
-	  if (local_dem.available_p (prop))
-	    continue;
-	  if (!local_dem.compatible_p (prop))
-	    continue;
-	  if (propagate_avl_across_demands_p (prop, local_dem))
-	    continue;
-
-	  vector_insn_info new_info = local_dem.merge (prop, GLOBAL_MERGE);
-	  new_info.set_insn (local_dem.get_insn ());
-	  if (local_dem.dirty_p ())
-	    {
-	      gcc_assert (local_dem == reaching_out);
-	      new_info.set_dirty (local_dem.dirty_with_killed_avl_p ());
-	      local_dem = new_info;
-	      reaching_out = local_dem;
-	    }
-	  else
-	    {
-	      if (reaching_out == local_dem)
-		reaching_out = new_info;
-	      local_dem = new_info;
-	      change_vsetvl_insn (local_dem.get_insn (), new_info);
+		  change_vsetvl_insn (new_info.get_insn (), new_info, vl);
+		  update_block_info (eg->src->index, prob, new_info);
+		  changed_p = true;
+		}
 	    }
-	  auto &prob
-	    = m_vector_manager->vector_block_infos[e->dest->index].probability;
-	  auto &curr_prob
-	    = m_vector_manager->vector_block_infos[cfg_bb->index].probability;
-	  prob = curr_prob * e->probability;
-	  changed_p = true;
 	}
     }
   return changed_p;
 }
 
+/* Fuse demand info according LCM computed location.  */
 void
 pass_vsetvl::demand_fusion (void)
 {
-  bool changed_p = true;
-  while (changed_p)
-    {
-      changed_p = false;
-      /* To optimize the case like this:
-	 void f2 (int8_t * restrict in, int8_t * restrict out, int n, int cond)
-	   {
-	     size_t vl = 101;
-
-	     for (size_t i = 0; i < n; i++)
-	       {
-		 vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + 300, vl);
-		 __riscv_vse8_v_i8mf8 (out + i + 300, v, vl);
-	       }
-
-	     for (size_t i = 0; i < n; i++)
-	       {
-		 vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i, vl);
-		 __riscv_vse8_v_i8mf8 (out + i, v, vl);
-
-		 vint8mf8_t v2 = __riscv_vle8_v_i8mf8_tu (v, in + i + 100, vl);
-		 __riscv_vse8_v_i8mf8 (out + i + 100, v2, vl);
-	       }
+  /* We want to have better performance in these following cases:
+
+	1. for (size_t i = 0; i < n; i++) {
+	     if (i != cond) {
+	       vint8mf8_t v = *(vint8mf8_t*)(in + i + 100);
+	       *(vint8mf8_t*)(out + i + 100) = v;
+	     } else {
+	       vbool1_t v = *(vbool1_t*)(in + i + 400);
+	       *(vbool1_t*)(out + i + 400) = v;
+	     }
 	   }
 
-	  bb 0: li a5, 101 (killed avl)
-	  ...
-	  bb 1: vsetvli zero, a5, ta
-	  ...
-	  bb 2: li a5, 101 (killed avl)
-	  ...
-	  bb 3: vsetvli zero, a3, tu
-
-	We want to fuse VSEVLI instructions on bb 1 and bb 3. However, there is
-	an AVL kill instruction in bb 2 that we can't backward fuse bb 3 or
-	forward bb 1 arbitrarily. We need available information of each block to
-	help for such cases.  */
-      changed_p |= backward_demand_fusion ();
-      changed_p |= forward_demand_fusion ();
-    }
+	   Since we don't have any RVV instruction in the BEFORE blocks,
+	   LCM fails to optimize such case. We want to backward propagate
+	   them into empty blocks so that we could have better performance
+	   in LCM.
 
-  changed_p = true;
+	2. bb 0:
+	     vsetvl e8,mf8 (demand RATIO)
+	   bb 1:
+	     vsetvl e32,mf2 (demand SEW and LMUL)
+	   We backward propagate the first VSETVL into e32,mf2 so that we
+	   could be able to eliminate the second VSETVL in LCM.  */
+  bool changed_p = true;
+  int fusion_no = 0;
+  /* Fuse VSETVL demand info until VSETVL CFG fixed.  */
   while (changed_p)
     {
       changed_p = false;
+      fusion_no++;
       prune_expressions ();
       m_vector_manager->create_bitmap_vectors ();
       compute_local_properties ();
+      /* Compute global availability.  */
       compute_available (m_vector_manager->vector_comp,
 			 m_vector_manager->vector_kill,
 			 m_vector_manager->vector_avout,
 			 m_vector_manager->vector_avin);
-      changed_p |= cleanup_illegal_dirty_blocks ();
+      /* Compute global anticipatability.  */
+      compute_antinout_edge (m_vector_manager->vector_antic,
+			     m_vector_manager->vector_transp,
+			     m_vector_manager->vector_antin,
+			     m_vector_manager->vector_antout);
+      /* Compute earliestness.  */
+      compute_earliest (m_vector_manager->vector_edge_list,
+			m_vector_manager->vector_exprs.length (),
+			m_vector_manager->vector_antin,
+			m_vector_manager->vector_antout,
+			m_vector_manager->vector_avout,
+			m_vector_manager->vector_kill,
+			m_vector_manager->vector_earliest);
+      changed_p |= earliest_fusion ();
+      if (dump_file)
+	{
+	  fprintf (dump_file, "\nEARLIEST fusion %d\n", fusion_no);
+	  m_vector_manager->dump (dump_file);
+	}
       m_vector_manager->free_bitmap_vectors ();
       if (!m_vector_manager->vector_exprs.is_empty ())
 	m_vector_manager->vector_exprs.release ();
     }
-
-  if (dump_file)
-    {
-      fprintf (dump_file, "\n\nDirty blocks list: ");
-      for (const bb_info *bb : crtl->ssa->bbs ())
-	if (m_vector_manager->vector_block_infos[bb->index ()]
-	      .reaching_out.dirty_p ())
-	  fprintf (dump_file, "%d ", bb->index ());
-      fprintf (dump_file, "\n\n");
-    }
-}
-
-/* Cleanup illegal dirty blocks.  */
-bool
-pass_vsetvl::cleanup_illegal_dirty_blocks (void)
-{
-  bool changed_p = false;
-  for (const bb_info *bb : crtl->ssa->bbs ())
-    {
-      basic_block cfg_bb = bb->cfg_bb ();
-      const auto &prop
-	= m_vector_manager->vector_block_infos[cfg_bb->index].reaching_out;
-
-      /* If there is nothing to cleanup, just skip it.  */
-      if (!prop.valid_or_dirty_p ())
-	continue;
-
-      if (hard_empty_block_p (bb, prop))
-	{
-	  m_vector_manager->vector_block_infos[cfg_bb->index].local_dem
-	    = vector_insn_info::get_hard_empty ();
-	  m_vector_manager->vector_block_infos[cfg_bb->index].reaching_out
-	    = vector_insn_info::get_hard_empty ();
-	  changed_p = true;
-	  continue;
-	}
-    }
-  return changed_p;
 }
 
 /* Assemble the candidates expressions for LCM.  */
@@ -3714,6 +3246,8 @@  pass_vsetvl::compute_local_properties (void)
   for (const bb_info *bb : crtl->ssa->bbs ())
     {
       unsigned int curr_bb_idx = bb->index ();
+      if (curr_bb_idx == ENTRY_BLOCK || curr_bb_idx == EXIT_BLOCK)
+	continue;
       const auto local_dem
 	= m_vector_manager->vector_block_infos[curr_bb_idx].local_dem;
       const auto reaching_out
@@ -3722,57 +3256,35 @@  pass_vsetvl::compute_local_properties (void)
       /* Compute transparent.  */
       for (size_t i = 0; i < m_vector_manager->vector_exprs.length (); i++)
 	{
-	  const vector_insn_info *expr = m_vector_manager->vector_exprs[i];
-	  if (local_dem.real_dirty_p () || local_dem.valid_p ()
-	      || local_dem.unknown_p ()
-	      || has_vsetvl_killed_avl_p (bb, local_dem))
+	  const auto *expr = m_vector_manager->vector_exprs[i];
+	  if (local_dem.valid_or_dirty_p () || local_dem.unknown_p ())
 	    bitmap_clear_bit (m_vector_manager->vector_transp[curr_bb_idx], i);
-	  /* FIXME: Here we set the block as non-transparent (killed) if there
-	     is an instruction killed the value of AVL according to the
-	     definition of Local transparent. This is true for such following
-	     case:
-
-		bb 0 (Loop label):
-		  vsetvl zero, a5, e8, mf8
-		bb 1:
-		  def a5
-		bb 2:
-		  branch bb 0 (Loop label).
-
-	     In this case, we known there is a loop bb 0->bb 1->bb 2. According
-	     to LCM definition, it is correct when we set vsetvl zero, a5, e8,
-	     mf8 as non-transparent (killed) so that LCM will not hoist outside
-	     the bb 0.
-
-	     However, such conservative configuration will forbid optimization
-	     on some unlucky case. For example:
-
-		bb 0:
-		  li a5, 101
-		bb 1:
-		  vsetvl zero, a5, e8, mf8
-		bb 2:
-		  li a5, 101
-		bb 3:
-		  vsetvl zero, a5, e8, mf8.
-	     So we also relax def a5 as transparent to gain more optimizations
-	     as long as the all real def insn of avl do not come from this
-	     block. This configuration may be still missing some optimization
-	     opportunities.  */
-	  if (find_reg_killed_by (bb, expr->get_avl ()))
+	  else if (expr->has_avl_reg ())
 	    {
-	      hash_set<set_info *> sets
-		= get_all_sets (expr->get_avl_source (), true, false, false);
-	      if (any_set_in_bb_p (sets, bb))
-		bitmap_clear_bit (m_vector_manager->vector_transp[curr_bb_idx],
-				  i);
+	      rtx avl = vlmax_avl_p (expr->get_avl ())
+			  ? get_vl (expr->get_insn ()->rtl ())
+			  : expr->get_avl ();
+	      for (const insn_info *insn : bb->real_nondebug_insns ())
+		{
+		  if (find_access (insn->defs (), REGNO (avl)))
+		    {
+		      bitmap_clear_bit (
+			m_vector_manager->vector_transp[curr_bb_idx], i);
+		      break;
+		    }
+		  else if (vlmax_avl_p (expr->get_avl ())
+			   && find_access (insn->uses (), REGNO (avl)))
+		    {
+		      bitmap_clear_bit (
+			m_vector_manager->vector_transp[curr_bb_idx], i);
+		      break;
+		    }
+		}
 	    }
 	}
 
       /* Compute anticipatable occurrences.  */
-      if (local_dem.valid_p () || local_dem.real_dirty_p ()
-	  || (has_vsetvl_killed_avl_p (bb, local_dem)
-	      && vlmax_avl_p (local_dem.get_avl ())))
+      if (local_dem.valid_or_dirty_p ())
 	if (anticipatable_occurrence_p (bb, local_dem))
 	  bitmap_set_bit (m_vector_manager->vector_antic[curr_bb_idx],
 			  m_vector_manager->get_expr_id (local_dem));
@@ -3786,13 +3298,17 @@  pass_vsetvl::compute_local_properties (void)
 	    {
 	      const vector_insn_info *expr
 		= m_vector_manager->vector_exprs[available_list[i]];
-	      if (reaching_out.real_dirty_p ()
-		  || has_vsetvl_killed_avl_p (bb, reaching_out)
-		  || available_occurrence_p (bb, *expr))
+	      if (available_occurrence_p (bb, *expr))
 		bitmap_set_bit (m_vector_manager->vector_comp[curr_bb_idx],
 				available_list[i]);
 	    }
 	}
+
+      if (loop_basic_block_p (bb->cfg_bb ()) && local_dem.valid_or_dirty_p ()
+	  && reaching_out.valid_or_dirty_p ()
+	  && !local_dem.compatible_p (reaching_out))
+	bitmap_clear_bit (m_vector_manager->vector_antic[curr_bb_idx],
+			  m_vector_manager->get_expr_id (local_dem));
     }
 
   /* Compute kill for each basic block using:
@@ -3892,7 +3408,7 @@  pass_vsetvl::refine_vsetvls (void) const
   basic_block cfg_bb;
   FOR_EACH_BB_FN (cfg_bb, cfun)
     {
-      auto info = get_block_info(cfg_bb).local_dem;
+      auto info = get_block_info (cfg_bb).local_dem;
       insn_info *insn = info.get_insn ();
       if (!info.valid_p ())
 	continue;
@@ -3938,8 +3454,7 @@  pass_vsetvl::cleanup_vsetvls ()
   basic_block cfg_bb;
   FOR_EACH_BB_FN (cfg_bb, cfun)
     {
-      auto &info
-	= get_block_info(cfg_bb).reaching_out;
+      auto &info = get_block_info (cfg_bb).reaching_out;
       gcc_assert (m_vector_manager->expr_set_num (
 		    m_vector_manager->vector_del[cfg_bb->index])
 		  <= 1);
@@ -3951,9 +3466,7 @@  pass_vsetvl::cleanup_vsetvls ()
 		info.set_unknown ();
 	      else
 		{
-		  const auto dem
-		    = get_block_info(cfg_bb)
-			.local_dem;
+		  const auto dem = get_block_info (cfg_bb).local_dem;
 		  gcc_assert (dem == *m_vector_manager->vector_exprs[i]);
 		  insn_info *insn = dem.get_insn ();
 		  gcc_assert (insn && insn->rtl ());
@@ -4013,6 +3526,16 @@  pass_vsetvl::commit_vsetvls (void)
 	      gcc_assert (!(eg->flags & EDGE_ABNORMAL));
 	      need_commit = true;
 	      insert_insn_on_edge (rinsn, eg);
+
+	      if (dump_file)
+		{
+		  fprintf (dump_file,
+			   "\nInsert vsetvl insn %d at edge %d from <bb %d> to "
+			   "<bb %d>:\n",
+			   INSN_UID (rinsn), ed, eg->src->index,
+			   eg->dest->index);
+		  print_rtl_single (dump_file, rinsn);
+		}
 	    }
 	}
     }
@@ -4020,33 +3543,10 @@  pass_vsetvl::commit_vsetvls (void)
   for (const bb_info *bb : crtl->ssa->bbs ())
     {
       basic_block cfg_bb = bb->cfg_bb ();
-      const auto reaching_out
-	= get_block_info(cfg_bb).reaching_out;
+      const auto reaching_out = get_block_info (cfg_bb).reaching_out;
       if (!reaching_out.dirty_p ())
 	continue;
 
-      if (reaching_out.dirty_with_killed_avl_p ())
-	{
-	  if (!has_vsetvl_killed_avl_p (bb, reaching_out))
-	    continue;
-
-	  unsigned int bb_index;
-	  sbitmap_iterator sbi;
-	  sbitmap avin = m_vector_manager->vector_avin[cfg_bb->index];
-	  bool available_p = false;
-	  EXECUTE_IF_SET_IN_BITMAP (avin, 0, bb_index, sbi)
-	  {
-	    if (m_vector_manager->vector_exprs[bb_index]->available_p (
-		  reaching_out))
-	      {
-		available_p = true;
-		break;
-	      }
-	  }
-	  if (available_p)
-	    continue;
-	}
-
       rtx new_pat;
       if (!reaching_out.demand_p (DEMAND_AVL))
 	{
@@ -4058,23 +3558,43 @@  pass_vsetvl::commit_vsetvls (void)
 	new_pat
 	  = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, reaching_out, NULL_RTX);
       else if (vlmax_avl_p (reaching_out.get_avl ()))
-	new_pat = gen_vsetvl_pat (VSETVL_NORMAL, reaching_out,
-				  reaching_out.get_avl_reg_rtx ());
+	{
+	  rtx vl = NULL_RTX;
+	  if (!reaching_out.get_avl_source ())
+	    {
+	      gcc_assert (vsetvl_insn_p (reaching_out.get_insn ()->rtl ()));
+	      vl = get_vl (reaching_out.get_insn ()->rtl ());
+	    }
+	  else
+	    vl = reaching_out.get_avl_reg_rtx ();
+	  new_pat = gen_vsetvl_pat (VSETVL_NORMAL, reaching_out, vl);
+	}
       else
 	new_pat
 	  = gen_vsetvl_pat (VSETVL_DISCARD_RESULT, reaching_out, NULL_RTX);
 
-      start_sequence ();
-      emit_insn (new_pat);
-      rtx_insn *rinsn = get_insns ();
-      end_sequence ();
-      rtx_insn *new_insn = insert_insn_end_basic_block (rinsn, cfg_bb);
-      if (dump_file)
+      edge eg;
+      edge_iterator eg_iterator;
+
+      FOR_EACH_EDGE (eg, eg_iterator, cfg_bb->succs)
 	{
-	  fprintf (dump_file,
-		   "\nInsert vsetvl insn %d at the end of <bb %d>:\n",
-		   INSN_UID (new_insn), cfg_bb->index);
-	  print_rtl_single (dump_file, new_insn);
+	  /* We should not get an abnormal edge here.  */
+	  gcc_assert (!(eg->flags & EDGE_ABNORMAL));
+
+	  start_sequence ();
+	  emit_insn (copy_rtx (new_pat));
+	  rtx_insn *rinsn = get_insns ();
+	  end_sequence ();
+
+	  insert_insn_on_edge (rinsn, eg);
+	  need_commit = true;
+	  if (dump_file)
+	    {
+	      fprintf (dump_file,
+		       "\nInsert vsetvl insn %d from <bb %d> to <bb %d>:\n",
+		       INSN_UID (rinsn), cfg_bb->index, eg->dest->index);
+	      print_rtl_single (dump_file, rinsn);
+	    }
 	}
     }
 
@@ -4263,7 +3783,8 @@  pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info *bb) const
 
       /* Local AVL compatibility checking is simpler than global, we only
 	 need to check the REGNO is same.  */
-      if (prev_dem.valid_or_dirty_p () && prev_dem.skip_avl_compatible_p (curr_dem)
+      if (prev_dem.valid_or_dirty_p ()
+	  && prev_dem.skip_avl_compatible_p (curr_dem)
 	  && local_avl_compatible_p (prev_avl, curr_avl))
 	{
 	  /* curr_dem and prev_dem is compatible!  */
@@ -4298,7 +3819,8 @@  pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info *bb) const
    none exists or if a user RVV instruction is enountered
    prior to any vsetvl.  */
 static rtx_insn *
-get_first_vsetvl_before_rvv_insns (basic_block cfg_bb)
+get_first_vsetvl_before_rvv_insns (basic_block cfg_bb,
+				   enum vsetvl_type insn_type)
 {
   rtx_insn *rinsn;
   FOR_BB_INSNS (cfg_bb, rinsn)
@@ -4310,7 +3832,11 @@  get_first_vsetvl_before_rvv_insns (basic_block cfg_bb)
       if (has_vtype_op (rinsn) || vsetvl_insn_p (rinsn))
 	return nullptr;
 
-      if (vsetvl_discard_result_insn_p (rinsn))
+      if (insn_type == VSETVL_DISCARD_RESULT
+	  && vsetvl_discard_result_insn_p (rinsn))
+	return rinsn;
+      if (insn_type == VSETVL_VTYPE_CHANGE_ONLY
+	  && vsetvl_vtype_change_only_p (rinsn))
 	return rinsn;
     }
   return nullptr;
@@ -4358,16 +3884,15 @@  pass_vsetvl::global_eliminate_vsetvl_insn (const bb_info *bb) const
     {
       /* Optimize the local vsetvl.  */
       dem = block_info.local_dem;
-      vsetvl_rinsn = get_first_vsetvl_before_rvv_insns (cfg_bb);
+      vsetvl_rinsn
+	= get_first_vsetvl_before_rvv_insns (cfg_bb, VSETVL_DISCARD_RESULT);
     }
   if (!vsetvl_rinsn)
     /* Optimize the global vsetvl inserted by LCM.  */
     vsetvl_rinsn = get_vsetvl_at_end (bb, &dem);
 
   /* No need to optimize if block doesn't have vsetvl instructions.  */
-  if (!dem.valid_or_dirty_p ()
-      || !vsetvl_rinsn
-      || !dem.get_avl_source ()
+  if (!dem.valid_or_dirty_p () || !vsetvl_rinsn || !dem.get_avl_source ()
       || !dem.has_avl_reg ())
     return false;
 
@@ -4382,7 +3907,7 @@  pass_vsetvl::global_eliminate_vsetvl_insn (const bb_info *bb) const
 
   unsigned int bb_index;
   sbitmap_iterator sbi;
-  rtx avl = get_avl (dem.get_insn ()->rtl ());
+  rtx avl = dem.get_avl ();
   hash_set<set_info *> sets
     = get_all_sets (dem.get_avl_source (), true, false, false);
   /* Condition 2: All VL/VTYPE available in are all compatible.  */
@@ -4406,7 +3931,10 @@  pass_vsetvl::global_eliminate_vsetvl_insn (const bb_info *bb) const
     {
       sbitmap avout = m_vector_manager->vector_avout[e->src->index];
       if (e->src == ENTRY_BLOCK_PTR_FOR_FN (cfun)
-	  || e->src == EXIT_BLOCK_PTR_FOR_FN (cfun) || bitmap_empty_p (avout))
+	  || e->src == EXIT_BLOCK_PTR_FOR_FN (cfun)
+	  || (unsigned int) e->src->index
+	       >= m_vector_manager->vector_block_infos.length ()
+	  || bitmap_empty_p (avout))
 	return false;
 
       EXECUTE_IF_SET_IN_BITMAP (avout, 0, bb_index, sbi)
@@ -4550,6 +4078,61 @@  has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno)
   return true;
 }
 
+/* For many reasons, we failed to elide the redundant vsetvls
+   in Phase 3 and Phase 4.
+
+     - VLMAX-AVL case: 'vlmax_avl<mode>' may locate at some unlucky
+       point which make us set ANTLOC as false for LCM in 'O1'.
+       We don't want to complicate phase 3 and phase 4 too much,
+       so we do the post optimization for redundant VSETVLs here.
+*/
+bool
+pass_vsetvl::cleanup_earliest_vsetvls (const basic_block cfg_bb) const
+{
+  bool is_earliest_p = false;
+  if (cfg_bb->index >= (int) m_vector_manager->vector_block_infos.length ())
+    is_earliest_p = true;
+
+  rtx_insn *rinsn
+    = get_first_vsetvl_before_rvv_insns (cfg_bb, VSETVL_VTYPE_CHANGE_ONLY);
+  if (!rinsn)
+    return is_earliest_p;
+
+  sbitmap avail;
+  if (is_earliest_p)
+    {
+      gcc_assert (single_succ_p (cfg_bb) && single_pred_p (cfg_bb));
+      const bb_info *pred_bb = crtl->ssa->bb (single_pred (cfg_bb));
+      gcc_assert (pred_bb->index ()
+		  < m_vector_manager->vector_block_infos.length ());
+      avail = m_vector_manager->vector_avout[pred_bb->index ()];
+    }
+  else
+    avail = m_vector_manager->vector_avin[cfg_bb->index];
+
+  if (!bitmap_empty_p (avail))
+    {
+      unsigned int bb_index;
+      sbitmap_iterator sbi;
+      vector_insn_info strictest_info = vector_insn_info ();
+      EXECUTE_IF_SET_IN_BITMAP (avail, 0, bb_index, sbi)
+	{
+	  const auto *expr = m_vector_manager->vector_exprs[bb_index];
+	  if (strictest_info.uninit_p ()
+	      || !expr->compatible_p (
+		static_cast<const vl_vtype_info &> (strictest_info)))
+	    strictest_info = *expr;
+	}
+      vector_insn_info info;
+      info.parse_insn (rinsn);
+      if (!strictest_info.same_vtype_p (info))
+	return is_earliest_p;
+      eliminate_insn (rinsn);
+    }
+
+  return is_earliest_p;
+}
+
 /* This function does the following post optimization base on dataflow
    analysis:
 
@@ -4569,6 +4152,8 @@  pass_vsetvl::df_post_optimization (void) const
   rtx_insn *rinsn;
   FOR_ALL_BB_FN (cfg_bb, cfun)
     {
+      if (cleanup_earliest_vsetvls (cfg_bb))
+	continue;
       FOR_BB_INSNS (cfg_bb, rinsn)
 	{
 	  if (NONDEBUG_INSN_P (rinsn) && vsetvl_insn_p (rinsn))
@@ -4655,8 +4240,7 @@  pass_vsetvl::compute_probabilities (void)
   for (const bb_info *bb : crtl->ssa->bbs ())
     {
       basic_block cfg_bb = bb->cfg_bb ();
-      auto &curr_prob
-	= get_block_info(cfg_bb).probability;
+      auto &curr_prob = get_block_info (cfg_bb).probability;
 
       /* GCC assume entry block (bb 0) are always so
 	 executed so set its probability as "always".  */
@@ -4669,8 +4253,7 @@  pass_vsetvl::compute_probabilities (void)
       gcc_assert (curr_prob.initialized_p ());
       FOR_EACH_EDGE (e, ei, cfg_bb->succs)
 	{
-	  auto &new_prob
-	    = get_block_info(e->dest).probability;
+	  auto &new_prob = get_block_info (e->dest).probability;
 	  if (!new_prob.initialized_p ())
 	    new_prob = curr_prob * e->probability;
 	  else if (new_prob == profile_probability::always ())
@@ -4714,8 +4297,6 @@  pass_vsetvl::lazy_vsetvl (void)
   if (dump_file)
     fprintf (dump_file, "\nPhase 3: Demands propagation across blocks\n");
   demand_fusion ();
-  if (dump_file)
-    m_vector_manager->dump (dump_file);
 
   /* Phase 4 - Lazy code motion.  */
   if (dump_file)
diff --git a/gcc/config/riscv/riscv-vsetvl.def b/gcc/config/riscv/riscv-vsetvl.def
index 7a73149f1da..7289c01efcf 100644
--- a/gcc/config/riscv/riscv-vsetvl.def
+++ b/gcc/config/riscv/riscv-vsetvl.def
@@ -319,7 +319,7 @@  DEF_SEW_LMUL_FUSE_RULE (/*SEW*/ DEMAND_TRUE, /*LMUL*/ DEMAND_FALSE,
 			/*RATIO*/ DEMAND_TRUE, /*GE_SEW*/ DEMAND_FALSE,
 			/*NEW_DEMAND_SEW*/ true,
 			/*NEW_DEMAND_LMUL*/ false,
-			/*NEW_DEMAND_RATIO*/ false,
+			/*NEW_DEMAND_RATIO*/ true,
 			/*NEW_DEMAND_GE_SEW*/ true, first_sew,
 			vlmul_for_first_sew_second_ratio, second_ratio)
 DEF_SEW_LMUL_FUSE_RULE (/*SEW*/ DEMAND_TRUE, /*LMUL*/ DEMAND_FALSE,
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
index 87cdd2e886e..d30d47e170a 100644
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ b/gcc/config/riscv/riscv-vsetvl.h
@@ -167,6 +167,7 @@  public:
   avl_info (rtx, rtl_ssa::set_info *);
   rtx get_value () const { return m_value; }
   rtl_ssa::set_info *get_source () const { return m_source; }
+  void set_source (rtl_ssa::set_info *set) { m_source = set; }
   bool single_source_equal_p (const avl_info &) const;
   bool multiple_source_equal_p (const avl_info &) const;
   avl_info &operator= (const avl_info &);
@@ -225,6 +226,7 @@  public:
   rtx get_avl () const { return m_avl.get_value (); }
   const avl_info &get_avl_info () const { return m_avl; }
   rtl_ssa::set_info *get_avl_source () const { return m_avl.get_source (); }
+  void set_avl_source (rtl_ssa::set_info *set) { m_avl.set_source (set); }
   void set_avl_info (const avl_info &avl) { m_avl = avl; }
   uint8_t get_sew () const { return m_sew; }
   riscv_vector::vlmul_type get_vlmul () const { return m_vlmul; }
@@ -246,31 +248,11 @@  private:
     VALID,
     UNKNOWN,
     EMPTY,
-    /* The empty block can not be polluted as dirty.  */
-    HARD_EMPTY,
 
     /* The block is polluted as containing VSETVL instruction during dem
        backward propagation to gain better LCM optimization even though
        such VSETVL instruction is not really emit yet during this time.  */
     DIRTY,
-    /* The block is polluted with killed AVL.
-       We will backward propagate such case:
-	 bb 0: def a5, 55 (empty).
-	 ...
-	 bb 1: vsetvli zero, a5.
-	 ...
-	 bb 2: empty.
-	 ...
-	 bb 3: def a3, 55 (empty).
-	 ...
-	 bb 4: vsetvli zero, a3.
-
-       To elide vsetvli in bb 4, we need to backward pollute bb 3 and bb 2
-       as DIRTY block as long as there is a block def AVL which has the same
-       source with AVL in bb 4. Such polluted block, we call it as
-       DIRTY_WITH_KILLED_AVL
-    */
-    DIRTY_WITH_KILLED_AVL
   };
 
   enum state_type m_state;
@@ -316,21 +298,12 @@  public:
   bool uninit_p () const { return m_state == UNINITIALIZED; }
   bool valid_p () const { return m_state == VALID; }
   bool unknown_p () const { return m_state == UNKNOWN; }
-  bool empty_p () const { return m_state == EMPTY || m_state == HARD_EMPTY; }
-  bool hard_empty_p () const { return m_state == HARD_EMPTY; }
-  bool dirty_p () const
-  {
-    return m_state == DIRTY || m_state == DIRTY_WITH_KILLED_AVL;
-  }
-  bool dirty_with_killed_avl_p () const
-  {
-    return m_state == DIRTY_WITH_KILLED_AVL;
-  }
+  bool empty_p () const { return m_state == EMPTY; }
+  bool dirty_p () const { return m_state == DIRTY; }
   bool real_dirty_p () const { return m_state == DIRTY; }
   bool valid_or_dirty_p () const
   {
-    return m_state == VALID || m_state == DIRTY
-	   || m_state == DIRTY_WITH_KILLED_AVL;
+    return m_state == VALID || m_state == DIRTY;
   }
   bool available_p (const vector_insn_info &) const;
 
@@ -341,32 +314,10 @@  public:
     return info;
   }
 
-  static vector_insn_info get_hard_empty ()
-  {
-    vector_insn_info info;
-    info.set_hard_empty ();
-    return info;
-  }
-
   void set_valid () { m_state = VALID; }
   void set_unknown () { m_state = UNKNOWN; }
   void set_empty () { m_state = EMPTY; }
-  void set_hard_empty () { m_state = HARD_EMPTY; }
-  void set_dirty (enum fusion_type type)
-  {
-    gcc_assert (type == VALID_AVL_FUSION || type == KILLED_AVL_FUSION);
-    if (type == VALID_AVL_FUSION)
-      m_state = DIRTY;
-    else
-      m_state = DIRTY_WITH_KILLED_AVL;
-  }
-  void set_dirty (bool dirty_with_killed_avl_p)
-  {
-    if (dirty_with_killed_avl_p)
-      m_state = DIRTY_WITH_KILLED_AVL;
-    else
-      m_state = DIRTY;
-  }
+  void set_dirty () { m_state = DIRTY; }
   void set_insn (rtl_ssa::insn_info *insn) { m_insn = insn; }
 
   bool demand_p (enum demand_type type) const { return m_demands[type]; }
@@ -386,7 +337,8 @@  public:
   bool compatible_avl_p (const avl_info &) const;
   bool compatible_vtype_p (const vl_vtype_info &) const;
   bool compatible_p (const vl_vtype_info &) const;
-  vector_insn_info merge (const vector_insn_info &, enum merge_type) const;
+  vector_insn_info merge (const vector_insn_info &, enum merge_type,
+			  unsigned = 0) const;
 
   rtl_ssa::insn_info *get_insn () const { return m_insn; }
   const bool *get_demands (void) const { return m_demands; }
@@ -431,6 +383,9 @@  public:
   sbitmap *vector_comp;
   sbitmap *vector_avin;
   sbitmap *vector_avout;
+  sbitmap *vector_antin;
+  sbitmap *vector_antout;
+  sbitmap *vector_earliest;
 
   vector_infos_manager ();
 
@@ -452,7 +407,7 @@  public:
   /* Return true if all expression set in bitmap are same ratio.  */
   bool all_same_ratio_p (sbitmap) const;
 
-  bool all_empty_predecessor_p (const basic_block) const;
+  bool earliest_fusion_worthwhile_p (const basic_block) const;
   bool all_avail_in_compatible_p (const basic_block) const;
 
   bool to_delete_p (rtx_insn *rinsn)
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index 1252d6f851a..f3ce66ccdd4 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -62,7 +62,8 @@  riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
   $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-config.h \
   insn-attr.h insn-opinit.h tm-constrs.h cfgrtl.h cfganal.h lcm.h \
-  predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h
+  predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h \
+  $(srcdir)/config/riscv/riscv-vsetvl.def
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/riscv/riscv-vsetvl.cc
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index e772e79057d..6ceae25dbed 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1363,7 +1363,11 @@ 
   "TARGET_VECTOR"
   "vsetvli\tzero,zero,e%0,%m1,t%p2,m%p3"
   [(set_attr "type" "vsetvl")
-   (set_attr "mode" "SI")])
+   (set_attr "mode" "SI")
+   (set (attr "sew") (symbol_ref "INTVAL (operands[0])"))
+   (set (attr "vlmul") (symbol_ref "INTVAL (operands[1])"))
+   (set (attr "ta") (symbol_ref "INTVAL (operands[2])"))
+   (set (attr "ma") (symbol_ref "INTVAL (operands[3])"))])
 
 ;; vsetvl zero,rs1,vtype instruction.
 ;; The reason we need this pattern since we should avoid setting X0 register
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
index b4e2ead8ca9..2fb525d8ffc 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
@@ -7,6 +7,12 @@ 
 int
 main (void)
 {
+  /* FIXME: The purpose of this assembly is to ensure that the vtype register is
+     initialized befor instructions such as vmv1r.v are executed. Otherwise you
+     will get illegal instruction errors when running with spike+pk. This is an
+     interim solution for reduce unnecessary failures and a unified solution
+     will come later. */
+  asm volatile("vsetivli x0, 0, e8, m1, ta, ma");
 #define RUN_LOOP(DATA_TYPE, INDEX_TYPE)                                        \
   DATA_TYPE dest_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                        \
   DATA_TYPE src_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                         \
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-8.c b/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-8.c
index 3ed0d00d1e9..69736b0da5d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-8.c
@@ -14,5 +14,5 @@  void f (void * in, void *out, int32_t x, int n, int m)
   }
 }
 
-/* { dg-final { scan-assembler-times {csrwi\s+vxrm,\s*2\s+vsetivli\s+zero,\s*4,\s*e32,\s*m1,\s*tu,\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times {csrwi\s+vxrm,\s*2\s+\.L[0-9]+} 1 } } */
 /* { dg-final { scan-assembler-times {csrwi\s+vxrm,\s*2} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-9.c b/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-9.c
index 0939705b2e7..e86b829fe00 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-9.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-9.c
@@ -22,5 +22,5 @@  void f (void * in, void *out, int32_t x, int n, int m)
   }
 }
 
-/* { dg-final { scan-assembler-times {csrwi\s+vxrm,\s*2\s+vsetivli\s+zero,\s*4,\s*e32,\s*m1,\s*tu,\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times {csrwi\s+vxrm,\s*2\s+\.L[0-9]+} 1 } } */
 /* { dg-final { scan-assembler-times {csrwi\s+vxrm,\s*2} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c
index c1d1986528c..96dc76faa0d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c
@@ -37,4 +37,4 @@  void f (void * restrict in, void * restrict out, int l, int n, int m, int cond)
   }
 }
 
-/* { dg-final { scan-assembler {add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler {add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+ble\s+[a-x0-9]+,\s*zero,\.L[0-9]+\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c
index 7ccb7124174..9ada32e3d06 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c
@@ -36,4 +36,4 @@  void f (void * restrict in, void * restrict out, int l, int n, int m, int cond)
   }
 }
 
-/* { dg-final { scan-assembler {add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler {add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+ble\s+[a-x0-9]+,\s*zero,\.L[0-9]+\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-102.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-102.c
index 8236d4e7f18..ae1208cd7aa 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-102.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-102.c
@@ -6,6 +6,7 @@ 
 void f (int8_t* base1,int8_t* base2,int8_t* out,int n)
 {
   vint8mf4_t v = __riscv_vle8_v_i8mf4 (base1, 32);
+  v = __riscv_vle8_v_i8mf4_tu (v, base2 + 100, 32);
   for (int i = 0; i < n; i++){
     v = __riscv_vor_vx_i8mf4 (v, 101, 32);
     v = __riscv_vle8_v_i8mf4_tu (v, base2, 32);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-103.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-103.c
new file mode 100644
index 00000000000..51306fd7a63
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-103.c
@@ -0,0 +1,27 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-schedule-insns -fno-schedule-insns2 -fno-tree-vectorize" } */
+
+#include "riscv_vector.h"
+
+void f(void *base, void *out, void *mask_in, size_t m) {
+  vbool64_t mask = *(vbool64_t*)mask_in;
+  size_t vl = 105;
+  vint32mf2_t v0 = __riscv_vle32_v_i32mf2(base + 1000, vl);
+  __riscv_vse32_v_i32mf2 (out + 1000, v0, vl);
+  for (size_t i = 0; i < m; i++) {
+    if (i % 2 == 0) {
+      vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, vl);
+      vint8mf8_t v1 = __riscv_vle8_v_i8mf8_tu(v0, base + i + 100, vl);
+      v1 = __riscv_vadd_vv_i8mf8 (v0,v1,vl);
+      __riscv_vse8_v_i8mf8 (out + i, v1, vl);
+    } else {
+      vint16mf4_t v0 = __riscv_vle16_v_i16mf4(base + i, vl);
+      vint16mf4_t v1 = __riscv_vle16_v_i16mf4_mu(mask, v0, base + i + 100, vl);
+      __riscv_vse16_v_i16mf4 (out + i, v1, vl);
+    }
+  }
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*mu} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-14.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-14.c
index b3a1d46b3c0..e464ceb0bd6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-14.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-14.c
@@ -5,7 +5,7 @@ 
 
 void f (int8_t * restrict in, int8_t * restrict out, int n, int cond)
 {
-  size_t vl = 101;
+  size_t vl = cond + 101;
   
   for (size_t i = 0; i < n; i++)
     {
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-15.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-15.c
index 501e0766c22..e32b60f05f9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-15.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-15.c
@@ -5,7 +5,7 @@ 
 
 void f (int8_t * restrict in, int8_t * restrict out, int n, int cond)
 {
-  size_t vl = 101;
+  size_t vl = cond + 101;
   
   for (size_t i = 0; i < n; i++)
     {
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-27.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-27.c
index 22004dab18c..0fa72c3d314 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-27.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-27.c
@@ -7,7 +7,7 @@  void f2 (int8_t * restrict in, int8_t * restrict out, int n, int m, unsigned con
 {
   vbool64_t mask = *(vbool64_t*) (in + 1000000);
 
-  vl = 101;
+  vl = vl + 10000;
   if (cond > 0) {
     for (size_t i = 0; i < n; i++)
       {
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-28.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-28.c
index b5b3fda1bab..d5f909168c6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-28.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-28.c
@@ -7,7 +7,7 @@  void f2 (int8_t * restrict in, int8_t * restrict out, int n, int m, unsigned con
 {
   vbool64_t mask = *(vbool64_t*) (in + 1000000);
 
-  vl = 101;
+  vl = vl + 10000;
   if (cond > 0) {
     vint8mf8_t v = __riscv_vle8_v_i8mf8 (in, vl);
     __riscv_vse8_v_i8mf8 (out, v, vl);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-29.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-29.c
index f6296e0af93..44297d11147 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-29.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-29.c
@@ -7,7 +7,7 @@  void f2 (int8_t * restrict in, int8_t * restrict out, int n, int m, unsigned con
 {
   vbool64_t mask = *(vbool64_t*) (in + 1000000);
 
-  vl = 101;
+  vl = vl + 10000;
   if (cond > 0) {
     vint8mf8_t v = __riscv_vle8_v_i8mf8 (in, vl);
     __riscv_vse8_v_i8mf8 (out, v, vl);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-30.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-30.c
index 687d84dcf8c..92df1783b0b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-30.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-30.c
@@ -7,7 +7,7 @@  void f (int8_t * restrict in, int8_t * restrict out, int n, int m, unsigned cond
 {
   vbool64_t mask = *(vbool64_t*) (in + 1000000);
 
-  vl = 101;
+  vl = vl + 10000;
   if (cond > 0) {
     vint8mf8_t v = __riscv_vle8_v_i8mf8 (in, vl);
     __riscv_vse8_v_i8mf8 (out, v, vl);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-35.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-35.c
index 28230914cf7..d1daafdee86 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-35.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-35.c
@@ -7,6 +7,7 @@  static int vl = 0x5545515;
 
 void f (int8_t * restrict in, int8_t * restrict out, int n, int cond)
 {
+  vl = vl + 101;
   for (size_t i = 0; i < n; i++)
     {
       vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + 300, vl);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-36.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-36.c
index 3c93675a32d..7db2dc55d09 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-36.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-36.c
@@ -7,19 +7,19 @@  void f (int8_t * restrict in, int8_t * restrict out, int n, int cond)
 {
   for (size_t i = 0; i < n; i++)
     {
-      vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + 300,555);
-      __riscv_vse8_v_i8mf8 (out + i + 300, v,555);
+      vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + 300,555 + cond);
+      __riscv_vse8_v_i8mf8 (out + i + 300, v,555 + cond);
     }
 
   for (size_t i = 0; i < n; i++)
     {
-      vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i,555);
-      __riscv_vse8_v_i8mf8 (out + i, v,555);
+      vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i,555 + cond);
+      __riscv_vse8_v_i8mf8 (out + i, v,555 + cond);
       
-      vint8mf8_t v2 = __riscv_vle8_v_i8mf8_tu (v, in + i + 100,555);
-      __riscv_vse8_v_i8mf8 (out + i + 100, v2,555);
+      vint8mf8_t v2 = __riscv_vle8_v_i8mf8_tu (v, in + i + 100,555 + cond);
+      __riscv_vse8_v_i8mf8 (out + i + 100, v2,555 + cond);
     }
 }
 
 /* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-46.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-46.c
index 1c5ee6a60cc..99fdd67db64 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-46.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-46.c
@@ -3,9 +3,9 @@ 
 
 #include "riscv_vector.h"
 
-void f (int8_t * restrict in, int8_t * restrict out, int n, int cond)
+void f (int8_t * restrict in, int8_t * restrict out, int n, int cond, size_t vl)
 {
-  int vl = 101;
+  vl = 101 + vl;
   if (n > cond) {
     vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + 600, vl);
     vint8mf8_t v2 = __riscv_vle8_v_i8mf8_tu (v, in + 600, vl);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-48.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-48.c
index 79af2ef450a..f1453487ee2 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-48.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-48.c
@@ -28,5 +28,5 @@  void f (int8_t * restrict in, int8_t * restrict out, int n, int n2)
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*m[au]} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-50.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-50.c
index 1ee2ce3e71a..e7bdb6dd035 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-50.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-50.c
@@ -3,13 +3,14 @@ 
 
 #include "riscv_vector.h"
 
-void f(void *base, void *out, void *mask_in, size_t m) {
+void f(void *base, void *out, void *mask_in, size_t m, size_t vl) {
   vbool64_t mask = *(vbool64_t*)mask_in;
-  size_t vl = 105;
+  vl = 105 + vl;
   for (size_t i = 0; i < m; i++) {
     if (i % 2 == 0) {
       vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, vl);
       vint8mf8_t v1 = __riscv_vle8_v_i8mf8_tu(v0, base + i + 100, vl);
+      v1 = __riscv_vadd_vv_i8mf8 (v0,v1,vl);
       __riscv_vse8_v_i8mf8 (out + i, v1, vl);
     } else {
       vint16mf4_t v0 = __riscv_vle16_v_i16mf4(base + i, vl);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-51.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-51.c
index dc0da57e1eb..2451e7c8dbe 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-51.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-51.c
@@ -3,14 +3,15 @@ 
 
 #include "riscv_vector.h"
 
-void f(void *base, void *out, void *mask_in, size_t m, size_t n) {
+void f(void *base, void *out, void *mask_in, size_t m, size_t n, size_t vl) {
   vbool64_t mask = *(vbool64_t*)mask_in;
-  size_t vl = 106;
+  vl = 106 + vl;
   for (size_t i = 0; i < m; i++) {
     for (size_t j = 0; j < n; j++){
       if ((i + j) % 2 == 0) {
         vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i + j, vl);
         vint8mf8_t v1 = __riscv_vle8_v_i8mf8_tu(v0, base + i + j + 100, vl);
+        v1 = __riscv_vadd_vv_i8mf8 (v0,v1,vl);
         __riscv_vse8_v_i8mf8 (out + i + j, v1, vl);
       } else {
         vint16mf4_t v0 = __riscv_vle16_v_i16mf4(base + i + j, vl);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-6.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-6.c
index 7a8163925f8..3d3d71815fd 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-6.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-6.c
@@ -18,5 +18,5 @@  void f (void * restrict in, void * restrict out, int l, int n, int m, size_t vl)
 }
 
 /* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+li\s+[a-x0-9]+,0\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-O2" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+ble\s+[a-x0-9]+,\s*zero,\.L[0-9]+\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-66.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-66.c
index 77b1d2ac1e4..aac6b52c752 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-66.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-66.c
@@ -17,5 +17,5 @@  void f2 (void * restrict in, void * restrict out, int l, int n, int m, size_t vl
   }
 }
 
-/* { dg-final { scan-assembler-times {add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+ble\s+[a-x0-9]+,\s*zero,\.L[0-9]+\s+} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-67.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-67.c
index 8890c32020e..7e77e0eff44 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-67.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-67.c
@@ -21,7 +21,7 @@  void f2 (void * restrict in, void * restrict out, int l, int n, int m)
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 4 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli} 4 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {addi\s+[a-x0-9]+,\s*[a-x0-9]+,\s*44} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-68.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-68.c
index 0a4855a2eea..a366a15f71b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-68.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-68.c
@@ -21,6 +21,6 @@  void f2 (void * restrict in, void * restrict out, int l, int n, int m)
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*m[au]} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {addi\s+[a-x0-9]+,\s*[a-x0-9]+,\s*44} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-69.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-69.c
index 37707b00488..633c4afd4f2 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-69.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-69.c
@@ -5,7 +5,7 @@ 
 
 void f (int8_t * restrict in, int8_t * restrict out, int l, int n, int m, size_t cond)
 {
-  size_t vl = 555;
+  size_t vl = 555 + cond;
   for (int i = 0; i < l; i++){
     for (int j = 0; j < m; j++){
       for (int k = 0; k < n; k++)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-70.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-70.c
index c066510336a..8927fb128fb 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-70.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-70.c
@@ -5,7 +5,7 @@ 
 
 void f (int8_t * restrict in, int8_t * restrict out, int l, int n, int m, size_t cond)
 {
-  size_t vl = 555;
+  size_t vl = cond + 555;
   
   if (cond) {
     for (int i = 0; i < l; i++){
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-71.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-71.c
index 8409d06796a..c59cc008e98 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-71.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-71.c
@@ -5,7 +5,7 @@ 
 
 void f (int8_t * restrict in, int8_t * restrict out, int l, int n, int m, size_t cond)
 {
-  size_t vl = 555;
+  size_t vl = in[0] + 555;
   
   if (cond) {
     for (int i = 0; i < l; i++){
@@ -50,5 +50,5 @@  void f (int8_t * restrict in, int8_t * restrict out, int l, int n, int m, size_t
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-72.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-72.c
index b1e28abd4fe..45b00f68ba3 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-72.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-72.c
@@ -3,9 +3,9 @@ 
 
 #include "riscv_vector.h"
 
-void f (void * restrict in, void * restrict out, int n, int cond)
+void f (void * restrict in, void * restrict out, int n, int cond, size_t vl)
 {
-  size_t vl = 101;
+  vl = vl + 101;
   for (size_t i = 0; i < n; i++)
     {
       vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i, vl);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-76.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-76.c
index 1b6e818d209..142e43c2baa 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-76.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-76.c
@@ -5,7 +5,7 @@ 
 
 void f (void * restrict in, void * restrict out, int n, int cond)
 {
-  size_t vl = 101;
+  size_t vl = 101 + cond;
   for (size_t i = 0; i < n; i++)
     {
       vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i, vl);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-77.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-77.c
index 9fb16052385..ddd6766e1ef 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-77.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-77.c
@@ -5,7 +5,7 @@ 
 
 void f (void * restrict in, void * restrict out, int n, int cond)
 {
-  size_t vl = 101;
+  size_t vl = 101 + cond;
   for (size_t i = 0; i < n; i++)
     {
       vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i, vl);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-82.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-82.c
index af1f08826cf..17bac422508 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-82.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-82.c
@@ -25,6 +25,6 @@  float f0 (int8_t * restrict in, int8_t * restrict out, int n, int m, unsigned co
 }
 
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*4,\s*e32,\s*mf2,\s*tu,\s*mu} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*4,\s*e32,\s*mf2,\s*tu,\s*mu} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetivli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-83.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-83.c
index 07263712cdb..c7a3383d02e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-83.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-83.c
@@ -26,6 +26,6 @@  float f0 (int8_t * restrict in, int8_t * restrict out, int n, int m, unsigned co
 }
 
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*4,\s*e32,\s*mf2,\s*tu,\s*mu} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*4,\s*e32,\s*mf2,\s*tu,\s*mu} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetivli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-84.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-84.c
index f772af81ec4..b3e90d260e7 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-84.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-84.c
@@ -17,7 +17,7 @@  double f0 (int8_t * restrict in, int8_t * restrict out, int n, int m, unsigned c
 }
 
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m2,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*3,\s*e64,\s*m1,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*3,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetivli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-89.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-89.c
index a4ef350afc3..9f850880ae5 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-89.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-89.c
@@ -25,7 +25,7 @@  float f (int8_t * restrict in, int8_t * restrict out, int n, int m, unsigned con
   return __riscv_vfmv_f_s_f32m1_f32 (v);
 }
 
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*3,\s*e32,\s*m1,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*3,\s*e32,\s*m1,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*tu,\s*mu} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetivli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-93.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-93.c
index 592e067cfc6..eaed7f1a127 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-93.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-93.c
@@ -16,6 +16,6 @@  float f (int8_t * restrict in, int8_t * restrict out, int n, int m, unsigned con
     *(vfloat32m1_t*)(out + 100000) = v;
 }
 
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*3,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*3,\s*e64,\s*m1,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-not {vsetvli} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetivli} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-94.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-94.c
index 694d591eeae..a2f32dff7c1 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-94.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-94.c
@@ -15,6 +15,6 @@  float f (int8_t * restrict in, int8_t * restrict out, int n, int m, unsigned con
     *(vfloat32m1_t*)(out + 100000) = v;
 }
 
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*3,\s*e32,\s*m1,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*3,\s*e32,\s*m1,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-not {vsetvli} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetivli} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-95.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-95.c
index 22644e76423..5dac25ee59c 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-95.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-95.c
@@ -15,6 +15,6 @@  float f (int8_t * restrict in, int8_t * restrict out, int n, int m, unsigned con
     *(vfloat32m1_t*)(out + 100000) = v;
 }
 
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*3,\s*e32,\s*m2,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*3,\s*e64,\s*m4,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-not {vsetvli} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetivli} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-96.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-96.c
index 0e261d888a4..19516eb271e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-96.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-96.c
@@ -16,6 +16,6 @@  float f (int8_t * restrict in, int8_t * restrict out, int n, int m, unsigned con
     return __riscv_vfmv_f_s_f32m1_f32 (v);
 }
 
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*3,\s*e32,\s*m2,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*3,\s*e32,\s*m2,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-O1" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-not {vsetvli} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetivli} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/ffload-5.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/ffload-5.c
index 895180cc54e..04eb78b91c6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/ffload-5.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/ffload-5.c
@@ -26,4 +26,4 @@  void f (int8_t * restrict in, int8_t * restrict out, int n, int m, int cond)
     }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*mu} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*mu} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-3.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-3.c
index dbe6c67ee87..a6894749017 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-3.c
@@ -10,6 +10,7 @@  void f(void *base, void *out, void *mask_in, size_t vl, size_t m) {
     if (i % 2 == 0) {
       vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, 4);
       vint8mf8_t v1 = __riscv_vle8_v_i8mf8_tu(v0, base + i + 100, 4);
+      v1 = __riscv_vadd_vv_i8mf8 (v0,v1,4);
       __riscv_vse8_v_i8mf8 (out + i, v1, 4);
     } else {
       vint16mf4_t v0 = __riscv_vle16_v_i16mf4(base + i, 4);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-4.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-4.c
index 4fbeffb8b54..13d7c23ec6e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-4.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-4.c
@@ -11,6 +11,7 @@  void f(void *base, void *out, void *mask_in, size_t vl, size_t m, size_t n) {
       if ((i + j) % 2 == 0) {
         vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i + j, 4);
         vint8mf8_t v1 = __riscv_vle8_v_i8mf8_tu(v0, base + i + j + 100, 4);
+        v1 = __riscv_vadd_vv_i8mf8 (v0,v1,4);
         __riscv_vse8_v_i8mf8 (out + i + j, v1, 4);
       } else {
         vint16mf4_t v0 = __riscv_vle16_v_i16mf4(base + i + j, 4);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-9.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-9.c
index 3b486df4fe5..94d08bce54d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-9.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-9.c
@@ -14,6 +14,7 @@  void f(void *base, void *out, void *mask_in, size_t vl, size_t m, size_t n) {
         } else {
           vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i + 300 + j, 4);
           vint8mf8_t v1 = __riscv_vle8_v_i8mf8_tu(v0, base + i + 300 + j, 4);
+          v1 = __riscv_vadd_vv_i8mf8 (v0,v1,4);
           __riscv_vse8_v_i8mf8 (out + i + 300, v1, 4);
         }
       }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_switch-7.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_switch-7.c
index 8b67dcc216c..af110ec3ba5 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_switch-7.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_switch-7.c
@@ -25,5 +25,5 @@  void f (void * restrict in, void * restrict out, int n)
 }
 
 /* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {\.L[0-9]+\:\s+vsetivli\s+zero,\s*5,\s*e16,\s*mf4,\s*t[au],\s*m[au]\s+\.L[0-9]+} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {\.L[0-9]+\:\s+vsetivli\s+zero,\s*5,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*8,\s*e16,\s*mf2,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_switch-9.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_switch-9.c
index 3825aea16f1..322e2719f3e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_switch-9.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_switch-9.c
@@ -41,7 +41,5 @@  void f (void * restrict in, void * restrict out, void * restrict mask_in, int n,
     }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e16,\s*mf4,\s*tu,\s*mu} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e32,\s*mf2,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli} 11 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-45.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-45.c
index f6ddacc8f8e..2721ebb5589 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-45.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-45.c
@@ -30,5 +30,4 @@  void foo5_5 (int32_t * restrict in, int32_t * restrict out, size_t n, size_t m,
     }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]\s+j\s+\.L[0-9]+} 1 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0"  no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-1.c
index 24958def604..20c2193c703 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-1.c
@@ -171,12 +171,12 @@  void f6 (int8_t * restrict in, int8_t * restrict out, int n)
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf4,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
-/* { dg-final { scan-assembler-times {add\ta[0-7],a[0-7],a[0-7]\s+\.L[0-9][0-9]\:\s+vle8\.v\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\s*\([a-x0-9]+\)} 3 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
-/* { dg-final { scan-assembler-times {add\ta[0-7],a[0-7],a[0-7]\s+\.L[0-9][0-9]\:\s+vle16\.v\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\s*\([a-x0-9]+\)} 2 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
-/* { dg-final { scan-assembler-times {add\ta[0-7],a[0-7],a[0-7]\s+\.L[0-9][0-9]\:\s+vle32\.v\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\s*\([a-x0-9]+\)} 1 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf4,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {add\ta[0-7],a[0-7],a[0-7]\s+\.L[0-9][0-9]\:\s+vle8\.v\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\s*\([a-x0-9]+\)} 3 { target { no-opts "-O0"  no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {add\ta[0-7],a[0-7],a[0-7]\s+\.L[0-9][0-9]\:\s+vle16\.v\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\s*\([a-x0-9]+\)} 2 { target { no-opts "-O0"  no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {add\ta[0-7],a[0-7],a[0-7]\s+\.L[0-9][0-9]\:\s+vle32\.v\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\s*\([a-x0-9]+\)} 1 { target { no-opts "-O0"  no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c
index 4e2a717197b..0de651d6b25 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c
@@ -39,5 +39,5 @@  void f (int8_t * restrict in, int8_t * restrict out, int n)
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {add\ta[0-7],a[0-7],a[0-7]\s+\.L[0-9][0-9]\:\s+vle32\.v\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\s*\([a-x0-9]+\)} 1 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {add\ta[0-7],a[0-7],a[0-7]\s+\.L[0-9][0-9]\:\s+vle32\.v\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\s*\([a-x0-9]+\)} 1 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c
index ca57ecad7cf..305bedcc025 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c
@@ -32,4 +32,4 @@  void f (int8_t * restrict in, int8_t * restrict out, int n)
 }
 
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {add\ta[0-7],a[0-7],a[0-7]\s+\.L[0-9][0-9]\:\s+vle32\.v\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\s*\([a-x0-9]+\)} 1 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {add\ta[0-7],a[0-7],a[0-7]\s+\.L[0-9][0-9]\:\s+vle32\.v\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\s*\([a-x0-9]+\)} 1 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-4.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-4.c
index fc6161edbba..81cdad2ea9d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-4.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-4.c
@@ -199,12 +199,12 @@  void f7 (int8_t * restrict in, int8_t * restrict out, int n)
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf4,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m1,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m4,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
-/* { dg-final { scan-assembler-times {add\ta[0-7],a[0-7],a[0-7]\s+\.L[0-9][0-9]\:\s+vlm\.v\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\s*\([a-x0-9]+\)} 6 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
-/* { dg-final { scan-assembler-times {add\ta[0-7],a[0-7],a[0-7]\s+\.L[0-9][0-9][0-9]\:\s+vlm\.v\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\s*\([a-x0-9]+\)} 1 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf4,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m1,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m4,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {add\ta[0-7],a[0-7],a[0-7]\s+\.L[0-9][0-9]\:\s+vlm\.v\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\s*\([a-x0-9]+\)} 6 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
+/* { dg-final { scan-assembler-times {add\ta[0-7],a[0-7],a[0-7]\s+\.L[0-9][0-9][0-9]\:\s+vlm\.v\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\s*\([a-x0-9]+\)} 1 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" no-opts "-O2" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c
index 60ad108666f..b5ba532db09 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c
@@ -20,6 +20,6 @@  void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond, si
     }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli} 4 { target { no-opts "-O0"  no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli} 5 { target { no-opts "-O0"  no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-1.c
index eebc6c0862e..7648b8a2dc8 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-1.c
@@ -23,4 +23,4 @@  void f (void * restrict in, void * restrict out, int n)
 }
 
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0"  no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-flto" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-funroll-loops" no-opts "-Os" no-opts "-Oz" no-opts "-flto" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-funroll-loops" no-opts "-Os" no-opts "-Oz" no-opts "-flto" no-opts "-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c
index 1ab92df0fdc..24c3dc53764 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c
@@ -52,7 +52,7 @@  void f (void * restrict in, void * restrict out, int32_t * a, int32_t * b, int n
 }
 
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-flto" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-funroll-loops" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-flto" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*zero} 2 { target { no-opts "-O0" no-opts "-funroll-loops" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-flto" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf2,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-flto" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-flto" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-flto" no-opts "-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-11.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-11.c
index 3ef0fdcb66d..96bf21f99d6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-11.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-11.c
@@ -18,4 +18,4 @@  void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t *out, size_t n, int c
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c
index 482a48314e2..853ffee1414 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c
@@ -17,5 +17,5 @@  void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t *out, size_t n) {
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*m1,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*m1,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvlmax-4.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvlmax-4.c
index 3b9865e3bab..2a535b5f2a8 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvlmax-4.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvlmax-4.c
@@ -17,5 +17,5 @@  void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t *out, size_t n) {
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*m1,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*m1,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */