diff mbox series

[v2] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API

Message ID 20230605082043.1707158-1-pan2.li@intel.com
State New
Headers show
Series [v2] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API | expand

Commit Message

Li, Pan2 via Gcc-patches June 5, 2023, 8:20 a.m. UTC
From: Pan Li <pan2.li@intel.com>

This patch supports the intrinsic APIs of the ZVFH FP16 floating-point
extension, i.e. SEW=16, for the instructions below:

vfadd vfsub vfrsub vfwadd vfwsub
vfmul vfdiv vfrdiv vfwmul
vfmacc vfnmacc vfmsac vfnmsac vfmadd
vfnmadd vfmsub vfnmsub vfwmacc vfwnmacc vfwmsac vfwnmsac
vfsqrt vfrsqrt7 vfrec7
vfmin vfmax
vfsgnj vfsgnjn vfsgnjx
vmfeq vmfne vmflt vmfle vmfgt vmfge
vfclass vfmerge
vfmv
vfcvt vfwcvt vfncvt

Users can then leverage the intrinsic APIs to perform FP16 operations.
Please note that not all of the intrinsic APIs are covered by the test
file; only some typical ones are picked, since there are too many to
cover here. Complete test coverage for the FP16 intrinsic APIs will
follow soon.
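
As a rough usage sketch (not part of this patch), an application built
with -march=rv64gcv_zvfh could drive the new FP16 intrinsics in a
VL-strided loop, for example vfadd at LMUL=1.  The helper name and loop
shape below are illustrative assumptions only:

  #include <riscv_vector.h>

  /* Element-wise FP16 add: c[i] = a[i] + b[i] for i in [0, n).  */
  void
  vec_add_f16 (_Float16 *c, const _Float16 *a, const _Float16 *b, size_t n)
  {
    for (size_t i = 0; i < n;)
      {
        /* Number of FP16 elements processed this iteration.  */
        size_t vl = __riscv_vsetvl_e16m1 (n - i);
        vfloat16m1_t va = __riscv_vle16_v_f16m1 (a + i, vl);
        vfloat16m1_t vb = __riscv_vle16_v_f16m1 (b + i, vl);
        /* One of the SEW=16 intrinsics enabled by this series.  */
        vfloat16m1_t vc = __riscv_vfadd_vv_f16m1 (va, vb, vl);
        __riscv_vse16_v_f16m1 (c + i, vc, vl);
        i += vl;
      }
  }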

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:

	* config/riscv/riscv-vector-builtins-types.def
	(vfloat32mf2_t): New type for DEF_RVV_WEXTF_OPS.
	(vfloat32m1_t): Ditto.
	(vfloat32m2_t): Ditto.
	(vfloat32m4_t): Ditto.
	(vfloat32m8_t): Ditto.
	(vint16mf4_t): New type for DEF_RVV_CONVERT_I_OPS.
	(vint16mf2_t): Ditto.
	(vint16m1_t): Ditto.
	(vint16m2_t): Ditto.
	(vint16m4_t): Ditto.
	(vint16m8_t): Ditto.
	(vuint16mf4_t): New type for DEF_RVV_CONVERT_U_OPS.
	(vuint16mf2_t): Ditto.
	(vuint16m1_t): Ditto.
	(vuint16m2_t): Ditto.
	(vuint16m4_t): Ditto.
	(vuint16m8_t): Ditto.
	(vint32mf2_t): New type for DEF_RVV_WCONVERT_I_OPS.
	(vint32m1_t): Ditto.
	(vint32m2_t): Ditto.
	(vint32m4_t): Ditto.
	(vint32m8_t): Ditto.
	(vuint32mf2_t): New type for DEF_RVV_WCONVERT_U_OPS.
	(vuint32m1_t): Ditto.
	(vuint32m2_t): Ditto.
	(vuint32m4_t): Ditto.
	(vuint32m8_t): Ditto.
	* config/riscv/vector-iterators.md: Add FP16 support for VF,
	VWCONVERTI, VCONVERT, vconvert, VNCONVERT, VLMUL1 and vlmul1.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 .../riscv/riscv-vector-builtins-types.def     |  32 ++
 gcc/config/riscv/vector-iterators.md          |  21 +
 .../riscv/rvv/base/zvfh-intrinsic.c           | 418 ++++++++++++++++++
 3 files changed, 471 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c

Comments

juzhe.zhong@rivai.ai June 5, 2023, 8:26 a.m. UTC | #1
LGTM,



juzhe.zhong@rivai.ai
 
From: pan2.li
Date: 2023-06-05 16:20
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH v2] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API
From: Pan Li <pan2.li@intel.com>
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def b/gcc/config/riscv/riscv-vector-builtins-types.def
index 9cb3aca992e..1e2491de6d6 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -518,11 +518,24 @@ DEF_RVV_FULL_V_U_OPS (vuint64m2_t, RVV_REQUIRE_FULL_V)
DEF_RVV_FULL_V_U_OPS (vuint64m4_t, RVV_REQUIRE_FULL_V)
DEF_RVV_FULL_V_U_OPS (vuint64m8_t, RVV_REQUIRE_FULL_V)
+DEF_RVV_WEXTF_OPS (vfloat32mf2_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WEXTF_OPS (vfloat32m1_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m2_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m4_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m8_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+
DEF_RVV_WEXTF_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64)
DEF_RVV_WEXTF_OPS (vfloat64m2_t, RVV_REQUIRE_ELEN_FP_64)
DEF_RVV_WEXTF_OPS (vfloat64m4_t, RVV_REQUIRE_ELEN_FP_64)
DEF_RVV_WEXTF_OPS (vfloat64m8_t, RVV_REQUIRE_ELEN_FP_64)
+DEF_RVV_CONVERT_I_OPS (vint16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_CONVERT_I_OPS (vint16mf2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m1_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m4_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m8_t, TARGET_ZVFH)
+
DEF_RVV_CONVERT_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_CONVERT_I_OPS (vint32m1_t, 0)
DEF_RVV_CONVERT_I_OPS (vint32m2_t, 0)
@@ -533,6 +546,13 @@ DEF_RVV_CONVERT_I_OPS (vint64m2_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_CONVERT_I_OPS (vint64m4_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_CONVERT_I_OPS (vint64m8_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_CONVERT_U_OPS (vuint16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_CONVERT_U_OPS (vuint16mf2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m1_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m4_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m8_t, TARGET_ZVFH)
+
DEF_RVV_CONVERT_U_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_CONVERT_U_OPS (vuint32m1_t, 0)
DEF_RVV_CONVERT_U_OPS (vuint32m2_t, 0)
@@ -543,11 +563,23 @@ DEF_RVV_CONVERT_U_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_CONVERT_U_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_CONVERT_U_OPS (vuint64m8_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_WCONVERT_I_OPS (vint32mf2_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WCONVERT_I_OPS (vint32m1_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_I_OPS (vint32m2_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_I_OPS (vint32m4_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_I_OPS (vint32m8_t, TARGET_ZVFH)
+
DEF_RVV_WCONVERT_I_OPS (vint64m1_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
DEF_RVV_WCONVERT_I_OPS (vint64m2_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
DEF_RVV_WCONVERT_I_OPS (vint64m4_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
DEF_RVV_WCONVERT_I_OPS (vint64m8_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
+DEF_RVV_WCONVERT_U_OPS (vuint32mf2_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WCONVERT_U_OPS (vuint32m1_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_U_OPS (vuint32m2_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_U_OPS (vuint32m4_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_U_OPS (vuint32m8_t, TARGET_ZVFH)
+
DEF_RVV_WCONVERT_U_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
DEF_RVV_WCONVERT_U_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
DEF_RVV_WCONVERT_U_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 90743ed76c5..e4f2ba90799 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -296,6 +296,14 @@ (define_mode_iterator VWI_ZVE32 [
])
(define_mode_iterator VF [
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
+
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
@@ -496,6 +504,13 @@ (define_mode_iterator VWEXTF [
])
(define_mode_iterator VWCONVERTI [
+  (VNx1SI "TARGET_MIN_VLEN < 128 && TARGET_VECTOR_ELEN_FP_16")
+  (VNx2SI "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4SI "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8SI "TARGET_VECTOR_ELEN_FP_16")
+  (VNx16SI "TARGET_MIN_VLEN > 32 && TARGET_VECTOR_ELEN_FP_16")
+  (VNx32SI "TARGET_MIN_VLEN >= 128 && TARGET_VECTOR_ELEN_FP_16")
+
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx2DI "TARGET_VECTOR_ELEN_64 && TARGET_VECTOR_ELEN_FP_32")
   (VNx4DI "TARGET_VECTOR_ELEN_64 && TARGET_VECTOR_ELEN_FP_32")
@@ -1239,17 +1254,21 @@ (define_mode_attr VINDEX_OCT_EXT [
])
(define_mode_attr VCONVERT [
+  (VNx1HF "VNx1HI") (VNx2HF "VNx2HI") (VNx4HF "VNx4HI") (VNx8HF "VNx8HI") (VNx16HF "VNx16HI") (VNx32HF "VNx32HI") (VNx64HF "VNx64HI")
   (VNx1SF "VNx1SI") (VNx2SF "VNx2SI") (VNx4SF "VNx4SI") (VNx8SF "VNx8SI") (VNx16SF "VNx16SI") (VNx32SF "VNx32SI")
   (VNx1DF "VNx1DI") (VNx2DF "VNx2DI") (VNx4DF "VNx4DI") (VNx8DF "VNx8DI") (VNx16DF "VNx16DI")
])
(define_mode_attr vconvert [
+  (VNx1HF "vnx1hi") (VNx2HF "vnx2hi") (VNx4HF "vnx4hi") (VNx8HF "vnx8hi") (VNx16HF "vnx16hi") (VNx32HF "vnx32hi") (VNx64HF "vnx64hi")
   (VNx1SF "vnx1si") (VNx2SF "vnx2si") (VNx4SF "vnx4si") (VNx8SF "vnx8si") (VNx16SF "vnx16si") (VNx32SF "vnx32si")
   (VNx1DF "vnx1di") (VNx2DF "vnx2di") (VNx4DF "vnx4di") (VNx8DF "vnx8di") (VNx16DF "vnx16di")
])
(define_mode_attr VNCONVERT [
+  (VNx1HF "VNx1QI") (VNx2HF "VNx2QI") (VNx4HF "VNx4QI") (VNx8HF "VNx8QI") (VNx16HF "VNx16QI") (VNx32HF "VNx32QI") (VNx64HF "VNx64QI")
   (VNx1SF "VNx1HI") (VNx2SF "VNx2HI") (VNx4SF "VNx4HI") (VNx8SF "VNx8HI") (VNx16SF "VNx16HI") (VNx32SF "VNx32HI")
+  (VNx1SI "VNx1HF") (VNx2SI "VNx2HF") (VNx4SI "VNx4HF") (VNx8SI "VNx8HF") (VNx16SI "VNx16HF") (VNx32SI "VNx32HF")
   (VNx1DI "VNx1SF") (VNx2DI "VNx2SF") (VNx4DI "VNx4SF") (VNx8DI "VNx8SF") (VNx16DI "VNx16SF")
   (VNx1DF "VNx1SI") (VNx2DF "VNx2SI") (VNx4DF "VNx4SI") (VNx8DF "VNx8SI") (VNx16DF "VNx16SI")
])
@@ -1263,6 +1282,7 @@ (define_mode_attr VLMUL1 [
   (VNx8SI "VNx4SI") (VNx16SI "VNx4SI") (VNx32SI "VNx4SI")
   (VNx1DI "VNx2DI") (VNx2DI "VNx2DI")
   (VNx4DI "VNx2DI") (VNx8DI "VNx2DI") (VNx16DI "VNx2DI")
+  (VNx1HF "VNx8HF") (VNx2HF "VNx8HF") (VNx4HF "VNx8HF") (VNx8HF "VNx8HF") (VNx16HF "VNx8HF") (VNx32HF "VNx8HF") (VNx64HF "VNx8HF")
   (VNx1SF "VNx4SF") (VNx2SF "VNx4SF")
   (VNx4SF "VNx4SF") (VNx8SF "VNx4SF") (VNx16SF "VNx4SF") (VNx32SF "VNx4SF")
   (VNx1DF "VNx2DF") (VNx2DF "VNx2DF")
@@ -1333,6 +1353,7 @@ (define_mode_attr vlmul1 [
   (VNx8SI "vnx4si") (VNx16SI "vnx4si") (VNx32SI "vnx4si")
   (VNx1DI "vnx2di") (VNx2DI "vnx2di")
   (VNx4DI "vnx2di") (VNx8DI "vnx2di") (VNx16DI "vnx2di")
+  (VNx1HF "vnx8hf") (VNx2HF "vnx8hf") (VNx4HF "vnx8hf") (VNx8HF "vnx8hf") (VNx16HF "vnx8hf") (VNx32HF "vnx8hf") (VNx64HF "vnx8hf")
   (VNx1SF "vnx4sf") (VNx2SF "vnx4sf")
   (VNx4SF "vnx4sf") (VNx8SF "vnx4sf") (VNx16SF "vnx4sf") (VNx32SF "vnx4sf")
   (VNx1DF "vnx2df") (VNx2DF "vnx2df")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c
new file mode 100644
index 00000000000..0d244aac9ec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c
@@ -0,0 +1,418 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+typedef _Float16 float16_t;
+
+vfloat16mf4_t test_vfadd_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfadd_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfadd_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfadd_vf_f16m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfsub_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfsub_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfsub_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfsub_vf_f16m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfrsub_vf_f16mf4(vfloat16mf4_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfrsub_vf_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfrsub_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfrsub_vf_f16m8(op1, op2, vl);
+}
+
+vfloat32mf2_t test_vfwadd_vv_f32mf2(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfwadd_vv_f32mf2(op1, op2, vl);
+}
+
+vfloat32m8_t test_vfwadd_vv_f32m8(vfloat16m4_t op1, vfloat16m4_t op2, size_t vl) {
+  return __riscv_vfwadd_vv_f32m8(op1, op2, vl);
+}
+
+vfloat32mf2_t test_vfwadd_wv_f32mf2(vfloat32mf2_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfwadd_wv_f32mf2(op1, op2, vl);
+}
+
+vfloat32m8_t test_vfwadd_wv_f32m8(vfloat32m8_t op1, vfloat16m4_t op2, size_t vl) {
+  return __riscv_vfwadd_wv_f32m8(op1, op2, vl);
+}
+
+vfloat32mf2_t test_vfwsub_vv_f32mf2(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfwsub_vv_f32mf2(op1, op2, vl);
+}
+
+vfloat32m8_t test_vfwsub_vv_f32m8(vfloat16m4_t op1, vfloat16m4_t op2, size_t vl) {
+  return __riscv_vfwsub_vv_f32m8(op1, op2, vl);
+}
+
+vfloat32mf2_t test_vfwsub_wv_f32mf2(vfloat32mf2_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfwsub_wv_f32mf2(op1, op2, vl);
+}
+
+vfloat32m8_t test_vfwsub_wv_f32m8(vfloat32m8_t op1, vfloat16m4_t op2, size_t vl) {
+  return __riscv_vfwsub_wv_f32m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfmul_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfmul_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfmul_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfmul_vf_f16m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfdiv_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfdiv_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfdiv_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfdiv_vf_f16m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfrdiv_vf_f16mf4(vfloat16mf4_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfrdiv_vf_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfrdiv_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfrdiv_vf_f16m8(op1, op2, vl);
+}
+
+vfloat32mf2_t test_vfwmul_vv_f32mf2(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfwmul_vv_f32mf2(op1, op2, vl);
+}
+
+vfloat32m8_t test_vfwmul_vf_f32m8(vfloat16m4_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfwmul_vf_f32m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfmacc_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfmacc_vv_f16mf4(vd, vs1, vs2, vl);
+}
+
+vfloat16m8_t test_vfmacc_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
+  return __riscv_vfmacc_vf_f16m8(vd, rs1, vs2, vl);
+}
+
+vfloat16mf4_t test_vfnmacc_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfnmacc_vv_f16mf4(vd, vs1, vs2, vl);
+}
+
+vfloat16m8_t test_vfnmacc_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
+  return __riscv_vfnmacc_vf_f16m8(vd, rs1, vs2, vl);
+}
+
+vfloat16mf4_t test_vfmsac_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfmsac_vv_f16mf4(vd, vs1, vs2, vl);
+}
+
+vfloat16m8_t test_vfmsac_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
+  return __riscv_vfmsac_vf_f16m8(vd, rs1, vs2, vl);
+}
+
+vfloat16mf4_t test_vfnmsac_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfnmsac_vv_f16mf4(vd, vs1, vs2, vl);
+}
+
+vfloat16m8_t test_vfnmsac_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
+  return __riscv_vfnmsac_vf_f16m8(vd, rs1, vs2, vl);
+}
+
+vfloat16mf4_t test_vfmadd_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfmadd_vv_f16mf4(vd, vs1, vs2, vl);
+}
+
+vfloat16m8_t test_vfmadd_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
+  return __riscv_vfmadd_vf_f16m8(vd, rs1, vs2, vl);
+}
+
+vfloat16mf4_t test_vfnmadd_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfnmadd_vv_f16mf4(vd, vs1, vs2, vl);
+}
+
+vfloat16m8_t test_vfnmadd_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
+  return __riscv_vfnmadd_vf_f16m8(vd, rs1, vs2, vl);
+}
+
+vfloat16mf4_t test_vfmsub_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfmsub_vv_f16mf4(vd, vs1, vs2, vl);
+}
+
+vfloat16m8_t test_vfmsub_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
+  return __riscv_vfmsub_vf_f16m8(vd, rs1, vs2, vl);
+}
+
+vfloat16mf4_t test_vfnmsub_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfnmsub_vv_f16mf4(vd, vs1, vs2, vl);
+}
+
+vfloat16m8_t test_vfnmsub_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
+  return __riscv_vfnmsub_vf_f16m8(vd, rs1, vs2, vl);
+}
+
+vfloat32mf2_t test_vfwmacc_vv_f32mf2(vfloat32mf2_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfwmacc_vv_f32mf2(vd, vs1, vs2, vl);
+}
+
+vfloat32m8_t test_vfwmacc_vf_f32m8(vfloat32m8_t vd, float16_t vs1, vfloat16m4_t vs2, size_t vl) {
+  return __riscv_vfwmacc_vf_f32m8(vd, vs1, vs2, vl);
+}
+
+vfloat32mf2_t test_vfwnmacc_vv_f32mf2(vfloat32mf2_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfwnmacc_vv_f32mf2(vd, vs1, vs2, vl);
+}
+
+vfloat32m8_t test_vfwnmacc_vf_f32m8(vfloat32m8_t vd, float16_t vs1, vfloat16m4_t vs2, size_t vl) {
+  return __riscv_vfwnmacc_vf_f32m8(vd, vs1, vs2, vl);
+}
+
+vfloat32mf2_t test_vfwmsac_vv_f32mf2(vfloat32mf2_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfwmsac_vv_f32mf2(vd, vs1, vs2, vl);
+}
+
+vfloat32m8_t test_vfwmsac_vf_f32m8(vfloat32m8_t vd, float16_t vs1, vfloat16m4_t vs2, size_t vl) {
+  return __riscv_vfwmsac_vf_f32m8(vd, vs1, vs2, vl);
+}
+
+vfloat32mf2_t test_vfwnmsac_vv_f32mf2(vfloat32mf2_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfwnmsac_vv_f32mf2(vd, vs1, vs2, vl);
+}
+
+vfloat32m8_t test_vfwnmsac_vf_f32m8(vfloat32m8_t vd, float16_t vs1, vfloat16m4_t vs2, size_t vl) {
+  return __riscv_vfwnmsac_vf_f32m8(vd, vs1, vs2, vl);
+}
+
+vfloat16mf4_t test_vfsqrt_v_f16mf4(vfloat16mf4_t op1, size_t vl) {
+  return __riscv_vfsqrt_v_f16mf4(op1, vl);
+}
+
+vfloat16m8_t test_vfsqrt_v_f16m8(vfloat16m8_t op1, size_t vl) {
+  return __riscv_vfsqrt_v_f16m8(op1, vl);
+}
+
+vfloat16mf4_t test_vfrsqrt7_v_f16mf4(vfloat16mf4_t op1, size_t vl) {
+  return __riscv_vfrsqrt7_v_f16mf4(op1, vl);
+}
+
+vfloat16m8_t test_vfrsqrt7_v_f16m8(vfloat16m8_t op1, size_t vl) {
+  return __riscv_vfrsqrt7_v_f16m8(op1, vl);
+}
+
+vfloat16mf4_t test_vfrec7_v_f16mf4(vfloat16mf4_t op1, size_t vl) {
+  return __riscv_vfrec7_v_f16mf4(op1, vl);
+}
+
+vfloat16m8_t test_vfrec7_v_f16m8(vfloat16m8_t op1, size_t vl) {
+  return __riscv_vfrec7_v_f16m8(op1, vl);
+}
+
+vfloat16mf4_t test_vfmin_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfmin_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfmin_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfmin_vf_f16m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfmax_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfmax_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfmax_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfmax_vf_f16m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfsgnj_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfsgnj_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfsgnj_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfsgnj_vf_f16m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfsgnjn_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfsgnjn_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfsgnjn_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfsgnjn_vf_f16m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfsgnjx_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfsgnjx_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfsgnjx_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfsgnjx_vf_f16m8(op1, op2, vl);
+}
+
+vbool64_t test_vmfeq_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vmfeq_vv_f16mf4_b64(op1, op2, vl);
+}
+
+vbool2_t test_vmfeq_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vmfeq_vf_f16m8_b2(op1, op2, vl);
+}
+
+vbool64_t test_vmfne_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vmfne_vv_f16mf4_b64(op1, op2, vl);
+}
+
+vbool2_t test_vmfne_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vmfne_vf_f16m8_b2(op1, op2, vl);
+}
+
+vbool64_t test_vmflt_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vmflt_vv_f16mf4_b64(op1, op2, vl);
+}
+
+vbool2_t test_vmflt_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vmflt_vf_f16m8_b2(op1, op2, vl);
+}
+
+vbool64_t test_vmfle_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vmfle_vv_f16mf4_b64(op1, op2, vl);
+}
+
+vbool2_t test_vmfle_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vmfle_vf_f16m8_b2(op1, op2, vl);
+}
+
+vbool64_t test_vmfgt_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vmfgt_vv_f16mf4_b64(op1, op2, vl);
+}
+
+vbool2_t test_vmfgt_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vmfgt_vf_f16m8_b2(op1, op2, vl);
+}
+
+vbool64_t test_vmfge_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vmfge_vv_f16mf4_b64(op1, op2, vl);
+}
+
+vbool2_t test_vmfge_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vmfge_vf_f16m8_b2(op1, op2, vl);
+}
+
+vuint16mf4_t test_vfclass_v_u16mf4(vfloat16mf4_t op1, size_t vl) {
+  return __riscv_vfclass_v_u16mf4(op1, vl);
+}
+
+vuint16m8_t test_vfclass_v_u16m8(vfloat16m8_t op1, size_t vl) {
+  return __riscv_vfclass_v_u16m8(op1, vl);
+}
+
+vfloat16mf4_t test_vfmerge_vfm_f16mf4(vfloat16mf4_t op1, float16_t op2, vbool64_t mask, size_t vl) {
+  return __riscv_vfmerge_vfm_f16mf4(op1, op2, mask, vl);
+}
+
+vfloat16m8_t test_vfmerge_vfm_f16m8(vfloat16m8_t op1, float16_t op2, vbool2_t mask, size_t vl) {
+  return __riscv_vfmerge_vfm_f16m8(op1, op2, mask, vl);
+}
+
+vfloat16mf4_t test_vfmv_v_f_f16mf4(float16_t src, size_t vl) {
+  return __riscv_vfmv_v_f_f16mf4(src, vl);
+}
+
+vfloat16m8_t test_vfmv_v_f_f16m8(float16_t src, size_t vl) {
+  return __riscv_vfmv_v_f_f16m8(src, vl);
+}
+
+vint16mf4_t test_vfcvt_x_f_v_i16mf4(vfloat16mf4_t src, size_t vl) {
+  return __riscv_vfcvt_x_f_v_i16mf4(src, vl);
+}
+
+vuint16m8_t test_vfcvt_xu_f_v_u16m8(vfloat16m8_t src, size_t vl) {
+  return __riscv_vfcvt_xu_f_v_u16m8(src, vl);
+}
+
+vfloat16mf4_t test_vfcvt_f_x_v_f16mf4(vint16mf4_t src, size_t vl) {
+  return __riscv_vfcvt_f_x_v_f16mf4(src, vl);
+}
+
+vfloat16m8_t test_vfcvt_f_xu_v_f16m8(vuint16m8_t src, size_t vl) {
+  return __riscv_vfcvt_f_xu_v_f16m8(src, vl);
+}
+
+vint16mf4_t test_vfcvt_rtz_x_f_v_i16mf4(vfloat16mf4_t src, size_t vl) {
+  return __riscv_vfcvt_rtz_x_f_v_i16mf4(src, vl);
+}
+
+vuint16m8_t test_vfcvt_rtz_xu_f_v_u16m8(vfloat16m8_t src, size_t vl) {
+  return __riscv_vfcvt_rtz_xu_f_v_u16m8(src, vl);
+}
+
+vfloat16mf4_t test_vfwcvt_f_x_v_f16mf4(vint8mf8_t src, size_t vl) {
+  return __riscv_vfwcvt_f_x_v_f16mf4(src, vl);
+}
+
+vuint32m8_t test_vfwcvt_xu_f_v_u32m8(vfloat16m4_t src, size_t vl) {
+  return __riscv_vfwcvt_xu_f_v_u32m8(src, vl);
+}
+
+vint8mf8_t test_vfncvt_x_f_w_i8mf8(vfloat16mf4_t src, size_t vl) {
+  return __riscv_vfncvt_x_f_w_i8mf8(src, vl);
+}
+
+vfloat16m4_t test_vfncvt_f_xu_w_f16m4(vuint32m8_t src, size_t vl) {
+  return __riscv_vfncvt_f_xu_w_f16m4(src, vl);
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 43 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m4,\s*t[au],\s*m[au]} 11 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m8,\s*t[au],\s*m[au]} 34 } } */
+/* { dg-final { scan-assembler-times {vfadd\.v[fv]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfsub\.v[fv]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfrsub\.vf\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfwadd\.[wv]v\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vfwsub\.[wv]v\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vfmul\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfdiv\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfrdiv\.vf\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfwmul\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfmacc\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfnmacc\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfmsac\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfnmsac\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfmadd\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfnmadd\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfmsub\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfnmsub\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfwmacc\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfwnmacc\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfwmsac\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfwnmsac\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfsqrt\.v\s+v[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfrsqrt7\.v\s+v[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfrec7\.v\s+v[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfmin\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfmax\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfsgnj\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfsgnjn\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfsgnjx\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vmfeq\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vmfne\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vmflt\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vmfle\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vmfgt\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vmfge\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfclass\.v\s+v[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfmerge\.vfm\s+v[0-9]+,\s*v[0-9]+,\s*fa[0-9]+,\s*v0} 2 } } */
+/* { dg-final { scan-assembler-times {vfmv\.v\.f\s+v[0-9]+,\s*fa[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfcvt\.xu\.f\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfcvt\.f\.xu\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfcvt\.rtz\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfcvt\.rtz\.xu\.f\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfwcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfwcvt\.xu\.f\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfncvt\.x\.f\.w\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfncvt\.f\.xu\.w\s+v[0-9]+,\s*v[0-9]+} 1 } } */
Kito Cheng June 5, 2023, 8:47 a.m. UTC | #2
LGTM too, thanks :)

On Mon, Jun 5, 2023 at 4:27 PM juzhe.zhong@rivai.ai
<juzhe.zhong@rivai.ai> wrote:
>
> LGTM,
>
> ________________________________
> juzhe.zhong@rivai.ai
>
>
> From: pan2.li
> Date: 2023-06-05 16:20
> To: gcc-patches
> CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
> Subject: [PATCH v2] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API
> From: Pan Li <pan2.li@intel.com>
>
> This patch support the intrinsic API of FP16 ZVFH floating-point. Aka
> SEW=16 for below instructions:
>
> vfadd vfsub vfrsub vfwadd vfwsub
> vfmul vfdiv vfrdiv vfwmul
> vfmacc vfnmacc vfmsac vfnmsac vfmadd
> vfnmadd vfmsub vfnmsub vfwmacc vfwnmacc vfwmsac vfwnmsac
> vfsqrt vfrsqrt7 vfrec7
> vfmin vfmax
> vfsgnj vfsgnjn vfsgnjx
> vmfeq vmfne vmflt vmfle vmfgt vmfge
> vfclass vfmerge
> vfmv
> vfcvt vfwcvt vfncvt
>
> Then users can leverage the instrinsic APIs to perform the FP=16 related
> operations. Please note not all the instrinsic APIs are coverred in the
> test files, only pick some typical ones due to too many. We will perform
> the FP16 related instrinsic API test entirely soon.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-types.def
> (vfloat32mf2_t): New type for DEF_RVV_WEXTF_OPS.
> (vfloat32m1_t): Ditto.
> (vfloat32m2_t): Ditto.
> (vfloat32m4_t): Ditto.
> (vfloat32m8_t): Ditto.
> (vint16mf4_t): New type for DEF_RVV_CONVERT_I_OPS.
> (vint16mf2_t): Ditto.
> (vint16m1_t): Ditto.
> (vint16m2_t): Ditto.
> (vint16m4_t): Ditto.
> (vint16m8_t): Ditto.
> (vuint16mf4_t): New type for DEF_RVV_CONVERT_U_OPS.
> (vuint16mf2_t): Ditto.
> (vuint16m1_t): Ditto.
> (vuint16m2_t): Ditto.
> (vuint16m4_t): Ditto.
> (vuint16m8_t): Ditto.
> (vint32mf2_t): New type for DEF_RVV_WCONVERT_I_OPS.
> (vint32m1_t): Ditto.
> (vint32m2_t): Ditto.
> (vint32m4_t): Ditto.
> (vint32m8_t): Ditto.
> (vuint32mf2_t): New type for DEF_RVV_WCONVERT_U_OPS.
> (vuint32m1_t): Ditto.
> (vuint32m2_t): Ditto.
> (vuint32m4_t): Ditto.
> (vuint32m8_t): Ditto.
> * config/riscv/vector-iterators.md: Add FP=16 support for V,
> VWCONVERTI, VCONVERT, VNCONVERT, VMUL1 and vlmul1.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/zvfh-intrinsic.c: New test.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
> .../riscv/riscv-vector-builtins-types.def     |  32 ++
> gcc/config/riscv/vector-iterators.md          |  21 +
> .../riscv/rvv/base/zvfh-intrinsic.c           | 418 ++++++++++++++++++
> 3 files changed, 471 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def b/gcc/config/riscv/riscv-vector-builtins-types.def
> index 9cb3aca992e..1e2491de6d6 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-types.def
> +++ b/gcc/config/riscv/riscv-vector-builtins-types.def
> @@ -518,11 +518,24 @@ DEF_RVV_FULL_V_U_OPS (vuint64m2_t, RVV_REQUIRE_FULL_V)
> DEF_RVV_FULL_V_U_OPS (vuint64m4_t, RVV_REQUIRE_FULL_V)
> DEF_RVV_FULL_V_U_OPS (vuint64m8_t, RVV_REQUIRE_FULL_V)
> +DEF_RVV_WEXTF_OPS (vfloat32mf2_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_MIN_VLEN_64)
> +DEF_RVV_WEXTF_OPS (vfloat32m1_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
> +DEF_RVV_WEXTF_OPS (vfloat32m2_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
> +DEF_RVV_WEXTF_OPS (vfloat32m4_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
> +DEF_RVV_WEXTF_OPS (vfloat32m8_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
> +
> DEF_RVV_WEXTF_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64)
> DEF_RVV_WEXTF_OPS (vfloat64m2_t, RVV_REQUIRE_ELEN_FP_64)
> DEF_RVV_WEXTF_OPS (vfloat64m4_t, RVV_REQUIRE_ELEN_FP_64)
> DEF_RVV_WEXTF_OPS (vfloat64m8_t, RVV_REQUIRE_ELEN_FP_64)
> +DEF_RVV_CONVERT_I_OPS (vint16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
> +DEF_RVV_CONVERT_I_OPS (vint16mf2_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_I_OPS (vint16m1_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_I_OPS (vint16m2_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_I_OPS (vint16m4_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_I_OPS (vint16m8_t, TARGET_ZVFH)
> +
> DEF_RVV_CONVERT_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
> DEF_RVV_CONVERT_I_OPS (vint32m1_t, 0)
> DEF_RVV_CONVERT_I_OPS (vint32m2_t, 0)
> @@ -533,6 +546,13 @@ DEF_RVV_CONVERT_I_OPS (vint64m2_t, RVV_REQUIRE_ELEN_64)
> DEF_RVV_CONVERT_I_OPS (vint64m4_t, RVV_REQUIRE_ELEN_64)
> DEF_RVV_CONVERT_I_OPS (vint64m8_t, RVV_REQUIRE_ELEN_64)
> +DEF_RVV_CONVERT_U_OPS (vuint16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
> +DEF_RVV_CONVERT_U_OPS (vuint16mf2_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_U_OPS (vuint16m1_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_U_OPS (vuint16m2_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_U_OPS (vuint16m4_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_U_OPS (vuint16m8_t, TARGET_ZVFH)
> +
> DEF_RVV_CONVERT_U_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
> DEF_RVV_CONVERT_U_OPS (vuint32m1_t, 0)
> DEF_RVV_CONVERT_U_OPS (vuint32m2_t, 0)
> @@ -543,11 +563,23 @@ DEF_RVV_CONVERT_U_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
> DEF_RVV_CONVERT_U_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
> DEF_RVV_CONVERT_U_OPS (vuint64m8_t, RVV_REQUIRE_ELEN_64)
> +DEF_RVV_WCONVERT_I_OPS (vint32mf2_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
> +DEF_RVV_WCONVERT_I_OPS (vint32m1_t, TARGET_ZVFH)
> +DEF_RVV_WCONVERT_I_OPS (vint32m2_t, TARGET_ZVFH)
> +DEF_RVV_WCONVERT_I_OPS (vint32m4_t, TARGET_ZVFH)
> +DEF_RVV_WCONVERT_I_OPS (vint32m8_t, TARGET_ZVFH)
> +
> DEF_RVV_WCONVERT_I_OPS (vint64m1_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
> DEF_RVV_WCONVERT_I_OPS (vint64m2_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
> DEF_RVV_WCONVERT_I_OPS (vint64m4_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
> DEF_RVV_WCONVERT_I_OPS (vint64m8_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
> +DEF_RVV_WCONVERT_U_OPS (vuint32mf2_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
> +DEF_RVV_WCONVERT_U_OPS (vuint32m1_t, TARGET_ZVFH)
> +DEF_RVV_WCONVERT_U_OPS (vuint32m2_t, TARGET_ZVFH)
> +DEF_RVV_WCONVERT_U_OPS (vuint32m4_t, TARGET_ZVFH)
> +DEF_RVV_WCONVERT_U_OPS (vuint32m8_t, TARGET_ZVFH)
> +
> DEF_RVV_WCONVERT_U_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
> DEF_RVV_WCONVERT_U_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
> DEF_RVV_WCONVERT_U_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
> diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
> index 90743ed76c5..e4f2ba90799 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -296,6 +296,14 @@ (define_mode_iterator VWI_ZVE32 [
> ])
> (define_mode_iterator VF [
> +  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
> +  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
> +  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
> +
>    (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
>    (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
>    (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
> @@ -496,6 +504,13 @@ (define_mode_iterator VWEXTF [
> ])
> (define_mode_iterator VWCONVERTI [
> +  (VNx1SI "TARGET_MIN_VLEN < 128 && TARGET_VECTOR_ELEN_FP_16")
> +  (VNx2SI "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx4SI "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx8SI "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx16SI "TARGET_MIN_VLEN > 32 && TARGET_VECTOR_ELEN_FP_16")
> +  (VNx32SI "TARGET_MIN_VLEN >= 128 && TARGET_VECTOR_ELEN_FP_16")
> +
>    (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
>    (VNx2DI "TARGET_VECTOR_ELEN_64 && TARGET_VECTOR_ELEN_FP_32")
>    (VNx4DI "TARGET_VECTOR_ELEN_64 && TARGET_VECTOR_ELEN_FP_32")
> @@ -1239,17 +1254,21 @@ (define_mode_attr VINDEX_OCT_EXT [
> ])
> (define_mode_attr VCONVERT [
> +  (VNx1HF "VNx1HI") (VNx2HF "VNx2HI") (VNx4HF "VNx4HI") (VNx8HF "VNx8HI") (VNx16HF "VNx16HI") (VNx32HF "VNx32HI") (VNx64HF "VNx64HI")
>    (VNx1SF "VNx1SI") (VNx2SF "VNx2SI") (VNx4SF "VNx4SI") (VNx8SF "VNx8SI") (VNx16SF "VNx16SI") (VNx32SF "VNx32SI")
>    (VNx1DF "VNx1DI") (VNx2DF "VNx2DI") (VNx4DF "VNx4DI") (VNx8DF "VNx8DI") (VNx16DF "VNx16DI")
> ])
> (define_mode_attr vconvert [
> +  (VNx1HF "vnx1hi") (VNx2HF "vnx2hi") (VNx4HF "vnx4hi") (VNx8HF "vnx8hi") (VNx16HF "vnx16hi") (VNx32HF "vnx32hi") (VNx64HF "vnx64hi")
>    (VNx1SF "vnx1si") (VNx2SF "vnx2si") (VNx4SF "vnx4si") (VNx8SF "vnx8si") (VNx16SF "vnx16si") (VNx32SF "vnx32si")
>    (VNx1DF "vnx1di") (VNx2DF "vnx2di") (VNx4DF "vnx4di") (VNx8DF "vnx8di") (VNx16DF "vnx16di")
> ])
> (define_mode_attr VNCONVERT [
> +  (VNx1HF "VNx1QI") (VNx2HF "VNx2QI") (VNx4HF "VNx4QI") (VNx8HF "VNx8QI") (VNx16HF "VNx16QI") (VNx32HF "VNx32QI") (VNx64HF "VNx64QI")
>    (VNx1SF "VNx1HI") (VNx2SF "VNx2HI") (VNx4SF "VNx4HI") (VNx8SF "VNx8HI") (VNx16SF "VNx16HI") (VNx32SF "VNx32HI")
> +  (VNx1SI "VNx1HF") (VNx2SI "VNx2HF") (VNx4SI "VNx4HF") (VNx8SI "VNx8HF") (VNx16SI "VNx16HF") (VNx32SI "VNx32HF")
>    (VNx1DI "VNx1SF") (VNx2DI "VNx2SF") (VNx4DI "VNx4SF") (VNx8DI "VNx8SF") (VNx16DI "VNx16SF")
>    (VNx1DF "VNx1SI") (VNx2DF "VNx2SI") (VNx4DF "VNx4SI") (VNx8DF "VNx8SI") (VNx16DF "VNx16SI")
> ])
> @@ -1263,6 +1282,7 @@ (define_mode_attr VLMUL1 [
>    (VNx8SI "VNx4SI") (VNx16SI "VNx4SI") (VNx32SI "VNx4SI")
>    (VNx1DI "VNx2DI") (VNx2DI "VNx2DI")
>    (VNx4DI "VNx2DI") (VNx8DI "VNx2DI") (VNx16DI "VNx2DI")
> +  (VNx1HF "VNx8HF") (VNx2HF "VNx8HF") (VNx4HF "VNx8HF") (VNx8HF "VNx8HF") (VNx16HF "VNx8HF") (VNx32HF "VNx8HF") (VNx64HF "VNx8HF")
>    (VNx1SF "VNx4SF") (VNx2SF "VNx4SF")
>    (VNx4SF "VNx4SF") (VNx8SF "VNx4SF") (VNx16SF "VNx4SF") (VNx32SF "VNx4SF")
>    (VNx1DF "VNx2DF") (VNx2DF "VNx2DF")
> @@ -1333,6 +1353,7 @@ (define_mode_attr vlmul1 [
>    (VNx8SI "vnx4si") (VNx16SI "vnx4si") (VNx32SI "vnx4si")
>    (VNx1DI "vnx2di") (VNx2DI "vnx2di")
>    (VNx4DI "vnx2di") (VNx8DI "vnx2di") (VNx16DI "vnx2di")
> +  (VNx1HF "vnx8hf") (VNx2HF "vnx8hf") (VNx4HF "vnx8hf") (VNx8HF "vnx8hf") (VNx16HF "vnx8hf") (VNx32HF "vnx8hf") (VNx64HF "vnx8hf")
>    (VNx1SF "vnx4sf") (VNx2SF "vnx4sf")
>    (VNx4SF "vnx4sf") (VNx8SF "vnx4sf") (VNx16SF "vnx4sf") (VNx32SF "vnx4sf")
>    (VNx1DF "vnx2df") (VNx2DF "vnx2df")
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c
> new file mode 100644
> index 00000000000..0d244aac9ec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c
> @@ -0,0 +1,418 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +typedef _Float16 float16_t;
> +
> +vfloat16mf4_t test_vfadd_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vfadd_vv_f16mf4(op1, op2, vl);
> +}
> +
> +vfloat16m8_t test_vfadd_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vfadd_vf_f16m8(op1, op2, vl);
> +}
> +
> +vfloat16mf4_t test_vfsub_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vfsub_vv_f16mf4(op1, op2, vl);
> +}
> +
> +vfloat16m8_t test_vfsub_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vfsub_vf_f16m8(op1, op2, vl);
> +}
> +
> +vfloat16mf4_t test_vfrsub_vf_f16mf4(vfloat16mf4_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vfrsub_vf_f16mf4(op1, op2, vl);
> +}
> +
> +vfloat16m8_t test_vfrsub_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vfrsub_vf_f16m8(op1, op2, vl);
> +}
> +
> +vfloat32mf2_t test_vfwadd_vv_f32mf2(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vfwadd_vv_f32mf2(op1, op2, vl);
> +}
> +
> +vfloat32m8_t test_vfwadd_vv_f32m8(vfloat16m4_t op1, vfloat16m4_t op2, size_t vl) {
> +  return __riscv_vfwadd_vv_f32m8(op1, op2, vl);
> +}
> +
> +vfloat32mf2_t test_vfwadd_wv_f32mf2(vfloat32mf2_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vfwadd_wv_f32mf2(op1, op2, vl);
> +}
> +
> +vfloat32m8_t test_vfwadd_wv_f32m8(vfloat32m8_t op1, vfloat16m4_t op2, size_t vl) {
> +  return __riscv_vfwadd_wv_f32m8(op1, op2, vl);
> +}
> +
> +vfloat32mf2_t test_vfwsub_vv_f32mf2(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vfwsub_vv_f32mf2(op1, op2, vl);
> +}
> +
> +vfloat32m8_t test_vfwsub_vv_f32m8(vfloat16m4_t op1, vfloat16m4_t op2, size_t vl) {
> +  return __riscv_vfwsub_vv_f32m8(op1, op2, vl);
> +}
> +
> +vfloat32mf2_t test_vfwsub_wv_f32mf2(vfloat32mf2_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vfwsub_wv_f32mf2(op1, op2, vl);
> +}
> +
> +vfloat32m8_t test_vfwsub_wv_f32m8(vfloat32m8_t op1, vfloat16m4_t op2, size_t vl) {
> +  return __riscv_vfwsub_wv_f32m8(op1, op2, vl);
> +}
> +
> +vfloat16mf4_t test_vfmul_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vfmul_vv_f16mf4(op1, op2, vl);
> +}
> +
> +vfloat16m8_t test_vfmul_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vfmul_vf_f16m8(op1, op2, vl);
> +}
> +
> +vfloat16mf4_t test_vfdiv_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vfdiv_vv_f16mf4(op1, op2, vl);
> +}
> +
> +vfloat16m8_t test_vfdiv_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vfdiv_vf_f16m8(op1, op2, vl);
> +}
> +
> +vfloat16mf4_t test_vfrdiv_vf_f16mf4(vfloat16mf4_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vfrdiv_vf_f16mf4(op1, op2, vl);
> +}
> +
> +vfloat16m8_t test_vfrdiv_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vfrdiv_vf_f16m8(op1, op2, vl);
> +}
> +
> +vfloat32mf2_t test_vfwmul_vv_f32mf2(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vfwmul_vv_f32mf2(op1, op2, vl);
> +}
> +
> +vfloat32m8_t test_vfwmul_vf_f32m8(vfloat16m4_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vfwmul_vf_f32m8(op1, op2, vl);
> +}
> +
> +vfloat16mf4_t test_vfmacc_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
> +  return __riscv_vfmacc_vv_f16mf4(vd, vs1, vs2, vl);
> +}
> +
> +vfloat16m8_t test_vfmacc_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
> +  return __riscv_vfmacc_vf_f16m8(vd, rs1, vs2, vl);
> +}
> +
> +vfloat16mf4_t test_vfnmacc_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
> +  return __riscv_vfnmacc_vv_f16mf4(vd, vs1, vs2, vl);
> +}
> +
> +vfloat16m8_t test_vfnmacc_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
> +  return __riscv_vfnmacc_vf_f16m8(vd, rs1, vs2, vl);
> +}
> +
> +vfloat16mf4_t test_vfmsac_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
> +  return __riscv_vfmsac_vv_f16mf4(vd, vs1, vs2, vl);
> +}
> +
> +vfloat16m8_t test_vfmsac_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
> +  return __riscv_vfmsac_vf_f16m8(vd, rs1, vs2, vl);
> +}
> +
> +vfloat16mf4_t test_vfnmsac_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
> +  return __riscv_vfnmsac_vv_f16mf4(vd, vs1, vs2, vl);
> +}
> +
> +vfloat16m8_t test_vfnmsac_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
> +  return __riscv_vfnmsac_vf_f16m8(vd, rs1, vs2, vl);
> +}
> +
> +vfloat16mf4_t test_vfmadd_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
> +  return __riscv_vfmadd_vv_f16mf4(vd, vs1, vs2, vl);
> +}
> +
> +vfloat16m8_t test_vfmadd_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
> +  return __riscv_vfmadd_vf_f16m8(vd, rs1, vs2, vl);
> +}
> +
> +vfloat16mf4_t test_vfnmadd_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
> +  return __riscv_vfnmadd_vv_f16mf4(vd, vs1, vs2, vl);
> +}
> +
> +vfloat16m8_t test_vfnmadd_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
> +  return __riscv_vfnmadd_vf_f16m8(vd, rs1, vs2, vl);
> +}
> +
> +vfloat16mf4_t test_vfmsub_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
> +  return __riscv_vfmsub_vv_f16mf4(vd, vs1, vs2, vl);
> +}
> +
> +vfloat16m8_t test_vfmsub_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
> +  return __riscv_vfmsub_vf_f16m8(vd, rs1, vs2, vl);
> +}
> +
> +vfloat16mf4_t test_vfnmsub_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
> +  return __riscv_vfnmsub_vv_f16mf4(vd, vs1, vs2, vl);
> +}
> +
> +vfloat16m8_t test_vfnmsub_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
> +  return __riscv_vfnmsub_vf_f16m8(vd, rs1, vs2, vl);
> +}
> +
> +vfloat32mf2_t test_vfwmacc_vv_f32mf2(vfloat32mf2_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
> +  return __riscv_vfwmacc_vv_f32mf2(vd, vs1, vs2, vl);
> +}
> +
> +vfloat32m8_t test_vfwmacc_vf_f32m8(vfloat32m8_t vd, float16_t vs1, vfloat16m4_t vs2, size_t vl) {
> +  return __riscv_vfwmacc_vf_f32m8(vd, vs1, vs2, vl);
> +}
> +
> +vfloat32mf2_t test_vfwnmacc_vv_f32mf2(vfloat32mf2_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
> +  return __riscv_vfwnmacc_vv_f32mf2(vd, vs1, vs2, vl);
> +}
> +
> +vfloat32m8_t test_vfwnmacc_vf_f32m8(vfloat32m8_t vd, float16_t vs1, vfloat16m4_t vs2, size_t vl) {
> +  return __riscv_vfwnmacc_vf_f32m8(vd, vs1, vs2, vl);
> +}
> +
> +vfloat32mf2_t test_vfwmsac_vv_f32mf2(vfloat32mf2_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
> +  return __riscv_vfwmsac_vv_f32mf2(vd, vs1, vs2, vl);
> +}
> +
> +vfloat32m8_t test_vfwmsac_vf_f32m8(vfloat32m8_t vd, float16_t vs1, vfloat16m4_t vs2, size_t vl) {
> +  return __riscv_vfwmsac_vf_f32m8(vd, vs1, vs2, vl);
> +}
> +
> +vfloat32mf2_t test_vfwnmsac_vv_f32mf2(vfloat32mf2_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
> +  return __riscv_vfwnmsac_vv_f32mf2(vd, vs1, vs2, vl);
> +}
> +
> +vfloat32m8_t test_vfwnmsac_vf_f32m8(vfloat32m8_t vd, float16_t vs1, vfloat16m4_t vs2, size_t vl) {
> +  return __riscv_vfwnmsac_vf_f32m8(vd, vs1, vs2, vl);
> +}
> +
> +vfloat16mf4_t test_vfsqrt_v_f16mf4(vfloat16mf4_t op1, size_t vl) {
> +  return __riscv_vfsqrt_v_f16mf4(op1, vl);
> +}
> +
> +vfloat16m8_t test_vfsqrt_v_f16m8(vfloat16m8_t op1, size_t vl) {
> +  return __riscv_vfsqrt_v_f16m8(op1, vl);
> +}
> +
> +vfloat16mf4_t test_vfrsqrt7_v_f16mf4(vfloat16mf4_t op1, size_t vl) {
> +  return __riscv_vfrsqrt7_v_f16mf4(op1, vl);
> +}
> +
> +vfloat16m8_t test_vfrsqrt7_v_f16m8(vfloat16m8_t op1, size_t vl) {
> +  return __riscv_vfrsqrt7_v_f16m8(op1, vl);
> +}
> +
> +vfloat16mf4_t test_vfrec7_v_f16mf4(vfloat16mf4_t op1, size_t vl) {
> +  return __riscv_vfrec7_v_f16mf4(op1, vl);
> +}
> +
> +vfloat16m8_t test_vfrec7_v_f16m8(vfloat16m8_t op1, size_t vl) {
> +  return __riscv_vfrec7_v_f16m8(op1, vl);
> +}
> +
> +vfloat16mf4_t test_vfmin_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vfmin_vv_f16mf4(op1, op2, vl);
> +}
> +
> +vfloat16m8_t test_vfmin_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vfmin_vf_f16m8(op1, op2, vl);
> +}
> +
> +vfloat16mf4_t test_vfmax_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vfmax_vv_f16mf4(op1, op2, vl);
> +}
> +
> +vfloat16m8_t test_vfmax_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vfmax_vf_f16m8(op1, op2, vl);
> +}
> +
> +vfloat16mf4_t test_vfsgnj_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vfsgnj_vv_f16mf4(op1, op2, vl);
> +}
> +
> +vfloat16m8_t test_vfsgnj_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vfsgnj_vf_f16m8(op1, op2, vl);
> +}
> +
> +vfloat16mf4_t test_vfsgnjn_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vfsgnjn_vv_f16mf4(op1, op2, vl);
> +}
> +
> +vfloat16m8_t test_vfsgnjn_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vfsgnjn_vf_f16m8(op1, op2, vl);
> +}
> +
> +vfloat16mf4_t test_vfsgnjx_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vfsgnjx_vv_f16mf4(op1, op2, vl);
> +}
> +
> +vfloat16m8_t test_vfsgnjx_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vfsgnjx_vf_f16m8(op1, op2, vl);
> +}
> +
> +vbool64_t test_vmfeq_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vmfeq_vv_f16mf4_b64(op1, op2, vl);
> +}
> +
> +vbool2_t test_vmfeq_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vmfeq_vf_f16m8_b2(op1, op2, vl);
> +}
> +
> +vbool64_t test_vmfne_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vmfne_vv_f16mf4_b64(op1, op2, vl);
> +}
> +
> +vbool2_t test_vmfne_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vmfne_vf_f16m8_b2(op1, op2, vl);
> +}
> +
> +vbool64_t test_vmflt_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vmflt_vv_f16mf4_b64(op1, op2, vl);
> +}
> +
> +vbool2_t test_vmflt_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vmflt_vf_f16m8_b2(op1, op2, vl);
> +}
> +
> +vbool64_t test_vmfle_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vmfle_vv_f16mf4_b64(op1, op2, vl);
> +}
> +
> +vbool2_t test_vmfle_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vmfle_vf_f16m8_b2(op1, op2, vl);
> +}
> +
> +vbool64_t test_vmfgt_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vmfgt_vv_f16mf4_b64(op1, op2, vl);
> +}
> +
> +vbool2_t test_vmfgt_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vmfgt_vf_f16m8_b2(op1, op2, vl);
> +}
> +
> +vbool64_t test_vmfge_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
> +  return __riscv_vmfge_vv_f16mf4_b64(op1, op2, vl);
> +}
> +
> +vbool2_t test_vmfge_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
> +  return __riscv_vmfge_vf_f16m8_b2(op1, op2, vl);
> +}
> +
> +vuint16mf4_t test_vfclass_v_u16mf4(vfloat16mf4_t op1, size_t vl) {
> +  return __riscv_vfclass_v_u16mf4(op1, vl);
> +}
> +
> +vuint16m8_t test_vfclass_v_u16m8(vfloat16m8_t op1, size_t vl) {
> +  return __riscv_vfclass_v_u16m8(op1, vl);
> +}
> +
> +vfloat16mf4_t test_vfmerge_vfm_f16mf4(vfloat16mf4_t op1, float16_t op2, vbool64_t mask, size_t vl) {
> +  return __riscv_vfmerge_vfm_f16mf4(op1, op2, mask, vl);
> +}
> +
> +vfloat16m8_t test_vfmerge_vfm_f16m8(vfloat16m8_t op1, float16_t op2, vbool2_t mask, size_t vl) {
> +  return __riscv_vfmerge_vfm_f16m8(op1, op2, mask, vl);
> +}
> +
> +vfloat16mf4_t test_vfmv_v_f_f16mf4(float16_t src, size_t vl) {
> +  return __riscv_vfmv_v_f_f16mf4(src, vl);
> +}
> +
> +vfloat16m8_t test_vfmv_v_f_f16m8(float16_t src, size_t vl) {
> +  return __riscv_vfmv_v_f_f16m8(src, vl);
> +}
> +
> +vint16mf4_t test_vfcvt_x_f_v_i16mf4(vfloat16mf4_t src, size_t vl) {
> +  return __riscv_vfcvt_x_f_v_i16mf4(src, vl);
> +}
> +
> +vuint16m8_t test_vfcvt_xu_f_v_u16m8(vfloat16m8_t src, size_t vl) {
> +  return __riscv_vfcvt_xu_f_v_u16m8(src, vl);
> +}
> +
> +vfloat16mf4_t test_vfcvt_f_x_v_f16mf4(vint16mf4_t src, size_t vl) {
> +  return __riscv_vfcvt_f_x_v_f16mf4(src, vl);
> +}
> +
> +vfloat16m8_t test_vfcvt_f_xu_v_f16m8(vuint16m8_t src, size_t vl) {
> +  return __riscv_vfcvt_f_xu_v_f16m8(src, vl);
> +}
> +
> +vint16mf4_t test_vfcvt_rtz_x_f_v_i16mf4(vfloat16mf4_t src, size_t vl) {
> +  return __riscv_vfcvt_rtz_x_f_v_i16mf4(src, vl);
> +}
> +
> +vuint16m8_t test_vfcvt_rtz_xu_f_v_u16m8(vfloat16m8_t src, size_t vl) {
> +  return __riscv_vfcvt_rtz_xu_f_v_u16m8(src, vl);
> +}
> +
> +vfloat16mf4_t test_vfwcvt_f_x_v_f16mf4(vint8mf8_t src, size_t vl) {
> +  return __riscv_vfwcvt_f_x_v_f16mf4(src, vl);
> +}
> +
> +vuint32m8_t test_vfwcvt_xu_f_v_u32m8(vfloat16m4_t src, size_t vl) {
> +  return __riscv_vfwcvt_xu_f_v_u32m8(src, vl);
> +}
> +
> +vint8mf8_t test_vfncvt_x_f_w_i8mf8(vfloat16mf4_t src, size_t vl) {
> +  return __riscv_vfncvt_x_f_w_i8mf8(src, vl);
> +}
> +
> +vfloat16m4_t test_vfncvt_f_xu_w_f16m4(vuint32m8_t src, size_t vl) {
> +  return __riscv_vfncvt_f_xu_w_f16m4(src, vl);
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 43 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m4,\s*t[au],\s*m[au]} 11 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m8,\s*t[au],\s*m[au]} 34 } } */
> +/* { dg-final { scan-assembler-times {vfadd\.v[fv]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfsub\.v[fv]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfrsub\.vf\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfwadd\.[wv]v\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 4 } } */
> +/* { dg-final { scan-assembler-times {vfwsub\.[wv]v\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 4 } } */
> +/* { dg-final { scan-assembler-times {vfmul\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfdiv\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfrdiv\.vf\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfwmul\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfmacc\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfnmacc\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfmsac\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfnmsac\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfmadd\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfnmadd\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfmsub\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfnmsub\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfwmacc\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfwnmacc\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfwmsac\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfwnmsac\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfsqrt\.v\s+v[0-9]+,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfrsqrt7\.v\s+v[0-9]+,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfrec7\.v\s+v[0-9]+,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfmin\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfmax\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfsgnj\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfsgnjn\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfsgnjx\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vmfeq\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vmfne\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vmflt\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vmfle\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vmfgt\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vmfge\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfclass\.v\s+v[0-9]+,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfmerge\.vfm\s+v[0-9]+,\s*v[0-9]+,\s*fa[0-9]+,\s*v0} 2 } } */
> +/* { dg-final { scan-assembler-times {vfmv\.v\.f\s+v[0-9]+,\s*fa[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times {vfcvt\.xu\.f\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times {vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times {vfcvt\.f\.xu\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times {vfcvt\.rtz\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times {vfcvt\.rtz\.xu\.f\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times {vfwcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times {vfwcvt\.xu\.f\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times {vfncvt\.x\.f\.w\s+v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times {vfncvt\.f\.xu\.w\s+v[0-9]+,\s*v[0-9]+} 1 } } */
> --
> 2.34.1
>
>
Li, Pan2 via Gcc-patches June 5, 2023, 8:50 a.m. UTC | #3
Committed, thanks Kito and Juzhe.

Pan

-----Original Message-----
From: Kito Cheng <kito.cheng@sifive.com> 
Sent: Monday, June 5, 2023 4:47 PM
To: juzhe.zhong@rivai.ai
Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Wang, Yanzhang <yanzhang.wang@intel.com>
Subject: Re: [PATCH v2] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API

LGTM too, thanks :)

On Mon, Jun 5, 2023 at 4:27 PM juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai> wrote:
>
> LGTM,
diff mbox series

Patch

diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def b/gcc/config/riscv/riscv-vector-builtins-types.def
index 9cb3aca992e..1e2491de6d6 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -518,11 +518,24 @@  DEF_RVV_FULL_V_U_OPS (vuint64m2_t, RVV_REQUIRE_FULL_V)
 DEF_RVV_FULL_V_U_OPS (vuint64m4_t, RVV_REQUIRE_FULL_V)
 DEF_RVV_FULL_V_U_OPS (vuint64m8_t, RVV_REQUIRE_FULL_V)
 
+DEF_RVV_WEXTF_OPS (vfloat32mf2_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WEXTF_OPS (vfloat32m1_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m2_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m4_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m8_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+
 DEF_RVV_WEXTF_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64)
 DEF_RVV_WEXTF_OPS (vfloat64m2_t, RVV_REQUIRE_ELEN_FP_64)
 DEF_RVV_WEXTF_OPS (vfloat64m4_t, RVV_REQUIRE_ELEN_FP_64)
 DEF_RVV_WEXTF_OPS (vfloat64m8_t, RVV_REQUIRE_ELEN_FP_64)
 
+DEF_RVV_CONVERT_I_OPS (vint16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_CONVERT_I_OPS (vint16mf2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m1_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m4_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m8_t, TARGET_ZVFH)
+
 DEF_RVV_CONVERT_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_CONVERT_I_OPS (vint32m1_t, 0)
 DEF_RVV_CONVERT_I_OPS (vint32m2_t, 0)
@@ -533,6 +546,13 @@  DEF_RVV_CONVERT_I_OPS (vint64m2_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_CONVERT_I_OPS (vint64m4_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_CONVERT_I_OPS (vint64m8_t, RVV_REQUIRE_ELEN_64)
 
+DEF_RVV_CONVERT_U_OPS (vuint16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_CONVERT_U_OPS (vuint16mf2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m1_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m4_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m8_t, TARGET_ZVFH)
+
 DEF_RVV_CONVERT_U_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_CONVERT_U_OPS (vuint32m1_t, 0)
 DEF_RVV_CONVERT_U_OPS (vuint32m2_t, 0)
@@ -543,11 +563,23 @@  DEF_RVV_CONVERT_U_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_CONVERT_U_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_CONVERT_U_OPS (vuint64m8_t, RVV_REQUIRE_ELEN_64)
 
+DEF_RVV_WCONVERT_I_OPS (vint32mf2_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WCONVERT_I_OPS (vint32m1_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_I_OPS (vint32m2_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_I_OPS (vint32m4_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_I_OPS (vint32m8_t, TARGET_ZVFH)
+
 DEF_RVV_WCONVERT_I_OPS (vint64m1_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
 DEF_RVV_WCONVERT_I_OPS (vint64m2_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
 DEF_RVV_WCONVERT_I_OPS (vint64m4_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
 DEF_RVV_WCONVERT_I_OPS (vint64m8_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
 
+DEF_RVV_WCONVERT_U_OPS (vuint32mf2_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WCONVERT_U_OPS (vuint32m1_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_U_OPS (vuint32m2_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_U_OPS (vuint32m4_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_U_OPS (vuint32m8_t, TARGET_ZVFH)
+
 DEF_RVV_WCONVERT_U_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
 DEF_RVV_WCONVERT_U_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
 DEF_RVV_WCONVERT_U_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 90743ed76c5..e4f2ba90799 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -296,6 +296,14 @@  (define_mode_iterator VWI_ZVE32 [
 ])
 
 (define_mode_iterator VF [
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
+
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
@@ -496,6 +504,13 @@  (define_mode_iterator VWEXTF [
 ])
 
 (define_mode_iterator VWCONVERTI [
+  (VNx1SI "TARGET_MIN_VLEN < 128 && TARGET_VECTOR_ELEN_FP_16")
+  (VNx2SI "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4SI "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8SI "TARGET_VECTOR_ELEN_FP_16")
+  (VNx16SI "TARGET_MIN_VLEN > 32 && TARGET_VECTOR_ELEN_FP_16")
+  (VNx32SI "TARGET_MIN_VLEN >= 128 && TARGET_VECTOR_ELEN_FP_16")
+
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx2DI "TARGET_VECTOR_ELEN_64 && TARGET_VECTOR_ELEN_FP_32")
   (VNx4DI "TARGET_VECTOR_ELEN_64 && TARGET_VECTOR_ELEN_FP_32")
@@ -1239,17 +1254,21 @@  (define_mode_attr VINDEX_OCT_EXT [
 ])
 
 (define_mode_attr VCONVERT [
+  (VNx1HF "VNx1HI") (VNx2HF "VNx2HI") (VNx4HF "VNx4HI") (VNx8HF "VNx8HI") (VNx16HF "VNx16HI") (VNx32HF "VNx32HI") (VNx64HF "VNx64HI")
   (VNx1SF "VNx1SI") (VNx2SF "VNx2SI") (VNx4SF "VNx4SI") (VNx8SF "VNx8SI") (VNx16SF "VNx16SI") (VNx32SF "VNx32SI")
   (VNx1DF "VNx1DI") (VNx2DF "VNx2DI") (VNx4DF "VNx4DI") (VNx8DF "VNx8DI") (VNx16DF "VNx16DI")
 ])
 
 (define_mode_attr vconvert [
+  (VNx1HF "vnx1hi") (VNx2HF "vnx2hi") (VNx4HF "vnx4hi") (VNx8HF "vnx8hi") (VNx16HF "vnx16hi") (VNx32HF "vnx32hi") (VNx64HF "vnx64hi")
   (VNx1SF "vnx1si") (VNx2SF "vnx2si") (VNx4SF "vnx4si") (VNx8SF "vnx8si") (VNx16SF "vnx16si") (VNx32SF "vnx32si")
   (VNx1DF "vnx1di") (VNx2DF "vnx2di") (VNx4DF "vnx4di") (VNx8DF "vnx8di") (VNx16DF "vnx16di")
 ])
 
 (define_mode_attr VNCONVERT [
+  (VNx1HF "VNx1QI") (VNx2HF "VNx2QI") (VNx4HF "VNx4QI") (VNx8HF "VNx8QI") (VNx16HF "VNx16QI") (VNx32HF "VNx32QI") (VNx64HF "VNx64QI")
   (VNx1SF "VNx1HI") (VNx2SF "VNx2HI") (VNx4SF "VNx4HI") (VNx8SF "VNx8HI") (VNx16SF "VNx16HI") (VNx32SF "VNx32HI")
+  (VNx1SI "VNx1HF") (VNx2SI "VNx2HF") (VNx4SI "VNx4HF") (VNx8SI "VNx8HF") (VNx16SI "VNx16HF") (VNx32SI "VNx32HF")
   (VNx1DI "VNx1SF") (VNx2DI "VNx2SF") (VNx4DI "VNx4SF") (VNx8DI "VNx8SF") (VNx16DI "VNx16SF")
   (VNx1DF "VNx1SI") (VNx2DF "VNx2SI") (VNx4DF "VNx4SI") (VNx8DF "VNx8SI") (VNx16DF "VNx16SI")
 ])
@@ -1263,6 +1282,7 @@  (define_mode_attr VLMUL1 [
   (VNx8SI "VNx4SI") (VNx16SI "VNx4SI") (VNx32SI "VNx4SI")
   (VNx1DI "VNx2DI") (VNx2DI "VNx2DI")
   (VNx4DI "VNx2DI") (VNx8DI "VNx2DI") (VNx16DI "VNx2DI")
+  (VNx1HF "VNx8HF") (VNx2HF "VNx8HF") (VNx4HF "VNx8HF") (VNx8HF "VNx8HF") (VNx16HF "VNx8HF") (VNx32HF "VNx8HF") (VNx64HF "VNx8HF")
   (VNx1SF "VNx4SF") (VNx2SF "VNx4SF")
   (VNx4SF "VNx4SF") (VNx8SF "VNx4SF") (VNx16SF "VNx4SF") (VNx32SF "VNx4SF")
   (VNx1DF "VNx2DF") (VNx2DF "VNx2DF")
@@ -1333,6 +1353,7 @@  (define_mode_attr vlmul1 [
   (VNx8SI "vnx4si") (VNx16SI "vnx4si") (VNx32SI "vnx4si")
   (VNx1DI "vnx2di") (VNx2DI "vnx2di")
   (VNx4DI "vnx2di") (VNx8DI "vnx2di") (VNx16DI "vnx2di")
+  (VNx1HF "vnx8hf") (VNx2HF "vnx8hf") (VNx4HF "vnx8hf") (VNx8HF "vnx8hf") (VNx16HF "vnx8hf") (VNx32HF "vnx8hf") (VNx64HF "vnx8hf")
   (VNx1SF "vnx4sf") (VNx2SF "vnx4sf")
   (VNx4SF "vnx4sf") (VNx8SF "vnx4sf") (VNx16SF "vnx4sf") (VNx32SF "vnx4sf")
   (VNx1DF "vnx2df") (VNx2DF "vnx2df")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c
new file mode 100644
index 00000000000..0d244aac9ec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c
@@ -0,0 +1,418 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+typedef _Float16 float16_t;
+
+vfloat16mf4_t test_vfadd_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfadd_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfadd_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfadd_vf_f16m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfsub_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfsub_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfsub_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfsub_vf_f16m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfrsub_vf_f16mf4(vfloat16mf4_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfrsub_vf_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfrsub_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfrsub_vf_f16m8(op1, op2, vl);
+}
+
+vfloat32mf2_t test_vfwadd_vv_f32mf2(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfwadd_vv_f32mf2(op1, op2, vl);
+}
+
+vfloat32m8_t test_vfwadd_vv_f32m8(vfloat16m4_t op1, vfloat16m4_t op2, size_t vl) {
+  return __riscv_vfwadd_vv_f32m8(op1, op2, vl);
+}
+
+vfloat32mf2_t test_vfwadd_wv_f32mf2(vfloat32mf2_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfwadd_wv_f32mf2(op1, op2, vl);
+}
+
+vfloat32m8_t test_vfwadd_wv_f32m8(vfloat32m8_t op1, vfloat16m4_t op2, size_t vl) {
+  return __riscv_vfwadd_wv_f32m8(op1, op2, vl);
+}
+
+vfloat32mf2_t test_vfwsub_vv_f32mf2(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfwsub_vv_f32mf2(op1, op2, vl);
+}
+
+vfloat32m8_t test_vfwsub_vv_f32m8(vfloat16m4_t op1, vfloat16m4_t op2, size_t vl) {
+  return __riscv_vfwsub_vv_f32m8(op1, op2, vl);
+}
+
+vfloat32mf2_t test_vfwsub_wv_f32mf2(vfloat32mf2_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfwsub_wv_f32mf2(op1, op2, vl);
+}
+
+vfloat32m8_t test_vfwsub_wv_f32m8(vfloat32m8_t op1, vfloat16m4_t op2, size_t vl) {
+  return __riscv_vfwsub_wv_f32m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfmul_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfmul_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfmul_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfmul_vf_f16m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfdiv_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfdiv_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfdiv_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfdiv_vf_f16m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfrdiv_vf_f16mf4(vfloat16mf4_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfrdiv_vf_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfrdiv_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfrdiv_vf_f16m8(op1, op2, vl);
+}
+
+vfloat32mf2_t test_vfwmul_vv_f32mf2(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfwmul_vv_f32mf2(op1, op2, vl);
+}
+
+vfloat32m8_t test_vfwmul_vf_f32m8(vfloat16m4_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfwmul_vf_f32m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfmacc_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfmacc_vv_f16mf4(vd, vs1, vs2, vl);
+}
+
+vfloat16m8_t test_vfmacc_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
+  return __riscv_vfmacc_vf_f16m8(vd, rs1, vs2, vl);
+}
+
+vfloat16mf4_t test_vfnmacc_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfnmacc_vv_f16mf4(vd, vs1, vs2, vl);
+}
+
+vfloat16m8_t test_vfnmacc_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
+  return __riscv_vfnmacc_vf_f16m8(vd, rs1, vs2, vl);
+}
+
+vfloat16mf4_t test_vfmsac_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfmsac_vv_f16mf4(vd, vs1, vs2, vl);
+}
+
+vfloat16m8_t test_vfmsac_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
+  return __riscv_vfmsac_vf_f16m8(vd, rs1, vs2, vl);
+}
+
+vfloat16mf4_t test_vfnmsac_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfnmsac_vv_f16mf4(vd, vs1, vs2, vl);
+}
+
+vfloat16m8_t test_vfnmsac_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
+  return __riscv_vfnmsac_vf_f16m8(vd, rs1, vs2, vl);
+}
+
+vfloat16mf4_t test_vfmadd_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfmadd_vv_f16mf4(vd, vs1, vs2, vl);
+}
+
+vfloat16m8_t test_vfmadd_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
+  return __riscv_vfmadd_vf_f16m8(vd, rs1, vs2, vl);
+}
+
+vfloat16mf4_t test_vfnmadd_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfnmadd_vv_f16mf4(vd, vs1, vs2, vl);
+}
+
+vfloat16m8_t test_vfnmadd_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
+  return __riscv_vfnmadd_vf_f16m8(vd, rs1, vs2, vl);
+}
+
+vfloat16mf4_t test_vfmsub_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfmsub_vv_f16mf4(vd, vs1, vs2, vl);
+}
+
+vfloat16m8_t test_vfmsub_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
+  return __riscv_vfmsub_vf_f16m8(vd, rs1, vs2, vl);
+}
+
+vfloat16mf4_t test_vfnmsub_vv_f16mf4(vfloat16mf4_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfnmsub_vv_f16mf4(vd, vs1, vs2, vl);
+}
+
+vfloat16m8_t test_vfnmsub_vf_f16m8(vfloat16m8_t vd, float16_t rs1, vfloat16m8_t vs2, size_t vl) {
+  return __riscv_vfnmsub_vf_f16m8(vd, rs1, vs2, vl);
+}
+
+vfloat32mf2_t test_vfwmacc_vv_f32mf2(vfloat32mf2_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfwmacc_vv_f32mf2(vd, vs1, vs2, vl);
+}
+
+vfloat32m8_t test_vfwmacc_vf_f32m8(vfloat32m8_t vd, float16_t vs1, vfloat16m4_t vs2, size_t vl) {
+  return __riscv_vfwmacc_vf_f32m8(vd, vs1, vs2, vl);
+}
+
+vfloat32mf2_t test_vfwnmacc_vv_f32mf2(vfloat32mf2_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfwnmacc_vv_f32mf2(vd, vs1, vs2, vl);
+}
+
+vfloat32m8_t test_vfwnmacc_vf_f32m8(vfloat32m8_t vd, float16_t vs1, vfloat16m4_t vs2, size_t vl) {
+  return __riscv_vfwnmacc_vf_f32m8(vd, vs1, vs2, vl);
+}
+
+vfloat32mf2_t test_vfwmsac_vv_f32mf2(vfloat32mf2_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfwmsac_vv_f32mf2(vd, vs1, vs2, vl);
+}
+
+vfloat32m8_t test_vfwmsac_vf_f32m8(vfloat32m8_t vd, float16_t vs1, vfloat16m4_t vs2, size_t vl) {
+  return __riscv_vfwmsac_vf_f32m8(vd, vs1, vs2, vl);
+}
+
+vfloat32mf2_t test_vfwnmsac_vv_f32mf2(vfloat32mf2_t vd, vfloat16mf4_t vs1, vfloat16mf4_t vs2, size_t vl) {
+  return __riscv_vfwnmsac_vv_f32mf2(vd, vs1, vs2, vl);
+}
+
+vfloat32m8_t test_vfwnmsac_vf_f32m8(vfloat32m8_t vd, float16_t vs1, vfloat16m4_t vs2, size_t vl) {
+  return __riscv_vfwnmsac_vf_f32m8(vd, vs1, vs2, vl);
+}
+
+vfloat16mf4_t test_vfsqrt_v_f16mf4(vfloat16mf4_t op1, size_t vl) {
+  return __riscv_vfsqrt_v_f16mf4(op1, vl);
+}
+
+vfloat16m8_t test_vfsqrt_v_f16m8(vfloat16m8_t op1, size_t vl) {
+  return __riscv_vfsqrt_v_f16m8(op1, vl);
+}
+
+vfloat16mf4_t test_vfrsqrt7_v_f16mf4(vfloat16mf4_t op1, size_t vl) {
+  return __riscv_vfrsqrt7_v_f16mf4(op1, vl);
+}
+
+vfloat16m8_t test_vfrsqrt7_v_f16m8(vfloat16m8_t op1, size_t vl) {
+  return __riscv_vfrsqrt7_v_f16m8(op1, vl);
+}
+
+vfloat16mf4_t test_vfrec7_v_f16mf4(vfloat16mf4_t op1, size_t vl) {
+  return __riscv_vfrec7_v_f16mf4(op1, vl);
+}
+
+vfloat16m8_t test_vfrec7_v_f16m8(vfloat16m8_t op1, size_t vl) {
+  return __riscv_vfrec7_v_f16m8(op1, vl);
+}
+
+vfloat16mf4_t test_vfmin_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfmin_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfmin_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfmin_vf_f16m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfmax_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfmax_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfmax_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfmax_vf_f16m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfsgnj_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfsgnj_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfsgnj_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfsgnj_vf_f16m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfsgnjn_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfsgnjn_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfsgnjn_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfsgnjn_vf_f16m8(op1, op2, vl);
+}
+
+vfloat16mf4_t test_vfsgnjx_vv_f16mf4(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vfsgnjx_vv_f16mf4(op1, op2, vl);
+}
+
+vfloat16m8_t test_vfsgnjx_vf_f16m8(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vfsgnjx_vf_f16m8(op1, op2, vl);
+}
+
+vbool64_t test_vmfeq_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vmfeq_vv_f16mf4_b64(op1, op2, vl);
+}
+
+vbool2_t test_vmfeq_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vmfeq_vf_f16m8_b2(op1, op2, vl);
+}
+
+vbool64_t test_vmfne_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vmfne_vv_f16mf4_b64(op1, op2, vl);
+}
+
+vbool2_t test_vmfne_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vmfne_vf_f16m8_b2(op1, op2, vl);
+}
+
+vbool64_t test_vmflt_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vmflt_vv_f16mf4_b64(op1, op2, vl);
+}
+
+vbool2_t test_vmflt_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vmflt_vf_f16m8_b2(op1, op2, vl);
+}
+
+vbool64_t test_vmfle_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vmfle_vv_f16mf4_b64(op1, op2, vl);
+}
+
+vbool2_t test_vmfle_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vmfle_vf_f16m8_b2(op1, op2, vl);
+}
+
+vbool64_t test_vmfgt_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vmfgt_vv_f16mf4_b64(op1, op2, vl);
+}
+
+vbool2_t test_vmfgt_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vmfgt_vf_f16m8_b2(op1, op2, vl);
+}
+
+vbool64_t test_vmfge_vv_f16mf4_b64(vfloat16mf4_t op1, vfloat16mf4_t op2, size_t vl) {
+  return __riscv_vmfge_vv_f16mf4_b64(op1, op2, vl);
+}
+
+vbool2_t test_vmfge_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
+  return __riscv_vmfge_vf_f16m8_b2(op1, op2, vl);
+}
+
+vuint16mf4_t test_vfclass_v_u16mf4(vfloat16mf4_t op1, size_t vl) {
+  return __riscv_vfclass_v_u16mf4(op1, vl);
+}
+
+vuint16m8_t test_vfclass_v_u16m8(vfloat16m8_t op1, size_t vl) {
+  return __riscv_vfclass_v_u16m8(op1, vl);
+}
+
+vfloat16mf4_t test_vfmerge_vfm_f16mf4(vfloat16mf4_t op1, float16_t op2, vbool64_t mask, size_t vl) {
+  return __riscv_vfmerge_vfm_f16mf4(op1, op2, mask, vl);
+}
+
+vfloat16m8_t test_vfmerge_vfm_f16m8(vfloat16m8_t op1, float16_t op2, vbool2_t mask, size_t vl) {
+  return __riscv_vfmerge_vfm_f16m8(op1, op2, mask, vl);
+}
+
+vfloat16mf4_t test_vfmv_v_f_f16mf4(float16_t src, size_t vl) {
+  return __riscv_vfmv_v_f_f16mf4(src, vl);
+}
+
+vfloat16m8_t test_vfmv_v_f_f16m8(float16_t src, size_t vl) {
+  return __riscv_vfmv_v_f_f16m8(src, vl);
+}
+
+vint16mf4_t test_vfcvt_x_f_v_i16mf4(vfloat16mf4_t src, size_t vl) {
+  return __riscv_vfcvt_x_f_v_i16mf4(src, vl);
+}
+
+vuint16m8_t test_vfcvt_xu_f_v_u16m8(vfloat16m8_t src, size_t vl) {
+  return __riscv_vfcvt_xu_f_v_u16m8(src, vl);
+}
+
+vfloat16mf4_t test_vfcvt_f_x_v_f16mf4(vint16mf4_t src, size_t vl) {
+  return __riscv_vfcvt_f_x_v_f16mf4(src, vl);
+}
+
+vfloat16m8_t test_vfcvt_f_xu_v_f16m8(vuint16m8_t src, size_t vl) {
+  return __riscv_vfcvt_f_xu_v_f16m8(src, vl);
+}
+
+vint16mf4_t test_vfcvt_rtz_x_f_v_i16mf4(vfloat16mf4_t src, size_t vl) {
+  return __riscv_vfcvt_rtz_x_f_v_i16mf4(src, vl);
+}
+
+vuint16m8_t test_vfcvt_rtz_xu_f_v_u16m8(vfloat16m8_t src, size_t vl) {
+  return __riscv_vfcvt_rtz_xu_f_v_u16m8(src, vl);
+}
+
+vfloat16mf4_t test_vfwcvt_f_x_v_f16mf4(vint8mf8_t src, size_t vl) {
+  return __riscv_vfwcvt_f_x_v_f16mf4(src, vl);
+}
+
+vuint32m8_t test_vfwcvt_xu_f_v_u32m8(vfloat16m4_t src, size_t vl) {
+  return __riscv_vfwcvt_xu_f_v_u32m8(src, vl);
+}
+
+vint8mf8_t test_vfncvt_x_f_w_i8mf8(vfloat16mf4_t src, size_t vl) {
+  return __riscv_vfncvt_x_f_w_i8mf8(src, vl);
+}
+
+vfloat16m4_t test_vfncvt_f_xu_w_f16m4(vuint32m8_t src, size_t vl) {
+  return __riscv_vfncvt_f_xu_w_f16m4(src, vl);
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 43 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m4,\s*t[au],\s*m[au]} 11 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m8,\s*t[au],\s*m[au]} 34 } } */
+/* { dg-final { scan-assembler-times {vfadd\.v[fv]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfsub\.v[fv]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfrsub\.vf\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfwadd\.[wv]v\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vfwsub\.[wv]v\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vfmul\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfdiv\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfrdiv\.vf\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfwmul\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfmacc\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfnmacc\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfmsac\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfnmsac\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfmadd\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfnmadd\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfmsub\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfnmsub\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfwmacc\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfwnmacc\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfwmsac\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfwnmsac\.v[vf]\s+v[0-9]+,\s*[vfa]+[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfsqrt\.v\s+v[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfrsqrt7\.v\s+v[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfrec7\.v\s+v[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfmin\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfmax\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfsgnj\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfsgnjn\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfsgnjx\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vmfeq\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vmfne\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vmflt\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vmfle\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vmfgt\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vmfge\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfclass\.v\s+v[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfmerge\.vfm\s+v[0-9]+,\s*v[0-9]+,\s*fa[0-9]+,\s*v0} 2 } } */
+/* { dg-final { scan-assembler-times {vfmv\.v\.f\s+v[0-9]+,\s*fa[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfcvt\.xu\.f\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfcvt\.f\.xu\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfcvt\.rtz\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfcvt\.rtz\.xu\.f\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfwcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfwcvt\.xu\.f\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfncvt\.x\.f\.w\s+v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfncvt\.f\.xu\.w\s+v[0-9]+,\s*v[0-9]+} 1 } } */