
[2/2] RISC-V: Constant FP Optimization with 'Zfa'

Message ID b16370ade4886697b0ed46ebf2d7835b89ab8cc2.1691991126.git.research_trasio@irq.a4lg.com
State New
Series RISC-V: Make "prefetch.i" built-in usable

Commit Message

Tsukasa OI Aug. 14, 2023, 5:32 a.m. UTC
From: Tsukasa OI <research_trasio@irq.a4lg.com>

This commit implements an optimization for assignments from a FP constant
to a FP register using a FLI instruction from the 'Zfa' extension.

To this end, it adds the constraint "H" and adds an "H -> f" variant to
the hardfloat move instructions.  Because the FLI instruction constraint
is a bit complex, it adds the riscv_get_float_fli_const function, which
parses a floating point constant when appropriate and reports validity
in its return value.

It also modifies the cost model for floating point constants and implements
a simple yet bit-accurate printer for valid finite FLI constants.

This optimization is partially based on AArch64
(vmov instruction handling).

gcc/ChangeLog:

	* config/riscv/constraints.md (H): New.
	* config/riscv/riscv-protos.h (enum riscv_float_fli_const_type):
	New to identify the FLI constant type.
	(struct riscv_float_fli_const): New to represent an optional
	FLI constant.
	* config/riscv/riscv.cc (riscv_get_float_fli_const): New function
	to parse a CONST_DOUBLE and return optionally-valid FLI constant.
	(riscv_const_insns): Modify CONST_DOUBLE cost model.
	(riscv_output_move): Add FLI instruction outputs.
	(riscv_print_operand): Print a finite FLI constant as a hexadecimal
	FP representation or a string operand "min", "inf" or "nan".
	* config/riscv/riscv.md (movhf_hardfloat, movsf_hardfloat,
	movdf_hardfloat_rv32, movdf_hardfloat_rv64): Add "H" variant
	for 'Zfa' extension-based FP constant moves.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/zfa-fli-1.c: New test.
	* gcc.target/riscv/zfa-fli-2.c: Ditto.
	* gcc.target/riscv/zfa-fli-3.c: Ditto.
	* gcc.target/riscv/zfa-fli-4.c: Ditto.
	* gcc.target/riscv/zfa-fli-5.c: Ditto.
	* gcc.target/riscv/zfa-fli-6.c: Ditto.
	* gcc.target/riscv/zfa-fli-7.c: Ditto.
	* gcc.target/riscv/zfa-fli-8.c: Ditto.
---
 gcc/config/riscv/constraints.md            |   7 +
 gcc/config/riscv/riscv-protos.h            |  34 +++
 gcc/config/riscv/riscv.cc                  | 250 ++++++++++++++++++++-
 gcc/config/riscv/riscv.md                  |  24 +-
 gcc/testsuite/gcc.target/riscv/zfa-fli-1.c |  24 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-2.c |  24 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-3.c |  14 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-4.c | 111 +++++++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-5.c |  98 ++++++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-6.c |  61 +++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-7.c |  30 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli-8.c |  39 ++++
 12 files changed, 692 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-8.c

Comments

Jin Ma Aug. 14, 2023, 12:51 p.m. UTC | #1
Hi Tsukasa,
  What a coincidence: I also implemented the Zfa extension, which also includes the FLI-related instructions :)

links: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627294.html

> +  if (!TARGET_HARD_FLOAT || !TARGET_ZFA)
> +    return result;
> +  switch (GET_MODE (x))
> +    {
> +    case HFmode:
> +      /* Not only 'Zfhmin', either 'Zfh' or 'Zvfh' is required.  */
> +      if (!TARGET_ZFH && !TARGET_ZVFH)

Zvfh implies that Zfh is also enabled, so there may be no need to check
TARGET_ZVFH here. By the way, the formatting here seems wrong; maybe a tab
is needed for alignment?

> +	return result;
> +      break;
> +    case SFmode: break;
> +    case DFmode: break;

Maybe we still have to check TARGET_DOUBLE_FLOAT?

> +    default: return result;
> +    }
> +
> +  if (!CONST_DOUBLE_P (x))
> +    return result;

I think it might be better to check whether x satisfies CONST_DOUBLE_P
before the switch (GET_MODE (x)) above.

> +
> +  r = *CONST_DOUBLE_REAL_VALUE (x);
> +
> +  if (REAL_VALUE_ISNAN (r))
> +    {
> +      long reprs[2] = { 0 };
> +      /* Compare with canonical NaN.  */
> +      switch (GET_MODE (x))
> +	{
> +	case HFmode:
> +	  reprs[0] = real_to_target (NULL, &r,
> +				     float_mode_for_size (16).require ());
> +	  /* 0x7e00: Canonical NaN for binary16.  */
> +	  if (reprs[0] != 0x7e00)
> +	    return result;
> +	  break;
> +	case SFmode:
> +	  reprs[0] = real_to_target (NULL, &r,
> +				     float_mode_for_size (32).require ());
> +	  /* 0x7fc00000: Canonical NaN for binary32.  */
> +	  if (reprs[0] != 0x7fc00000)
> +	    return result;
> +	  break;
> +	case DFmode:
> +	  real_to_target (reprs, &r, float_mode_for_size (64).require ());
> +	  if (FLOAT_WORDS_BIG_ENDIAN)
> +	    std::swap (reprs[0], reprs[1]);
> +	  /* 0x7ff80000_00000000: Canonical NaN for binary64.  */
> +	  if (reprs[0] != 0 || reprs[1] != 0x7ff80000)
> +	    return result;
> +	  break;
> +	default:
> +	  gcc_unreachable ();
> +	}
> +      result.type = RISCV_FLOAT_CONST_NAN;
> +      result.valid = true;
> +      return result;
> +    }
> +  else if (REAL_VALUE_ISINF (r))
> +    {
> +      if (REAL_VALUE_NEGATIVE (r))
> +	return result;
> +      result.type = RISCV_FLOAT_CONST_INF;
> +      result.valid = true;
> +      return result;
> +    }
> +
> +  bool sign = REAL_VALUE_NEGATIVE (r);
> +  result.sign = sign;
> +
> +  r = real_value_abs (&r);
> +  /* GCC internally does not use IEEE754-like encoding (where normalized
> +     significands are in the range [1, 2).  GCC uses [0.5, 1) (see real.cc).
> +     So, this exponent_p1 variable equals IEEE754 unbiased exponent + 1.  */
> +  int exponent_p1 = REAL_EXP (&r);
> +
> +  /* For the mantissa, we expand into two HOST_WIDE_INTS, apart from the
> +     highest (sign) bit, with a fixed binary point at bit point_pos.
> +     m1 holds the low part of the mantissa, m2 the high part.
> +     WARNING: If we ever have a representation using more than 2 * H_W_I - 1
> +     bits for the mantissa, this can fail (low bits will be lost).  */
> +  bool fail = false;
> +  real_ldexp (&m, &r, (2 * HOST_BITS_PER_WIDE_INT - 1) - exponent_p1);
> +  wide_int w = real_to_integer (&m, &fail, HOST_BITS_PER_WIDE_INT * 2);
> +  if (fail)
> +    return result;
> +
> +  /* If the low part of the mantissa has bits set we cannot represent
> +     the value.  */
> +  if (w.ulow () != 0)
> +    return result;
> +  /* We have rejected the lower HOST_WIDE_INT, so update our
> +     understanding of how many bits lie in the mantissa and
> +     look only at the high HOST_WIDE_INT.  */
> +  unsigned HOST_WIDE_INT mantissa = w.elt (1);
> +
> +  /* We cannot represent the value 0.0.  */
> +  if (mantissa == 0)
> +    return result;
> +
> +  /* We can only represent values with a mantissa of the form 1.xx.  */
> +  unsigned HOST_WIDE_INT mask
> +      = ((unsigned HOST_WIDE_INT) 1 << (HOST_BITS_PER_WIDE_INT - 4)) - 1;
> +  if ((mantissa & mask) != 0)
> +    return result;
> +  mantissa >>= HOST_BITS_PER_WIDE_INT - 4;
> +  /* Now the lowest 3-bits of mantissa should form (1.xx)b.  */
> +  gcc_assert (mantissa & (1u << 2));
> +  /* Mask out the highest bit.  */
> +  mantissa &= ~(1u << 2);
> +
> +  if (mantissa == 0)
> +    {
> +      /* We cannot represent any values but -1.0.  */
> +      if (exponent_p1 != 1 && sign)
> +	return result;
> +      switch (exponent_p1)
> +	{
> +	case -15: /* 1.0 * 2^(-16)  */
> +	case -14: /* 1.0 * 2^(-15)  */
> +	case -7:  /* 1.0 * 2^(- 8)  */
> +	case -6:  /* 1.0 * 2^(- 7)  */
> +	case 8:   /* 1.0 * 2^(+ 7)  */
> +	case 9:   /* 1.0 * 2^(+ 8)  */
> +	case 16:  /* 1.0 * 2^(+15)  */
> +	case 17:  /* 1.0 * 2^(+16)  */
> +	  break;
> +	default:
> +	  if (exponent_p1 >= -3 && exponent_p1 <= 5)
> +	    /* 1.0 * 2^[-4,4]  */
> +	    break;
> +	  switch (GET_MODE (x))
> +	    {
> +	    case HFmode: /* IEEE 754 binary16.  */
> +	      /* Minimum positive normal == 1.0 * 2^(-14)  */
> +	      if (exponent_p1 != -13) return result;
> +	      break;
> +	    case SFmode: /* IEEE 754 binary32.  */
> +	      /* Minimum positive normal == 1.0 * 2^(-126)  */
> +	      if (exponent_p1 != -125) return result;
> +	      break;
> +	    case DFmode: /* IEEE 754 binary64.  */
> +	      /* Minimum positive normal == 1.0 * 2^(-1022)  */
> +	      if (exponent_p1 != -1021) return result;
> +	      break;
> +	    default:
> +	      gcc_unreachable ();
> +	    }
> +	  result.type = RISCV_FLOAT_CONST_MIN;
> +	  result.valid = true;
> +	  return result;
> +	}
> +    }
> +  else
> +    {
> +      if (sign)
> +	return result;
> +      if (exponent_p1 < -1 || exponent_p1 > 2)
> +	return result;
> +      /* The value is (+1.xx)b * 2^[-2,1].
> +	 But we cannot represent (+1.11)b * 2^1 (that is 3.5). */
> +      if (exponent_p1 == 2 && mantissa == 3)
> +	return result;
> +    }
> +
> +  result.valid = true;
> +  result.mantissa_below_point = mantissa;
> +  result.biased_exponent = exponent_p1 + 15;
> +
> +  return result;
> +}
> +

This code is great and completely different from the way I implemented it.
I'm not sure which one is better, but my idea is that the fli instruction
corresponds to three tables (HF, SF and DF), each of which holds a fixed
set of values. The routines in GCC's real.h can readily convert a constant
into the form stored in those tables, so a simple binary search over the
tables is all that is needed.
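
As a rough illustration of the table idea, here is a minimal sketch for
binary32 only. It is not taken from either patch: the names fli_s_sorted
and fli_s_is_loadable are hypothetical, it works on host float values
rather than GCC's REAL_VALUE_TYPE, and the table simply lists the finite
positive constants that the checks in the quoted code accept, with -1.0,
+inf and the canonical NaN handled separately.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table: finite positive binary32 FLI constants, ascending.  */
static const float fli_s_sorted[] = {
  FLT_MIN,  0x1p-16f, 0x1p-15f, 0x1p-8f, 0x1p-7f, 0.0625f, 0.125f, 0.25f,
  0.3125f,  0.375f,   0.4375f,  0.5f,    0.625f,  0.75f,   0.875f, 1.0f,
  1.25f,    1.5f,     1.75f,    2.0f,    2.5f,    3.0f,    4.0f,   8.0f,
  16.0f,    128.0f,   256.0f,   0x1p15f, 0x1p16f,
};

/* Return true if V is one of the binary32 constants listed above, -1.0,
   +inf, or the canonical NaN (0x7fc00000).  */
static bool fli_s_is_loadable(float v)
{
  uint32_t bits;
  assert(sizeof(float) == sizeof(uint32_t));
  memcpy(&bits, &v, sizeof bits);

  if (bits == 0x7fc00000u)          /* Canonical NaN only.  */
    return true;
  if (isnan(v))
    return false;
  if (v == -1.0f || (isinf(v) && v > 0))
    return true;

  /* Plain binary search over the sorted finite positive entries.  */
  size_t lo = 0;
  size_t hi = sizeof fli_s_sorted / sizeof fli_s_sorted[0];
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (fli_s_sorted[mid] == v)
        return true;
      if (fli_s_sorted[mid] < v)
        lo = mid + 1;
      else
        hi = mid;
    }
  return false;
}
```

The HF and DF tables would differ only in the minimum-normal entry and in
which of the small powers of two are representable.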

@@ -1362,17 +1545,14 @@  riscv_const_insns (rtx x)
 		   constant incurs a literal-pool access.  Allow this in
 		   order to increase vectorization possibilities.  */
 		int n = riscv_const_insns (elt);
-		if (CONST_DOUBLE_P (elt))
-		    return 1 + 4; /* vfmv.v.f + memory access.  */
> +		/* We need as many insns as it takes to load the constant
> +		   into a GPR and one vmv.v.x.  */
> +		if (n != 0)
> +		  return 1 + n;
> +		else if (CONST_DOUBLE_P (elt))
> +		  return 1 + 4; /* vfmv.v.f + memory access.  */
 		else
-		  {
-		    /* We need as many insns as it takes to load the constant
-		       into a GPR and one vmv.v.x.  */
-		    if (n != 0)
-		      return 1 + n;
-		    else
-		      return 1 + 4; /*vmv.v.x + memory access.  */
-		  }
> +		  return 1 + 4; /* vmv.v.x + memory access.  */
 	      }
 	  }

I don't quite understand this part: if n == 0, does it always return 1 + 4?
If so, it could be
if (n != 0)
  return 1 + n;
else
  return 1 + 4;
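
To spell out the question, here is a small sketch comparing the three-way
branch in the hunk with the two-way form suggested above. The function
names and the plain int/bool parameters are made up for illustration; in
GCC, n comes from riscv_const_insns (elt) and the flag from
CONST_DOUBLE_P (elt).

```c
#include <assert.h>
#include <stdbool.h>

/* Cost as written in the hunk: n insns to build the constant plus one
   vector move, or a fixed 1 + 4 when the constant must come from memory.  */
static int dup_cost_three_way(int n, bool is_const_double)
{
  assert(n >= 0);
  if (n != 0)
    return 1 + n;
  else if (is_const_double)
    return 1 + 4; /* vfmv.v.f + memory access.  */
  else
    return 1 + 4; /* vmv.v.x + memory access.  */
}

/* The two-way form: same result for every input.  */
static int dup_cost_two_way(int n)
{
  assert(n >= 0);
  return n != 0 ? 1 + n : 1 + 4;
}
```

Both return the same cost for every input, so the only thing the extra
branch carries is the comment about which instruction is used.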

@@ -5117,6 +5313,36 @@  riscv_print_operand (FILE *file, rtx op, int letter)
 	    output_address (mode, XEXP (op, 0));
 	  break;
 
> +	case CONST_DOUBLE:
> +	  {
> +	    struct riscv_float_fli_const flt = riscv_get_float_fli_const (op);
> +	    if (flt.valid)
> +	      {
> +		switch (flt.type)
> +		  {
> +		  case RISCV_FLOAT_CONST_MIN:
> +		    fputs ("min", file);
> +		    break;
> +		  case RISCV_FLOAT_CONST_INF:
> +		    fputs ("inf", file);
> +		    break;
> +		  case RISCV_FLOAT_CONST_NAN:
> +		    fputs ("nan", file);
> +		    break;
> +		  default:
> +		    /* Use simpler (and bit-perfect) printer.  */
> +		    if (flt.sign)
> +		      fputc ('-', file);
> +		    fprintf (file, "0x1.%cp%+d",
> +			     "048c"[flt.mantissa_below_point],
> +			     (int) flt.biased_exponent - 16);
> +		    break;
> +		  }
> +		break;
> +	      }
> +	  }
> +	  /* Fall through.  */

For displaying floating-point values at the assembly level, you can refer to LLVM:
https://reviews.llvm.org/D145645

It may also be necessary to deal with riscv_split_64bit_move_p
and riscv_legitimize_const_move for rv32; otherwise a DFmode move
on rv32 will be split into a high 32-bit move and a low 32-bit
move, so the fli instruction cannot be generated.

Thanks,
Jin
Tsukasa OI Aug. 15, 2023, 3:38 a.m. UTC | #2
On 2023/08/14 21:51, Jin Ma wrote:
> Hi Tsukasa,
>   What a coincidence: I also implemented the Zfa extension, which also includes the FLI-related instructions :)

Hi, I'm glad to know that someone is working on this extension more
comprehensively (especially when that "someone" is an experienced GCC
contributor).  I prefer your patch set in general, and I'm glad to learn
from your patch set and your response that my approach was not as bad as
I expected.

When a new extension becomes available, I will be more confident about
making a patch set for GCC (as I already do for GNU Binutils).

> 
> links: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627294.html
> 
>> +  if (!TARGET_HARD_FLOAT || !TARGET_ZFA)
>> +    return result;
>> +  switch (GET_MODE (x))
>> +    {
>> +    case HFmode:
>> +      /* Not only 'Zfhmin', either 'Zfh' or 'Zvfh' is required.  */
>> +      if (!TARGET_ZFH && !TARGET_ZVFH)
> 
> Zvfh implies that Zfh is also enabled, so there may be no need to check
> TARGET_ZVFH here. By the way, the formatting here seems wrong; maybe a tab
> is needed for alignment?

For indentation, I believe this is okay: the line is three indentation
levels (soft tabs) from the top, meaning 6 spaces.

For specification requirements, I think I'm correct.

The spec says that 'Zvfh' depends on 'Zve32f' and 'Zfhmin'.  'Zfhmin' is
a conversion-only 'Zfh' subset ('Zve32f' doesn't require any
FP16-related extensions).

Note that "fli.h" requires 'Zfa' and ('Zfh' and/or 'Zvfh').

So, checking 'Zfh' alone is not sufficient to cover the requirements of
the "fli.h" instruction.  Checking TARGET_ZFH || TARGET_ZVFH (for the
existence of "fli.h") should be correct, and I think your patch needs
to be changed "in the long term".

"In the long term" means that current GNU Binutils has a bug in which
"fli.h" requires 'Zfa' and 'Zfh' ('Zfa' and 'Zvfh' does not work).
My initial 'Zfa' proposal (improved by Christoph Müllner and upstreamed
into master) intentionally ignored this case because I assumed that
approval/ratification of 'Zvfh' would take some time and that we would
have time to fix it before a Binutils release following approval of both
'Zfa' and 'Zvfh' (that assumption turned out to be wrong).

cf. <https://sourceware.org/pipermail/binutils/2023-August/129006.html>

So, "fixing" this part (on your patch) alone will not make the program
work (on the simulator) because current buggy GNU Binutils won't accept
it.  I'm working on it on the GNU Binutils side.
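
The requirement described above can be written down as a small predicate.
This is only a sketch: struct ext_flags and have_fli_h are hypothetical
names standing in for GCC's TARGET_* macros, and the 'Zvfh' dependency on
'Zfhmin' (rather than on full 'Zfh') is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the enabled-extension state.  */
struct ext_flags
{
  bool zfa;
  bool zfh;
  bool zfhmin;
  bool zvfh;
};

/* "fli.h" needs 'Zfa' together with 'Zfh' and/or 'Zvfh'.
   'Zfhmin' alone (conversion-only) is not enough.  */
static bool have_fli_h(const struct ext_flags *e)
{
  assert(e != NULL);
  return e->zfa && (e->zfh || e->zvfh);
}
```

A configuration with 'Zfa', 'Zfhmin' and 'Zvfh' but without 'Zfh' passes
this check, which is the case a TARGET_ZFH-only test would miss.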

> 
>> +	return result;
>> +      break;
>> +    case SFmode: break;
>> +    case DFmode: break;
> 
> Maybe we still have to check TARGET_DOUBLE_FLOAT?

Indeed.  I just missed that.

> 
>> +    default: return result;
>> +    }
>> +
>> +  if (!CONST_DOUBLE_P (x))
>> +    return result;
> 
> I think it might be better to check whether x satisfies CONST_DOUBLE_P
> before the switch (GET_MODE (x)) above.

That's correct.  I think that's a leftover from when I was experimenting.

> 
>> +
>> +  r = *CONST_DOUBLE_REAL_VALUE (x);
>> +
>> +  if (REAL_VALUE_ISNAN (r))
>> +    {
>> +      long reprs[2] = { 0 };
>> +      /* Compare with canonical NaN.  */
>> +      switch (GET_MODE (x))
>> +	{
>> +	case HFmode:
>> +	  reprs[0] = real_to_target (NULL, &r,
>> +				     float_mode_for_size (16).require ());
>> +	  /* 0x7e00: Canonical NaN for binary16.  */
>> +	  if (reprs[0] != 0x7e00)
>> +	    return result;
>> +	  break;
>> +	case SFmode:
>> +	  reprs[0] = real_to_target (NULL, &r,
>> +				     float_mode_for_size (32).require ());
>> +	  /* 0x7fc00000: Canonical NaN for binary32.  */
>> +	  if (reprs[0] != 0x7fc00000)
>> +	    return result;
>> +	  break;
>> +	case DFmode:
>> +	  real_to_target (reprs, &r, float_mode_for_size (64).require ());
>> +	  if (FLOAT_WORDS_BIG_ENDIAN)
>> +	    std::swap (reprs[0], reprs[1]);
>> +	  /* 0x7ff80000_00000000: Canonical NaN for binary64.  */
>> +	  if (reprs[0] != 0 || reprs[1] != 0x7ff80000)
>> +	    return result;
>> +	  break;
>> +	default:
>> +	  gcc_unreachable ();
>> +	}
>> +      result.type = RISCV_FLOAT_CONST_NAN;
>> +      result.valid = true;
>> +      return result;
>> +    }
>> +  else if (REAL_VALUE_ISINF (r))
>> +    {
>> +      if (REAL_VALUE_NEGATIVE (r))
>> +	return result;
>> +      result.type = RISCV_FLOAT_CONST_INF;
>> +      result.valid = true;
>> +      return result;
>> +    }
>> +
>> +  bool sign = REAL_VALUE_NEGATIVE (r);
>> +  result.sign = sign;
>> +
>> +  r = real_value_abs (&r);
>> +  /* GCC internally does not use IEEE754-like encoding (where normalized
>> +     significands are in the range [1, 2).  GCC uses [0.5, 1) (see real.cc).
>> +     So, this exponent_p1 variable equals IEEE754 unbiased exponent + 1.  */
>> +  int exponent_p1 = REAL_EXP (&r);
>> +
>> +  /* For the mantissa, we expand into two HOST_WIDE_INTS, apart from the
>> +     highest (sign) bit, with a fixed binary point at bit point_pos.
>> +     m1 holds the low part of the mantissa, m2 the high part.
>> +     WARNING: If we ever have a representation using more than 2 * H_W_I - 1
>> +     bits for the mantissa, this can fail (low bits will be lost).  */
>> +  bool fail = false;
>> +  real_ldexp (&m, &r, (2 * HOST_BITS_PER_WIDE_INT - 1) - exponent_p1);
>> +  wide_int w = real_to_integer (&m, &fail, HOST_BITS_PER_WIDE_INT * 2);
>> +  if (fail)
>> +    return result;
>> +
>> +  /* If the low part of the mantissa has bits set we cannot represent
>> +     the value.  */
>> +  if (w.ulow () != 0)
>> +    return result;
>> +  /* We have rejected the lower HOST_WIDE_INT, so update our
>> +     understanding of how many bits lie in the mantissa and
>> +     look only at the high HOST_WIDE_INT.  */
>> +  unsigned HOST_WIDE_INT mantissa = w.elt (1);
>> +
>> +  /* We cannot represent the value 0.0.  */
>> +  if (mantissa == 0)
>> +    return result;
>> +
>> +  /* We can only represent values with a mantissa of the form 1.xx.  */
>> +  unsigned HOST_WIDE_INT mask
>> +      = ((unsigned HOST_WIDE_INT) 1 << (HOST_BITS_PER_WIDE_INT - 4)) - 1;
>> +  if ((mantissa & mask) != 0)
>> +    return result;
>> +  mantissa >>= HOST_BITS_PER_WIDE_INT - 4;
>> +  /* Now the lowest 3-bits of mantissa should form (1.xx)b.  */
>> +  gcc_assert (mantissa & (1u << 2));
>> +  /* Mask out the highest bit.  */
>> +  mantissa &= ~(1u << 2);
>> +
>> +  if (mantissa == 0)
>> +    {
>> +      /* We cannot represent any values but -1.0.  */
>> +      if (exponent_p1 != 1 && sign)
>> +	return result;
>> +      switch (exponent_p1)
>> +	{
>> +	case -15: /* 1.0 * 2^(-16)  */
>> +	case -14: /* 1.0 * 2^(-15)  */
>> +	case -7:  /* 1.0 * 2^(- 8)  */
>> +	case -6:  /* 1.0 * 2^(- 7)  */
>> +	case 8:   /* 1.0 * 2^(+ 7)  */
>> +	case 9:   /* 1.0 * 2^(+ 8)  */
>> +	case 16:  /* 1.0 * 2^(+15)  */
>> +	case 17:  /* 1.0 * 2^(+16)  */
>> +	  break;
>> +	default:
>> +	  if (exponent_p1 >= -3 && exponent_p1 <= 5)
>> +	    /* 1.0 * 2^[-4,4]  */
>> +	    break;
>> +	  switch (GET_MODE (x))
>> +	    {
>> +	    case HFmode: /* IEEE 754 binary16.  */
>> +	      /* Minimum positive normal == 1.0 * 2^(-14)  */
>> +	      if (exponent_p1 != -13) return result;
>> +	      break;
>> +	    case SFmode: /* IEEE 754 binary32.  */
>> +	      /* Minimum positive normal == 1.0 * 2^(-126)  */
>> +	      if (exponent_p1 != -125) return result;
>> +	      break;
>> +	    case DFmode: /* IEEE 754 binary64.  */
>> +	      /* Minimum positive normal == 1.0 * 2^(-1022)  */
>> +	      if (exponent_p1 != -1021) return result;
>> +	      break;
>> +	    default:
>> +	      gcc_unreachable ();
>> +	    }
>> +	  result.type = RISCV_FLOAT_CONST_MIN;
>> +	  result.valid = true;
>> +	  return result;
>> +	}
>> +    }
>> +  else
>> +    {
>> +      if (sign)
>> +	return result;
>> +      if (exponent_p1 < -1 || exponent_p1 > 2)
>> +	return result;
>> +      /* The value is (+1.xx)b * 2^[-2,1].
>> +	 But we cannot represent (+1.11)b * 2^1 (that is 3.5). */
>> +      if (exponent_p1 == 2 && mantissa == 3)
>> +	return result;
>> +    }
>> +
>> +  result.valid = true;
>> +  result.mantissa_below_point = mantissa;
>> +  result.biased_exponent = exponent_p1 + 15;
>> +
>> +  return result;
>> +}
>> +
> 
> This code is great and completely different from the way I implemented it.
> I'm not sure which one is better, but my idea is that the fli instruction
> corresponds to three tables (HF, SF and DF), each of which holds a fixed
> set of values. The routines in GCC's real.h can readily convert a constant
> into the form stored in those tables, so a simple binary search over the
> tables is all that is needed.

Yup.  My approach (based on AArch64's VMOV.F32 constraint checking code)
is more generic, but I think constants loaded with a single FLI
instruction don't need to be handled that generically.

If multi-instruction FLI sequences become realistic, this kind of generic
approach (handling finite constants precisely) will be helpful (a multi-FLI
sequence with addition might need some additional measures to avoid
underflow, though).  But for now, I think your approach is better and
simpler.

> 
> @@ -1362,17 +1545,14 @@  riscv_const_insns (rtx x)
>  		   constant incurs a literal-pool access.  Allow this in
>  		   order to increase vectorization possibilities.  */
>  		int n = riscv_const_insns (elt);
> -		if (CONST_DOUBLE_P (elt))
> -		    return 1 + 4; /* vfmv.v.f + memory access.  */
>> +		/* We need as many insns as it takes to load the constant
>> +		   into a GPR and one vmv.v.x.  */
>> +		if (n != 0)
>> +		  return 1 + n;
>> +		else if (CONST_DOUBLE_P (elt))
>> +		  return 1 + 4; /* vfmv.v.f + memory access.  */
>  		else
> -		  {
> -		    /* We need as many insns as it takes to load the constant
> -		       into a GPR and one vmv.v.x.  */
> -		    if (n != 0)
> -		      return 1 + n;
> -		    else
> -		      return 1 + 4; /*vmv.v.x + memory access.  */
> -		  }
>> +		  return 1 + 4; /* vmv.v.x + memory access.  */
>  	      }
>  	  }
> 
> I don't quite understand this part: if n == 0, does it always return 1 + 4?
> If so, it could be
> if (n != 0)
>   return 1 + n;
> else
>   return 1 + 4;
> 
> @@ -5117,6 +5313,36 @@  riscv_print_operand (FILE *file, rtx op, int letter)
>  	    output_address (mode, XEXP (op, 0));
>  	  break;
>  
>> +	case CONST_DOUBLE:
>> +	  {
>> +	    struct riscv_float_fli_const flt = riscv_get_float_fli_const (op);
>> +	    if (flt.valid)
>> +	      {
>> +		switch (flt.type)
>> +		  {
>> +		  case RISCV_FLOAT_CONST_MIN:
>> +		    fputs ("min", file);
>> +		    break;
>> +		  case RISCV_FLOAT_CONST_INF:
>> +		    fputs ("inf", file);
>> +		    break;
>> +		  case RISCV_FLOAT_CONST_NAN:
>> +		    fputs ("nan", file);
>> +		    break;
>> +		  default:
>> +		    /* Use simpler (and bit-perfect) printer.  */
>> +		    if (flt.sign)
>> +		      fputc ('-', file);
>> +		    fprintf (file, "0x1.%cp%+d",
>> +			     "048c"[flt.mantissa_below_point],
>> +			     (int) flt.biased_exponent - 16);
>> +		    break;
>> +		  }
>> +		break;
>> +	      }
>> +	  }
>> +	  /* Fall through.  */
> 
> For displaying floating-point values at the assembly level, you can refer to LLVM:
> https://reviews.llvm.org/D145645

Thanks for the link.  I personally prefer hexfloats to avoid precision
problems as much as possible, and that's how GNU Binutils prints the FLI
constants.  But that makes sense (and I feel decimals are okay).
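
The hexfloat form is easy to produce exactly because at most two fraction
bits are ever set. A minimal sketch of the formatting, assuming the field
meanings used in the quoted printer (two fraction bits, and a biased
exponent equal to the IEEE exponent + 16); format_fli_hex and
fli_hex_value are hypothetical helpers, not GCC functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Format a finite FLI constant as a C99 hexfloat string.  FRAC holds the
   two bits below the binary point; each maps to one hex digit.  */
static int format_fli_hex(char *buf, size_t len, bool sign,
                          unsigned frac, int biased_exponent)
{
  assert(frac < 4);
  return snprintf(buf, len, "%s0x1.%cp%+d", sign ? "-" : "",
                  "048c"[frac], biased_exponent - 16);
}

/* Parse the formatted string back to check that no precision is lost.  */
static double fli_hex_value(bool sign, unsigned frac, int biased_exponent)
{
  char buf[32];
  format_fli_hex(buf, sizeof buf, sign, frac, biased_exponent);
  return strtod(buf, NULL);
}
```

With these inputs, 3.0 (fraction bits 10b, IEEE exponent 1, so biased
exponent 17) is formatted as 0x1.8p+1, and parsing it back gives exactly
3.0.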

> 
> It may also be necessary to deal with riscv_split_64bit_move_p
> and riscv_legitimize_const_move for rv32; otherwise a DFmode move
> on rv32 will be split into a high 32-bit move and a low 32-bit
> move, so the fli instruction cannot be generated.

Thanks for letting me know.  I'm still wrestling with GCC's large code
base ahead of future contributions, so this is a lot of help for me.

Thanks,
Tsukasa

> 
> Thanks,
> Jin
>
Tsukasa OI Aug. 15, 2023, 7:59 a.m. UTC | #3
On 2023/08/15 12:38, Tsukasa OI wrote:
> On 2023/08/14 21:51, Jin Ma wrote:
>> Hi Tsukasa,
>>   What a coincidence: I also implemented the Zfa extension, which also includes the FLI-related instructions :)
> 
> Hi, I'm glad to know that someone is working on this extension more
> comprehensively (especially when that "someone" is an experienced GCC
> contributor).  I prefer your patch set in general, and I'm glad to learn
> from your patch set and your response that my approach was not as bad as
> I expected.
> 
> When a new extension becomes available, I will be more confident about
> making a patch set for GCC (as I already do for GNU Binutils).
> 
>>
>> links: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627294.html
>>
>>> +  if (!TARGET_HARD_FLOAT || !TARGET_ZFA)
>>> +    return result;
>>> +  switch (GET_MODE (x))
>>> +    {
>>> +    case HFmode:
>>> +      /* Not only 'Zfhmin', either 'Zfh' or 'Zvfh' is required.  */
>>> +      if (!TARGET_ZFH && !TARGET_ZVFH)
>>
>> Zvfh implies that Zfh is also enabled, so there may be no need to check
>> TARGET_ZVFH here. By the way, the formatting here seems wrong; maybe a tab
>> is needed for alignment?
> 
> For indentation, I believe this is okay: the line is three indentation
> levels (soft tabs) from the top, meaning 6 spaces.
> 
> For specification requirements, I think I'm correct.
> 
> The spec says that 'Zvfh' depends on 'Zve32f' and 'Zfhmin'.  'Zfhmin' is
> a conversion-only 'Zfh' subset ('Zve32f' doesn't require any
> FP16-related extensions).
> 
> Note that "fli.h" requires 'Zfa' and ('Zfh' and/or 'Zvfh').
> 
> So, checking 'Zfh' alone is not sufficient to cover the requirements of
> the "fli.h" instruction.  Checking TARGET_ZFH || TARGET_ZVFH (for the
> existence of "fli.h") should be correct, and I think your patch needs
> to be changed "in the long term".
> 
> "In the long term" means that current GNU Binutils has a bug in which
> "fli.h" requires 'Zfa' and 'Zfh' ('Zfa' and 'Zvfh' does not work).
> My initial 'Zfa' proposal (improved by Christoph Müllner and upstreamed
> into master) intentionally ignored this case because I assumed that
> approval/ratification of 'Zvfh' would take some time and that we would
> have time to fix it before a Binutils release following approval of both
> 'Zfa' and 'Zvfh' (that assumption turned out to be wrong).
> 
> cf. <https://sourceware.org/pipermail/binutils/2023-August/129006.html>
> 
> So, "fixing" this part (on your patch) alone will not make the program
> work (on the simulator) because current buggy GNU Binutils won't accept
> it.  I'm working on it on the GNU Binutils side.

Okay, the bug is fixed in GNU Binutils (master) and the fix is awaiting
approval from the release maintainer (for binutils-2_41-branch).

Thanks,
Tsukasa

> 
>>
>>> +	return result;
>>> +      break;
>>> +    case SFmode: break;
>>> +    case DFmode: break;
>>
>> Maybe we still have to check TARGET_DOUBLE_FLOAT?
> 
> Indeed.  I just missed that.
> 
>>
>>> +    default: return result;
>>> +    }
>>> +
>>> +  if (!CONST_DOUBLE_P (x))
>>> +    return result;
>>
>> I think it might be better to check whether x satisfies CONST_DOUBLE_P
>> before the switch (GET_MODE (x)) above.
> 
> That's correct.  I think that's a leftover from when I was experimenting.
> 
>>
>>> +
>>> +  r = *CONST_DOUBLE_REAL_VALUE (x);
>>> +
>>> +  if (REAL_VALUE_ISNAN (r))
>>> +    {
>>> +      long reprs[2] = { 0 };
>>> +      /* Compare with canonical NaN.  */
>>> +      switch (GET_MODE (x))
>>> +	{
>>> +	case HFmode:
>>> +	  reprs[0] = real_to_target (NULL, &r,
>>> +				     float_mode_for_size (16).require ());
>>> +	  /* 0x7e00: Canonical NaN for binary16.  */
>>> +	  if (reprs[0] != 0x7e00)
>>> +	    return result;
>>> +	  break;
>>> +	case SFmode:
>>> +	  reprs[0] = real_to_target (NULL, &r,
>>> +				     float_mode_for_size (32).require ());
>>> +	  /* 0x7fc00000: Canonical NaN for binary32.  */
>>> +	  if (reprs[0] != 0x7fc00000)
>>> +	    return result;
>>> +	  break;
>>> +	case DFmode:
>>> +	  real_to_target (reprs, &r, float_mode_for_size (64).require ());
>>> +	  if (FLOAT_WORDS_BIG_ENDIAN)
>>> +	    std::swap (reprs[0], reprs[1]);
>>> +	  /* 0x7ff80000_00000000: Canonical NaN for binary64.  */
>>> +	  if (reprs[0] != 0 || reprs[1] != 0x7ff80000)
>>> +	    return result;
>>> +	  break;
>>> +	default:
>>> +	  gcc_unreachable ();
>>> +	}
>>> +      result.type = RISCV_FLOAT_CONST_NAN;
>>> +      result.valid = true;
>>> +      return result;
>>> +    }
>>> +  else if (REAL_VALUE_ISINF (r))
>>> +    {
>>> +      if (REAL_VALUE_NEGATIVE (r))
>>> +	return result;
>>> +      result.type = RISCV_FLOAT_CONST_INF;
>>> +      result.valid = true;
>>> +      return result;
>>> +    }
>>> +
>>> +  bool sign = REAL_VALUE_NEGATIVE (r);
>>> +  result.sign = sign;
>>> +
>>> +  r = real_value_abs (&r);
>>> +  /* GCC internally does not use IEEE754-like encoding (where normalized
>>> +     significands are in the range [1, 2).  GCC uses [0.5, 1) (see real.cc).
>>> +     So, this exponent_p1 variable equals IEEE754 unbiased exponent + 1.  */
>>> +  int exponent_p1 = REAL_EXP (&r);
>>> +
>>> +  /* For the mantissa, we expand into two HOST_WIDE_INTS, apart from the
>>> +     highest (sign) bit, with a fixed binary point at bit point_pos.
>>> +     m1 holds the low part of the mantissa, m2 the high part.
>>> +     WARNING: If we ever have a representation using more than 2 * H_W_I - 1
>>> +     bits for the mantissa, this can fail (low bits will be lost).  */
>>> +  bool fail = false;
>>> +  real_ldexp (&m, &r, (2 * HOST_BITS_PER_WIDE_INT - 1) - exponent_p1);
>>> +  wide_int w = real_to_integer (&m, &fail, HOST_BITS_PER_WIDE_INT * 2);
>>> +  if (fail)
>>> +    return result;
>>> +
>>> +  /* If the low part of the mantissa has bits set we cannot represent
>>> +     the value.  */
>>> +  if (w.ulow () != 0)
>>> +    return result;
>>> +  /* We have rejected the lower HOST_WIDE_INT, so update our
>>> +     understanding of how many bits lie in the mantissa and
>>> +     look only at the high HOST_WIDE_INT.  */
>>> +  unsigned HOST_WIDE_INT mantissa = w.elt (1);
>>> +
>>> +  /* We cannot represent the value 0.0.  */
>>> +  if (mantissa == 0)
>>> +    return result;
>>> +
>>> +  /* We can only represent values with a mantissa of the form 1.xx.  */
>>> +  unsigned HOST_WIDE_INT mask
>>> +      = ((unsigned HOST_WIDE_INT) 1 << (HOST_BITS_PER_WIDE_INT - 4)) - 1;
>>> +  if ((mantissa & mask) != 0)
>>> +    return result;
>>> +  mantissa >>= HOST_BITS_PER_WIDE_INT - 4;
>>> +  /* Now the lowest 3-bits of mantissa should form (1.xx)b.  */
>>> +  gcc_assert (mantissa & (1u << 2));
>>> +  /* Mask out the highest bit.  */
>>> +  mantissa &= ~(1u << 2);
>>> +
>>> +  if (mantissa == 0)
>>> +    {
>>> +      /* We cannot represent any values but -1.0.  */
>>> +      if (exponent_p1 != 1 && sign)
>>> +	return result;
>>> +      switch (exponent_p1)
>>> +	{
>>> +	case -15: /* 1.0 * 2^(-16)  */
>>> +	case -14: /* 1.0 * 2^(-15)  */
>>> +	case -7:  /* 1.0 * 2^(- 8)  */
>>> +	case -6:  /* 1.0 * 2^(- 7)  */
>>> +	case 8:   /* 1.0 * 2^(+ 7)  */
>>> +	case 9:   /* 1.0 * 2^(+ 8)  */
>>> +	case 16:  /* 1.0 * 2^(+15)  */
>>> +	case 17:  /* 1.0 * 2^(+16)  */
>>> +	  break;
>>> +	default:
>>> +	  if (exponent_p1 >= -3 && exponent_p1 <= 5)
>>> +	    /* 1.0 * 2^[-4,4]  */
>>> +	    break;
>>> +	  switch (GET_MODE (x))
>>> +	    {
>>> +	    case HFmode: /* IEEE 754 binary16.  */
>>> +	      /* Minimum positive normal == 1.0 * 2^(-14)  */
>>> +	      if (exponent_p1 != -13) return result;
>>> +	      break;
>>> +	    case SFmode: /* IEEE 754 binary32.  */
>>> +	      /* Minimum positive normal == 1.0 * 2^(-126)  */
>>> +	      if (exponent_p1 != -125) return result;
>>> +	      break;
>>> +	    case DFmode: /* IEEE 754 binary64.  */
>>> +	      /* Minimum positive normal == 1.0 * 2^(-1022)  */
>>> +	      if (exponent_p1 != -1021) return result;
>>> +	      break;
>>> +	    default:
>>> +	      gcc_unreachable ();
>>> +	    }
>>> +	  result.type = RISCV_FLOAT_CONST_MIN;
>>> +	  result.valid = true;
>>> +	  return result;
>>> +	}
>>> +    }
>>> +  else
>>> +    {
>>> +      if (sign)
>>> +	return result;
>>> +      if (exponent_p1 < -1 || exponent_p1 > 2)
>>> +	return result;
>>> +      /* The value is (+1.xx)b * 2^[-2,1].
>>> +	 But we cannot represent (+1.11)b * 2^1 (that is 3.5). */
>>> +      if (exponent_p1 == 2 && mantissa == 3)
>>> +	return result;
>>> +    }
>>> +
>>> +  result.valid = true;
>>> +  result.mantissa_below_point = mantissa;
>>> +  result.biased_exponent = exponent_p1 + 15;
>>> +
>>> +  return result;
>>> +}
>>> +
>>
>> This code is great and completely different from the way I implemented it.
>> I'm not sure which one is better, but my idea is that the fli instruction
>> corresponds to three tables (HF, SF and DF), all of which represent
>> specific values. The routines in GCC's real.h can readily convert
>> a constant into the form stored in the tables, so a simple binary
>> search over the tables is all that is needed.
> 
> Yup.  My approach (based on AArch64's VMOV.F32 constraint checking code)
> is more generic, but I think constants loaded by a single FLI instruction
> don't need to be that generic.
> 
> If multi-instruction FLI sequences become realistic, this kind of generic
> approach (handling finite constants precisely) will be helpful (a multi-FLI
> sequence with addition might need some additional measures to avoid
> underflow, though).  But for now, I think your approach is better and
> simpler.
> 
>>
>> @@ -1362,17 +1545,14 @@  riscv_const_insns (rtx x)
>>  		   constant incurs a literal-pool access.  Allow this in
>>  		   order to increase vectorization possibilities.  */
>>  		int n = riscv_const_insns (elt);
>> -		if (CONST_DOUBLE_P (elt))
>> -		    return 1 + 4; /* vfmv.v.f + memory access.  */
>>> +		/* We need as many insns as it takes to load the constant
>>> +		   into a GPR and one vmv.v.x.  */
>>> +		if (n != 0)
>>> +		  return 1 + n;
>>> +		else if (CONST_DOUBLE_P (elt))
>>> +		  return 1 + 4; /* vfmv.v.f + memory access.  */
>>  		else
>> -		  {
>> -		    /* We need as many insns as it takes to load the constant
>> -		       into a GPR and one vmv.v.x.  */
>> -		    if (n != 0)
>> -		      return 1 + n;
>> -		    else
>> -		      return 1 + 4; /*vmv.v.x + memory access.  */
>> -		  }
>>> +		  return 1 + 4; /* vmv.v.x + memory access.  */
>>  	      }
>>  	  }
>>
>> I don't quite understand this part: if n == 0, is 1 + 4 always returned?
>> If so, it could be simplified to
>> if (n != 0)
>>    return 1 + n;
>> else
>>   return 1 + 4;
>>
>> @@ -5117,6 +5313,36 @@  riscv_print_operand (FILE *file, rtx op, int letter)
>>  	    output_address (mode, XEXP (op, 0));
>>  	  break;
>>  
>>> +	case CONST_DOUBLE:
>>> +	  {
>>> +	    struct riscv_float_fli_const flt = riscv_get_float_fli_const (op);
>>> +	    if (flt.valid)
>>> +	      {
>>> +		switch (flt.type)
>>> +		  {
>>> +		  case RISCV_FLOAT_CONST_MIN:
>>> +		    fputs ("min", file);
>>> +		    break;
>>> +		  case RISCV_FLOAT_CONST_INF:
>>> +		    fputs ("inf", file);
>>> +		    break;
>>> +		  case RISCV_FLOAT_CONST_NAN:
>>> +		    fputs ("nan", file);
>>> +		    break;
>>> +		  default:
>>> +		    /* Use simpler (and bit-perfect) printer.  */
>>> +		    if (flt.sign)
>>> +		      fputc ('-', file);
>>> +		    fprintf (file, "0x1.%cp%+d",
>>> +			     "048c"[flt.mantissa_below_point],
>>> +			     (int) flt.biased_exponent - 16);
>>> +		    break;
>>> +		  }
>>> +		break;
>>> +	      }
>>> +	  }
>>> +	  /* Fall through.  */
>>
>> For displaying floating-point values at the assembly level, you can refer
>> to LLVM: https://reviews.llvm.org/D145645
> 
> Thanks for the link.  I personally prefer hexfloats to avoid precision
> problems as far as possible, and that's how GNU Binutils prints the FLI
> constants.  But that makes sense (and I feel decimals are okay).
> 
>>
>> It may also be necessary to deal with riscv_split_64bit_move_p
>> and riscv_legitimize_const_move for rv32; otherwise a DFmode move
>> on rv32 will be split into a high 32-bit move and a low 32-bit
>> move, and no fli instruction can be generated.
> 
> Thanks for letting me know.  I'm fighting against GCC's large code base
> for future contribution and that's a lot of help for me.
> 
> Thanks,
> Tsukasa
> 
>>
>> Thanks,
>> Jin
>>
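
On the hexfloat printing discussed above: the "048c" lookup works because
the two mantissa bits below the binary point fill the top half of a single
hex digit.  A minimal standalone sketch of that formatting step follows.
format_fli_hex is a hypothetical name, and it takes the unbiased IEEE 754
exponent directly instead of the patch's biased_exponent field.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the patch): format a finite FLI
   constant as a hexfloat.  MANT2 holds the two mantissa bits below the
   binary point (0..3); EXP is the unbiased IEEE 754 exponent.  */
static int
format_fli_hex (char *buf, size_t len, int negative, unsigned mant2, int exp)
{
  /* Two mantissa bits map to one hex digit:
     00 -> 0, 01 -> 4, 10 -> 8, 11 -> c.  */
  return snprintf (buf, len, "%s0x1.%cp%+d", negative ? "-" : "",
		   "048c"[mant2 & 3], exp);
}
```

For example, 2.5 is (1.01)b * 2^1, so mant2 = 1 and exp = 1 give
"0x1.4p+1", which is the string zfa-fli-4.c scans for.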
Jin Ma Aug. 15, 2023, 9:20 a.m. UTC | #4
On 2023/08/15 12:38, Tsukasa OI wrote:
> > On 2023/08/14 21:51, Jin Ma wrote:
> >> Hi Tsukasa,
> >>   What a coincidence, I also implemented the Zfa extension, which also includes the FLI-related instructions :)
> > 
> > Hi, I'm glad to know that someone is working on this extension more
> > comprehensively (especially when "someone" is an experienced GCC
> > contributor).  I prefer your patch set in general, and I'm glad to learn
> > from your patch set and your response that my approach was not as bad as
> > I expected.
> > 
> > When a new extension becomes available, I will be more confident making a
> > patch set for GCC (as I already do in GNU Binutils).
> > 
> >>
> >> links: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627294.html
> >>
> >>> +  if (!TARGET_HARD_FLOAT || !TARGET_ZFA)
> >>> +    return result;
> >>> +  switch (GET_MODE (x))
> >>> +    {
> >>> +    case HFmode:
> >>> +      /* 'Zfhmin' alone is not enough: either 'Zfh' or 'Zvfh' is required.  */
> >>> +      if (!TARGET_ZFH && !TARGET_ZVFH)
> >>
> >> Zvfh implies that Zfh is also on, so there may be no need to check
> >> TARGET_ZVFH here. By the way, the formatting here seems wrong; maybe a tab
> >> is needed for alignment?
> > 
> > For indentation, I believe this is okay, considering that it is 3 indent
> > levels (soft tabs) from the top (meaning 6 spaces).
> > 
> > For specification requirements, I think I'm correct.
> > 
> > The spec says that 'Zvfh' depends on 'Zve32f' and 'Zfhmin'.  'Zfhmin' is
> > a conversion-only 'Zfh' subset ('Zve32f' doesn't require any
> > FP16-related extensions).
> > 
> > Note that "fli.h" requires 'Zfa' and ('Zfh' and/or 'Zvfh').
> > 
> > So, checking 'Zfh' alone is not sufficient to decide whether the
> > "fli.h" instruction is available.  Checking TARGET_ZFH || TARGET_ZVFH
> > (for the existence of "fli.h") should be correct, and I think your patch
> > needs to be changed "in the long term".
> > 
> > "In the long term" means that current GNU Binutils has a bug where
> > "fli.h" requires 'Zfa' and 'Zfh' (the combination of 'Zfa' and 'Zvfh'
> > does not work).  My initial 'Zfa' proposal (improved by Christoph Müllner
> > and upstreamed into master) intentionally ignored this case because I
> > assumed that approval/ratification of 'Zvfh' would take some time and we
> > would have time to fix it before a Binutils release following approval of
> > both 'Zfa' and 'Zvfh' (that assumption turned out to be wrong).
> > 
> > cf. <https://sourceware.org/pipermail/binutils/2023-August/129006.html>
> > 
> > So, "fixing" this part (on your patch) alone will not make the program
> > work (on the simulator) because current buggy GNU Binutils won't accept
> > it.  I'm working on it on the GNU Binutils side.
> 
> Okay, the bug is fixed on GNU Binutils (master) and the fix is awaiting
> approval from the release maintainer (for binutils-2_41-branch).
> 
> Thanks,
> Tsukasa
> 

Yes, you are right. I did not notice that zfh and zvfh are relatively independent.
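
The requirement agreed on above can be written as a small predicate.  This
is only a sketch: struct ext_flags and fli_h_available are hypothetical
stand-ins for GCC's TARGET_ZFA / TARGET_ZFH / TARGET_ZVFH macros, not code
from either patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the TARGET_* extension macros.  */
struct ext_flags
{
  bool zfa;
  bool zfhmin;
  bool zfh;
  bool zvfh;
};

/* "fli.h" needs 'Zfa' plus 'Zfh' and/or 'Zvfh'.  'Zfhmin' is deliberately
   not consulted: it is not sufficient on its own.  */
static bool
fli_h_available (struct ext_flags e)
{
  return e.zfa && (e.zfh || e.zvfh);
}
```

With 'Zfa' and 'Zvfh' but no 'Zfh' (the zfa-fli-5.c configuration) the
predicate holds; with 'Zfa' and 'Zfhmin' only, it does not.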
Jeff Law Aug. 25, 2023, 8:59 p.m. UTC | #5
On 8/14/23 06:51, Jin Ma wrote:

> 
> This code is great and completely different from the way I implemented it.
> I'm not sure which one is better, but my idea is that the fli instruction
> corresponds to three tables (HF, SF and DF), all of which represent
> specific values. The routines in GCC's real.h can readily convert
> a constant into the form stored in the tables, so a simple binary
> search over the tables is all that is needed.
Yeah, I was kind of amazed at how Tsukasa implemented that code.  But I 
think the tables are easier to understand, so I'd tend to prefer them.
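
To illustrate the table idea, here is a rough sketch of a binary search
over the finite positive FLI constants.  It is not taken from either patch.
It uses plain doubles where a GCC implementation would use real.h values,
fli_finite and fli_index are hypothetical names, and the index numbering
simply follows the function numbering in zfa-fli-4.c.  The mode-dependent
"min" entry and inf/nan are left out.

```c
#include <stddef.h>

/* Finite positive FLI constants in ascending order, as listed in
   zfa-fli-4.c.  Entry I corresponds to index I + 2 in that test.  */
static const double fli_finite[] = {
  0.0000152587890625, 0.000030517578125, 0.00390625, 0.0078125,
  0.0625, 0.125, 0.25, 0.3125, 0.375, 0.4375, 0.5, 0.625, 0.75, 0.875,
  1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 4.0, 8.0, 16.0, 128.0, 256.0,
  32768.0, 65536.0
};

/* Return the index of V among the FLI constants, or -1 if V is not
   one of them.  -1.0 is the only negative entry (index 0).  */
static int
fli_index (double v)
{
  if (v == -1.0)
    return 0;

  size_t lo = 0;
  size_t hi = sizeof fli_finite / sizeof fli_finite[0];
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (fli_finite[mid] == v)
	return (int) mid + 2;
      if (fli_finite[mid] < v)
	lo = mid + 1;
      else
	hi = mid;
    }
  return -1;
}
```

With this table, fli_index (2.5) yields 21, while 3.5 (one of the
"non-FLI" constants in zfa-fli-2.c) yields -1.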

I'm still evaluating, but in general it looks like your implementation 
is (functionally) a superset of what Tsukasa has done.  I've still got 
some testing to do with Tsukasa's tests to verify, but my inclination is 
to go with your v10 patch right now.

Jeff
Patch

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 44525b2da491..d57c72ef14f0 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -98,6 +98,13 @@ 
   (and (match_code "const_double")
        (match_test "op == CONST0_RTX (mode)")))
 
+;; Floating-point constant that can be generated by a FLI instruction
+;; in the 'Zfa' standard extension.
+(define_constraint "H"
+  "@internal"
+  (and (match_code "const_double")
+       (match_test "riscv_get_float_fli_const (op).valid")))
+
 (define_memory_constraint "A"
   "An address that is held in a general-purpose register."
   (and (match_code "mem")
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 2fbed04ff84c..6effa2437251 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -80,6 +80,39 @@  struct riscv_address_info {
   enum riscv_symbol_type symbol_type;
 };
 
+/* Classifies a floating point constant possibly retrieved by
+   the FLI instructions.
+
+   RISCV_FLOAT_CONST_MIN
+       The minimum positive normal value for given mode.
+
+   RISCV_FLOAT_CONST_INF
+       Positive infinity.
+
+   RISCV_FLOAT_CONST_NAN
+       Canonical NaN (positive, quiet and zero payload NaN).
+
+   RISCV_FLOAT_CONST_FINITE
+       A finite number.  */
+enum riscv_float_fli_const_type {
+  RISCV_FLOAT_CONST_MIN,
+  RISCV_FLOAT_CONST_INF,
+  RISCV_FLOAT_CONST_NAN,
+  RISCV_FLOAT_CONST_FINITE,
+};
+
+/* Information about a floating point constant possibly retrieved by
+   the FLI instructions.  */
+struct riscv_float_fli_const {
+  bool valid: 1;
+  bool sign: 1;
+  enum riscv_float_fli_const_type type: 2;
+  /* Highest 2 bits of IEEE754 mantissa on RISCV_FLOAT_CONST_FINITE.  */
+  unsigned int mantissa_below_point: 2;
+  /* IEEE754 normal exponent + 16 on RISCV_FLOAT_CONST_FINITE.  */
+  unsigned int biased_exponent: 6;
+};
+
 /* Routines implemented in riscv.cc.  */
 extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
 extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
@@ -125,6 +158,7 @@  extern rtx riscv_gen_gpr_save_insn (struct riscv_frame_info *);
 extern bool riscv_gpr_save_operation_p (rtx);
 extern void riscv_reinit (void);
 extern poly_uint64 riscv_regmode_natural_size (machine_mode);
+extern struct riscv_float_fli_const riscv_get_float_fli_const (rtx);
 extern bool riscv_v_ext_vector_mode_p (machine_mode);
 extern bool riscv_v_ext_tuple_mode_p (machine_mode);
 extern bool riscv_v_ext_vls_mode_p (machine_mode);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f9b7a9ee749f..a8c13b014130 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -812,6 +812,185 @@  riscv_legitimate_constant_p (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
   return riscv_const_insns (x) > 0;
 }
 
+/* Check and generate information corresponding to a floating point constant
+   that can be generated from a FLI instruction.  */
+
+struct riscv_float_fli_const
+riscv_get_float_fli_const (rtx x)
+{
+  struct riscv_float_fli_const result = {
+    false, false, RISCV_FLOAT_CONST_FINITE, 0, 0
+  };
+  REAL_VALUE_TYPE r, m;
+
+  if (!TARGET_HARD_FLOAT || !TARGET_ZFA)
+    return result;
+  switch (GET_MODE (x))
+    {
+    case HFmode:
+      /* 'Zfhmin' alone is not enough: either 'Zfh' or 'Zvfh' is required.  */
+      if (!TARGET_ZFH && !TARGET_ZVFH)
+	return result;
+      break;
+    case SFmode: break;
+    case DFmode: break;
+    default: return result;
+    }
+
+  if (!CONST_DOUBLE_P (x))
+    return result;
+
+  r = *CONST_DOUBLE_REAL_VALUE (x);
+
+  if (REAL_VALUE_ISNAN (r))
+    {
+      long reprs[2] = { 0 };
+      /* Compare with canonical NaN.  */
+      switch (GET_MODE (x))
+	{
+	case HFmode:
+	  reprs[0] = real_to_target (NULL, &r,
+				     float_mode_for_size (16).require ());
+	  /* 0x7e00: Canonical NaN for binary16.  */
+	  if (reprs[0] != 0x7e00)
+	    return result;
+	  break;
+	case SFmode:
+	  reprs[0] = real_to_target (NULL, &r,
+				     float_mode_for_size (32).require ());
+	  /* 0x7fc00000: Canonical NaN for binary32.  */
+	  if (reprs[0] != 0x7fc00000)
+	    return result;
+	  break;
+	case DFmode:
+	  real_to_target (reprs, &r, float_mode_for_size (64).require ());
+	  if (FLOAT_WORDS_BIG_ENDIAN)
+	    std::swap (reprs[0], reprs[1]);
+	  /* 0x7ff80000_00000000: Canonical NaN for binary64.  */
+	  if (reprs[0] != 0 || reprs[1] != 0x7ff80000)
+	    return result;
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      result.type = RISCV_FLOAT_CONST_NAN;
+      result.valid = true;
+      return result;
+    }
+  else if (REAL_VALUE_ISINF (r))
+    {
+      if (REAL_VALUE_NEGATIVE (r))
+	return result;
+      result.type = RISCV_FLOAT_CONST_INF;
+      result.valid = true;
+      return result;
+    }
+
+  bool sign = REAL_VALUE_NEGATIVE (r);
+  result.sign = sign;
+
+  r = real_value_abs (&r);
+  /* GCC internally does not use IEEE754-like encoding (where normalized
+     significands are in the range [1, 2)).  GCC uses [0.5, 1) (see real.cc).
+     So, this exponent_p1 variable equals IEEE754 unbiased exponent + 1.  */
+  int exponent_p1 = REAL_EXP (&r);
+
+  /* For the mantissa, we expand into two HOST_WIDE_INTS, apart from the
+     highest (sign) bit, with a fixed binary point at bit
+     2 * HOST_BITS_PER_WIDE_INT - 1.  w.ulow () holds the low part of the
+     mantissa, w.elt (1) the high part.
+     WARNING: If we ever have a representation using more than 2 * H_W_I - 1
+     bits for the mantissa, this can fail (low bits will be lost).  */
+  bool fail = false;
+  real_ldexp (&m, &r, (2 * HOST_BITS_PER_WIDE_INT - 1) - exponent_p1);
+  wide_int w = real_to_integer (&m, &fail, HOST_BITS_PER_WIDE_INT * 2);
+  if (fail)
+    return result;
+
+  /* If the low part of the mantissa has bits set we cannot represent
+     the value.  */
+  if (w.ulow () != 0)
+    return result;
+  /* We have rejected the lower HOST_WIDE_INT, so update our
+     understanding of how many bits lie in the mantissa and
+     look only at the high HOST_WIDE_INT.  */
+  unsigned HOST_WIDE_INT mantissa = w.elt (1);
+
+  /* We cannot represent the value 0.0.  */
+  if (mantissa == 0)
+    return result;
+
+  /* We can only represent values with a mantissa of the form 1.xx.  */
+  unsigned HOST_WIDE_INT mask
+      = ((unsigned HOST_WIDE_INT) 1 << (HOST_BITS_PER_WIDE_INT - 4)) - 1;
+  if ((mantissa & mask) != 0)
+    return result;
+  mantissa >>= HOST_BITS_PER_WIDE_INT - 4;
+  /* Now the lowest 3 bits of mantissa should form (1.xx)b.  */
+  gcc_assert (mantissa & (1u << 2));
+  /* Mask out the highest bit.  */
+  mantissa &= ~(1u << 2);
+
+  if (mantissa == 0)
+    {
+      /* We cannot represent any negative values but -1.0.  */
+      if (exponent_p1 != 1 && sign)
+	return result;
+      switch (exponent_p1)
+	{
+	case -15: /* 1.0 * 2^(-16)  */
+	case -14: /* 1.0 * 2^(-15)  */
+	case -7:  /* 1.0 * 2^(- 8)  */
+	case -6:  /* 1.0 * 2^(- 7)  */
+	case 8:   /* 1.0 * 2^(+ 7)  */
+	case 9:   /* 1.0 * 2^(+ 8)  */
+	case 16:  /* 1.0 * 2^(+15)  */
+	case 17:  /* 1.0 * 2^(+16)  */
+	  break;
+	default:
+	  if (exponent_p1 >= -3 && exponent_p1 <= 5)
+	    /* 1.0 * 2^[-4,4]  */
+	    break;
+	  switch (GET_MODE (x))
+	    {
+	    case HFmode: /* IEEE 754 binary16.  */
+	      /* Minimum positive normal == 1.0 * 2^(-14)  */
+	      if (exponent_p1 != -13) return result;
+	      break;
+	    case SFmode: /* IEEE 754 binary32.  */
+	      /* Minimum positive normal == 1.0 * 2^(-126)  */
+	      if (exponent_p1 != -125) return result;
+	      break;
+	    case DFmode: /* IEEE 754 binary64.  */
+	      /* Minimum positive normal == 1.0 * 2^(-1022)  */
+	      if (exponent_p1 != -1021) return result;
+	      break;
+	    default:
+	      gcc_unreachable ();
+	    }
+	  result.type = RISCV_FLOAT_CONST_MIN;
+	  result.valid = true;
+	  return result;
+	}
+    }
+  else
+    {
+      if (sign)
+	return result;
+      if (exponent_p1 < -1 || exponent_p1 > 2)
+	return result;
+      /* The value is (+1.xx)b * 2^[-2,1].
+	 But we cannot represent (+1.11)b * 2^1 (that is 3.5). */
+      if (exponent_p1 == 2 && mantissa == 3)
+	return result;
+    }
+
+  result.valid = true;
+  result.mantissa_below_point = mantissa;
+  result.biased_exponent = exponent_p1 + 15;
+
+  return result;
+}
+
 /* Implement TARGET_CANNOT_FORCE_CONST_MEM.  */
 
 static bool
@@ -1322,8 +1501,12 @@  riscv_const_insns (rtx x)
       }
 
     case CONST_DOUBLE:
-      /* We can use x0 to load floating-point zero.  */
-      return x == CONST0_RTX (GET_MODE (x)) ? 1 : 0;
+      /* We can use x0 to load floating-point zero.
+	 We also have FLI instructions when the Zfa extension is enabled.  */
+      return x == CONST0_RTX (GET_MODE (x))        ? 1
+	     : riscv_get_float_fli_const (x).valid ? 1
+						   : 0;
+
     case CONST_VECTOR:
       {
 	/* TODO: This is not accurate, we will need to
@@ -1362,17 +1545,14 @@  riscv_const_insns (rtx x)
 		   constant incurs a literal-pool access.  Allow this in
 		   order to increase vectorization possibilities.  */
 		int n = riscv_const_insns (elt);
-		if (CONST_DOUBLE_P (elt))
-		    return 1 + 4; /* vfmv.v.f + memory access.  */
+		/* We need as many insns as it takes to load the constant
+		   into a GPR and one vmv.v.x.  */
+		if (n != 0)
+		  return 1 + n;
+		else if (CONST_DOUBLE_P (elt))
+		  return 1 + 4; /* vfmv.v.f + memory access.  */
 		else
-		  {
-		    /* We need as many insns as it takes to load the constant
-		       into a GPR and one vmv.v.x.  */
-		    if (n != 0)
-		      return 1 + n;
-		    else
-		      return 1 + 4; /*vmv.v.x + memory access.  */
-		  }
+		  return 1 + 4; /* vmv.v.x + memory access.  */
 	      }
 	  }
 
@@ -3196,6 +3376,22 @@  riscv_output_move (rtx dest, rtx src)
       gcc_assert (known_eq (rtx_to_poly_int64 (src), BYTES_PER_RISCV_VECTOR));
       return "csrr\t%0,vlenb";
     }
+  if (dest_code == REG && src_code == CONST_DOUBLE)
+    {
+      struct riscv_float_fli_const flt = riscv_get_float_fli_const (src);
+      if (flt.valid)
+	{
+	  switch (width)
+	    {
+	    case 2:
+	      return "fli.h\t%0,%1";
+	    case 4:
+	      return "fli.s\t%0,%1";
+	    case 8:
+	      return "fli.d\t%0,%1";
+	    }
+	}
+    }
   gcc_unreachable ();
 }
 
@@ -5117,6 +5313,36 @@  riscv_print_operand (FILE *file, rtx op, int letter)
 	    output_address (mode, XEXP (op, 0));
 	  break;
 
+	case CONST_DOUBLE:
+	  {
+	    struct riscv_float_fli_const flt = riscv_get_float_fli_const (op);
+	    if (flt.valid)
+	      {
+		switch (flt.type)
+		  {
+		  case RISCV_FLOAT_CONST_MIN:
+		    fputs ("min", file);
+		    break;
+		  case RISCV_FLOAT_CONST_INF:
+		    fputs ("inf", file);
+		    break;
+		  case RISCV_FLOAT_CONST_NAN:
+		    fputs ("nan", file);
+		    break;
+		  default:
+		    /* Use simpler (and bit-perfect) printer.  */
+		    if (flt.sign)
+		      fputc ('-', file);
+		    fprintf (file, "0x1.%cp%+d",
+			     "048c"[flt.mantissa_below_point],
+			     (int) flt.biased_exponent - 16);
+		    break;
+		  }
+		break;
+	      }
+	  }
+	  /* Fall through.  */
+
 	default:
 	  if (letter == 'z' && op == CONST0_RTX (GET_MODE (op)))
 	    fputs (reg_names[GP_REG_FIRST], file);
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index b456fa6abb3c..ce73db33830d 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1744,13 +1744,13 @@ 
 })
 
 (define_insn "*movhf_hardfloat"
-  [(set (match_operand:HF 0 "nonimmediate_operand" "=f,f,f,m,m,*f,*r,  *r,*r,*m")
-	(match_operand:HF 1 "move_operand"         " f,G,m,f,G,*r,*f,*G*r,*m,*r"))]
+  [(set (match_operand:HF 0 "nonimmediate_operand" "=f,f,f,f,m,m, *f,*r,   *r,*r,*m")
+	(match_operand:HF 1 "move_operand"         " f,G,H,m,f,G,*r,*f,*G*r,*m,*r"))]
   "TARGET_ZFHMIN
    && (register_operand (operands[0], HFmode)
        || reg_or_0_operand (operands[1], HFmode))"
   { return riscv_output_move (operands[0], operands[1]); }
-  [(set_attr "move_type" "fmove,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
+  [(set_attr "move_type" "fmove,mtc,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
    (set_attr "mode" "HF")])
 
 (define_insn "*movhf_softfloat"
@@ -2075,13 +2075,13 @@ 
 })
 
 (define_insn "*movsf_hardfloat"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=f,f,f,m,m,*f,*r,  *r,*r,*m")
-	(match_operand:SF 1 "move_operand"         " f,G,m,f,G,*r,*f,*G*r,*m,*r"))]
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=f,f,f,f,m,m, *f,*r,   *r,*r,*m")
+	(match_operand:SF 1 "move_operand"         " f,G,H,m,f,G,*r,*f,*G*r,*m,*r"))]
   "TARGET_HARD_FLOAT
    && (register_operand (operands[0], SFmode)
        || reg_or_0_operand (operands[1], SFmode))"
   { return riscv_output_move (operands[0], operands[1]); }
-  [(set_attr "move_type" "fmove,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
+  [(set_attr "move_type" "fmove,mtc,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
    (set_attr "mode" "SF")])
 
 (define_insn "*movsf_softfloat"
@@ -2109,23 +2109,23 @@ 
 ;; In RV32, we lack fmv.x.d and fmv.d.x.  Go through memory instead.
 ;; (However, we can still use fcvt.d.w to zero a floating-point register.)
 (define_insn "*movdf_hardfloat_rv32"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=f,f,f,m,m,*th_f_fmv,*th_r_fmv,  *r,*r,*m")
-	(match_operand:DF 1 "move_operand"         " f,G,m,f,G,*th_r_fmv,*th_f_fmv,*r*G,*m,*r"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=f,f,f,f,m,m, *th_f_fmv,*th_r_fmv,  *r, *r,*m")
+	(match_operand:DF 1 "move_operand"         " f,G,H,m,f,G,*th_r_fmv,*th_f_fmv,*r*G,*m,*r"))]
   "!TARGET_64BIT && TARGET_DOUBLE_FLOAT
    && (register_operand (operands[0], DFmode)
        || reg_or_0_operand (operands[1], DFmode))"
   { return riscv_output_move (operands[0], operands[1]); }
-  [(set_attr "move_type" "fmove,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
+  [(set_attr "move_type" "fmove,mtc,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
    (set_attr "mode" "DF")])
 
 (define_insn "*movdf_hardfloat_rv64"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=f,f,f,m,m,*f,*r,  *r,*r,*m")
-	(match_operand:DF 1 "move_operand"         " f,G,m,f,G,*r,*f,*r*G,*m,*r"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=f,f,f,f,m,m, *f,*r,  *r, *r,*m")
+	(match_operand:DF 1 "move_operand"         " f,G,H,m,f,G,*r,*f,*r*G,*m,*r"))]
   "TARGET_64BIT && TARGET_DOUBLE_FLOAT
    && (register_operand (operands[0], DFmode)
        || reg_or_0_operand (operands[1], DFmode))"
   { return riscv_output_move (operands[0], operands[1]); }
-  [(set_attr "move_type" "fmove,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
+  [(set_attr "move_type" "fmove,mtc,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
    (set_attr "mode" "DF")])
 
 (define_insn "*movdf_softfloat"
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fli-1.c b/gcc/testsuite/gcc.target/riscv/zfa-fli-1.c
new file mode 100644
index 000000000000..35ea5c477676
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fli-1.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imfd_zfa -mabi=lp64d"  { target { rv64 } } } */
+/* { dg-options "-march=rv32imfd_zfa -mabi=ilp32d" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-Os" "-Og" "-Oz"} } */
+
+#ifndef __riscv_zfa
+#error Feature macro not defined
+#endif
+
+double
+foo_positive_d (double a)
+{
+  /* Use 3 FLI FP constants.  */
+  return (2.5 * a - 1.0) / 0.875;
+}
+
+float
+foo_positive_s (float a)
+{
+  return ((float) 2.5 * a - (float) 1.0) / (float) 0.875;
+}
+
+/* { dg-final { scan-assembler-times "fli\\.s\t" 3 } } */
+/* { dg-final { scan-assembler-times "fli\\.d\t" 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fli-2.c b/gcc/testsuite/gcc.target/riscv/zfa-fli-2.c
new file mode 100644
index 000000000000..10d49d116e46
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fli-2.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imfd_zfa -mabi=lp64d"  { target { rv64 } } } */
+/* { dg-options "-march=rv32imfd_zfa -mabi=ilp32d" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-O2" "-Os" "-Og" "-Oz"} } */
+
+#ifndef __riscv_zfa
+#error Feature macro not defined
+#endif
+
+double
+foo_negative_d (double a)
+{
+  /* Use 3 "non-FLI" FP constants.  */
+  return (3.5 * a - 5.0) / 0.1875;
+}
+
+float
+foo_negative_s (float a)
+{
+  return ((float) 3.5 * a - (float) 5.0) / (float) 0.1875;
+}
+
+/* { dg-final { scan-assembler-not "fli\\.s\t" } } */
+/* { dg-final { scan-assembler-not "fli\\.d\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fli-3.c b/gcc/testsuite/gcc.target/riscv/zfa-fli-3.c
new file mode 100644
index 000000000000..6d069b2a4a9c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fli-3.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imfd_zfa -mabi=lp64d"  { target { rv64 } } } */
+/* { dg-options "-march=rv32imfd_zfa -mabi=ilp32d" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-Os" "-Og" "-Oz"} } */
+
+double
+foo_positive_s (float a)
+{
+  /* Use 3 FLI FP constants (but type conversion occurs in the middle).  */
+  return (2.5f * a - 1.0) / 0.875;
+}
+
+/* { dg-final { scan-assembler-times "fli\\.s\t" 1 } } */
+/* { dg-final { scan-assembler-times "fli\\.d\t" 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fli-4.c b/gcc/testsuite/gcc.target/riscv/zfa-fli-4.c
new file mode 100644
index 000000000000..153853efb196
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fli-4.c
@@ -0,0 +1,111 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imfd_zfa_zfh -mabi=lp64d"  { target { rv64 } } } */
+/* { dg-options "-march=rv32imfd_zfa_zfh -mabi=ilp32d" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Og" "-Oz"} } */
+
+#define TYPE_h _Float16
+#define TYPE_s float
+#define TYPE_d double
+
+#define DECL_TYPE(TYPE_SHORT) TYPE_##TYPE_SHORT
+
+#define DECL_FUNC(TYPE_SHORT, N, VALUE)                                       \
+  DECL_TYPE (TYPE_SHORT) const_##TYPE_SHORT##_##N (void)                      \
+    {                                                                         \
+      return VALUE;                                                           \
+    }
+
+#define DECL_FINITE_FUNCS(TYPE_SHORT)                                         \
+  DECL_FUNC (TYPE_SHORT, 00, -1)                                              \
+  DECL_FUNC (TYPE_SHORT, 02, 0.0000152587890625)                              \
+  DECL_FUNC (TYPE_SHORT, 03, 0.000030517578125)                               \
+  DECL_FUNC (TYPE_SHORT, 04, 0.00390625)                                      \
+  DECL_FUNC (TYPE_SHORT, 05, 0.0078125)                                       \
+  DECL_FUNC (TYPE_SHORT, 06, 0.0625)                                          \
+  DECL_FUNC (TYPE_SHORT, 07, 0.125)                                           \
+  DECL_FUNC (TYPE_SHORT, 08, 0.25)                                            \
+  DECL_FUNC (TYPE_SHORT, 09, 0.3125)                                          \
+  DECL_FUNC (TYPE_SHORT, 10, 0.375)                                           \
+  DECL_FUNC (TYPE_SHORT, 11, 0.4375)                                          \
+  DECL_FUNC (TYPE_SHORT, 12, 0.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 13, 0.625)                                           \
+  DECL_FUNC (TYPE_SHORT, 14, 0.75)                                            \
+  DECL_FUNC (TYPE_SHORT, 15, 0.875)                                           \
+  DECL_FUNC (TYPE_SHORT, 16, 1)                                               \
+  DECL_FUNC (TYPE_SHORT, 17, 1.25)                                            \
+  DECL_FUNC (TYPE_SHORT, 18, 1.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 19, 1.75)                                            \
+  DECL_FUNC (TYPE_SHORT, 20, 2)                                               \
+  DECL_FUNC (TYPE_SHORT, 21, 2.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 22, 3)                                               \
+  DECL_FUNC (TYPE_SHORT, 23, 4)                                               \
+  DECL_FUNC (TYPE_SHORT, 24, 8)                                               \
+  DECL_FUNC (TYPE_SHORT, 25, 16)                                              \
+  DECL_FUNC (TYPE_SHORT, 26, 128)                                             \
+  DECL_FUNC (TYPE_SHORT, 27, 256)                                             \
+  DECL_FUNC (TYPE_SHORT, 28, 32768)                                           \
+  DECL_FUNC (TYPE_SHORT, 29, 65536)
+
+/* Finite numbers (except 2^16 in _Float16, making an inf).  */
+DECL_FINITE_FUNCS (h)
+DECL_FINITE_FUNCS (s)
+DECL_FINITE_FUNCS (d)
+
+/* min.  */
+DECL_FUNC (h, 01, __FLT16_MIN__)
+DECL_FUNC (s, 01, __FLT_MIN__)
+DECL_FUNC (d, 01, __DBL_MIN__)
+
+/* inf.  */
+DECL_FUNC (h, 30, __builtin_inff16 ())
+DECL_FUNC (s, 30, __builtin_inff ())
+DECL_FUNC (d, 30, __builtin_inf ())
+
+/* nan.  */
+DECL_FUNC (h, 31, __builtin_nanf16 (""))
+DECL_FUNC (s, 31, __builtin_nanf (""))
+DECL_FUNC (d, 31, __builtin_nan (""))
+
+
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,-0x1\\.0p\\+0\n" 3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p-16\n"   3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p-15\n"   3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p-8\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p-7\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p-4\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p-3\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p-2\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.4p-2\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.8p-2\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.cp-2\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p-1\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.4p-1\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.8p-1\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.cp-1\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p\\+0\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.4p\\+0\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.8p\\+0\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.cp\\+0\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p\\+1\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.4p\\+1\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.8p\\+1\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p\\+2\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p\\+3\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p\\+4\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p\\+7\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p\\+8\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p\\+15\n" 3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[sd]\tfa0,0x1\\.0p\\+16\n"  2 } } */
+
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,min\n" 3 } } */
+
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,inf\n" 2 } } */
+/* { dg-final { scan-assembler-times "fli\\.s\tfa0,inf\n" 1 } } */
+/* { dg-final { scan-assembler-times "fli\\.d\tfa0,inf\n" 1 } } */
+
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,nan\n" 3 } } */
+
+
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0," 32 } } */
+/* { dg-final { scan-assembler-times "fli\\.s\tfa0," 32 } } */
+/* { dg-final { scan-assembler-times "fli\\.d\tfa0," 32 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fli-5.c b/gcc/testsuite/gcc.target/riscv/zfa-fli-5.c
new file mode 100644
index 000000000000..186f91ffb349
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fli-5.c
@@ -0,0 +1,98 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imf_zfa_zvfh -mabi=lp64f"  { target { rv64 } } } */
+/* { dg-options "-march=rv32imf_zfa_zvfh -mabi=ilp32f" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Og" "-Oz"} } */
+
+/* Even if 'Zfh' is disabled, "fli.h" is usable when
+   both 'Zfa' and 'Zvfh' are available.  */
+#ifdef __riscv_zfh
+#error Invalid feature macro defined
+#endif
+
+#define TYPE_h _Float16
+
+#define DECL_TYPE(TYPE_SHORT) TYPE_##TYPE_SHORT
+
+#define DECL_FUNC(TYPE_SHORT, N, VALUE)                                       \
+  DECL_TYPE (TYPE_SHORT) const_##TYPE_SHORT##_##N (void)                      \
+    {                                                                         \
+      return VALUE;                                                           \
+    }
+
+#define DECL_FINITE_FUNCS(TYPE_SHORT)                                         \
+  DECL_FUNC (TYPE_SHORT, 00, -1)                                              \
+  DECL_FUNC (TYPE_SHORT, 02, 0.0000152587890625)                              \
+  DECL_FUNC (TYPE_SHORT, 03, 0.000030517578125)                               \
+  DECL_FUNC (TYPE_SHORT, 04, 0.00390625)                                      \
+  DECL_FUNC (TYPE_SHORT, 05, 0.0078125)                                       \
+  DECL_FUNC (TYPE_SHORT, 06, 0.0625)                                          \
+  DECL_FUNC (TYPE_SHORT, 07, 0.125)                                           \
+  DECL_FUNC (TYPE_SHORT, 08, 0.25)                                            \
+  DECL_FUNC (TYPE_SHORT, 09, 0.3125)                                          \
+  DECL_FUNC (TYPE_SHORT, 10, 0.375)                                           \
+  DECL_FUNC (TYPE_SHORT, 11, 0.4375)                                          \
+  DECL_FUNC (TYPE_SHORT, 12, 0.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 13, 0.625)                                           \
+  DECL_FUNC (TYPE_SHORT, 14, 0.75)                                            \
+  DECL_FUNC (TYPE_SHORT, 15, 0.875)                                           \
+  DECL_FUNC (TYPE_SHORT, 16, 1)                                               \
+  DECL_FUNC (TYPE_SHORT, 17, 1.25)                                            \
+  DECL_FUNC (TYPE_SHORT, 18, 1.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 19, 1.75)                                            \
+  DECL_FUNC (TYPE_SHORT, 20, 2)                                               \
+  DECL_FUNC (TYPE_SHORT, 21, 2.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 22, 3)                                               \
+  DECL_FUNC (TYPE_SHORT, 23, 4)                                               \
+  DECL_FUNC (TYPE_SHORT, 24, 8)                                               \
+  DECL_FUNC (TYPE_SHORT, 25, 16)                                              \
+  DECL_FUNC (TYPE_SHORT, 26, 128)                                             \
+  DECL_FUNC (TYPE_SHORT, 27, 256)                                             \
+  DECL_FUNC (TYPE_SHORT, 28, 32768)                                           \
+  DECL_FUNC (TYPE_SHORT, 29, 65536)
+
+/* Finite numbers (except 2^16, which overflows to inf in _Float16).  */
+DECL_FINITE_FUNCS (h)
+
+/* min.  */
+DECL_FUNC (h, 01, __FLT16_MIN__)
+
+/* inf.  */
+DECL_FUNC (h, 30, __builtin_inff16 ())
+
+
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,-0x1\\.0p\\+0\n" 1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p-16\n"   1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p-15\n"   1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p-8\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p-7\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p-4\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p-3\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p-2\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.4p-2\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.8p-2\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.cp-2\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p-1\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.4p-1\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.8p-1\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.cp-1\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p\\+0\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.4p\\+0\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.8p\\+0\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.cp\\+0\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p\\+1\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.4p\\+1\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.8p\\+1\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p\\+2\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p\\+3\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p\\+4\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p\\+7\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p\\+8\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p\\+15\n" 1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,inf\n"           2 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,min\n"           1 } } */
+
+
+/* nan.  */
+DECL_FUNC (h, 31, __builtin_nanf16 (""))
+
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,nan\n"           1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fli-6.c b/gcc/testsuite/gcc.target/riscv/zfa-fli-6.c
new file mode 100644
index 000000000000..2ee830d5c14c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fli-6.c
@@ -0,0 +1,61 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imf_zfa_zfhmin -mabi=lp64f"  { target { rv64 } } } */
+/* { dg-options "-march=rv32imf_zfa_zfhmin -mabi=ilp32f" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Og" "-Oz"} } */
+
+/* "fli.h" is unavailable even if both 'Zfa' and 'Zfhmin' are enabled.  */
+
+#define TYPE_h _Float16
+
+#define DECL_TYPE(TYPE_SHORT) TYPE_##TYPE_SHORT
+
+#define DECL_FUNC(TYPE_SHORT, N, VALUE)                                       \
+  DECL_TYPE (TYPE_SHORT) const_##TYPE_SHORT##_##N (void)                      \
+    {                                                                         \
+      return VALUE;                                                           \
+    }
+
+#define DECL_FINITE_FUNCS(TYPE_SHORT)                                         \
+  DECL_FUNC (TYPE_SHORT, 00, -1)                                              \
+  DECL_FUNC (TYPE_SHORT, 02, 0.0000152587890625)                              \
+  DECL_FUNC (TYPE_SHORT, 03, 0.000030517578125)                               \
+  DECL_FUNC (TYPE_SHORT, 04, 0.00390625)                                      \
+  DECL_FUNC (TYPE_SHORT, 05, 0.0078125)                                       \
+  DECL_FUNC (TYPE_SHORT, 06, 0.0625)                                          \
+  DECL_FUNC (TYPE_SHORT, 07, 0.125)                                           \
+  DECL_FUNC (TYPE_SHORT, 08, 0.25)                                            \
+  DECL_FUNC (TYPE_SHORT, 09, 0.3125)                                          \
+  DECL_FUNC (TYPE_SHORT, 10, 0.375)                                           \
+  DECL_FUNC (TYPE_SHORT, 11, 0.4375)                                          \
+  DECL_FUNC (TYPE_SHORT, 12, 0.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 13, 0.625)                                           \
+  DECL_FUNC (TYPE_SHORT, 14, 0.75)                                            \
+  DECL_FUNC (TYPE_SHORT, 15, 0.875)                                           \
+  DECL_FUNC (TYPE_SHORT, 16, 1)                                               \
+  DECL_FUNC (TYPE_SHORT, 17, 1.25)                                            \
+  DECL_FUNC (TYPE_SHORT, 18, 1.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 19, 1.75)                                            \
+  DECL_FUNC (TYPE_SHORT, 20, 2)                                               \
+  DECL_FUNC (TYPE_SHORT, 21, 2.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 22, 3)                                               \
+  DECL_FUNC (TYPE_SHORT, 23, 4)                                               \
+  DECL_FUNC (TYPE_SHORT, 24, 8)                                               \
+  DECL_FUNC (TYPE_SHORT, 25, 16)                                              \
+  DECL_FUNC (TYPE_SHORT, 26, 128)                                             \
+  DECL_FUNC (TYPE_SHORT, 27, 256)                                             \
+  DECL_FUNC (TYPE_SHORT, 28, 32768)                                           \
+  DECL_FUNC (TYPE_SHORT, 29, 65536)
+
+/* Finite numbers (except 2^16, which overflows to inf in _Float16).  */
+DECL_FINITE_FUNCS (h)
+
+/* min.  */
+DECL_FUNC (h, 01, __FLT16_MIN__)
+
+/* inf.  */
+DECL_FUNC (h, 30, __builtin_inff16 ())
+
+/* nan.  */
+DECL_FUNC (h, 31, __builtin_nanf16 (""))
+
+/* { dg-final { scan-assembler-not "fli\\.h\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fli-7.c b/gcc/testsuite/gcc.target/riscv/zfa-fli-7.c
new file mode 100644
index 000000000000..4da8a2985852
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fli-7.c
@@ -0,0 +1,30 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imfd_zfa_zfh -mabi=lp64d"  { target { rv64 } } } */
+/* { dg-options "-march=rv32imfd_zfa_zfh -mabi=ilp32d" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Og" "-Oz"} } */
+
+/* The canonical NaN is a positive quiet NaN with zero payload.  */
+
+#define TYPE_h _Float16
+#define TYPE_s float
+#define TYPE_d double
+
+#define DECL_TYPE(TYPE_SHORT) TYPE_##TYPE_SHORT
+
+#define DECL_FUNC(TYPE_SHORT, N, VALUE)                                       \
+  DECL_TYPE (TYPE_SHORT) const_##TYPE_SHORT##_##N (void)                      \
+    {                                                                         \
+      return VALUE;                                                           \
+    }
+
+/* Canonical NaN.  */
+DECL_FUNC (h, 1, __builtin_nanf16 (""))
+DECL_FUNC (s, 1, __builtin_nanf (""))
+DECL_FUNC (d, 1, __builtin_nan (""))
+DECL_FUNC (h, 2, __builtin_nanf16 ("0"))
+DECL_FUNC (s, 2, __builtin_nanf ("0"))
+DECL_FUNC (d, 2, __builtin_nan ("0"))
+
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,nan\n" 2 } } */
+/* { dg-final { scan-assembler-times "fli\\.s\tfa0,nan\n" 2 } } */
+/* { dg-final { scan-assembler-times "fli\\.d\tfa0,nan\n" 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fli-8.c b/gcc/testsuite/gcc.target/riscv/zfa-fli-8.c
new file mode 100644
index 000000000000..a09726c0cb59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fli-8.c
@@ -0,0 +1,39 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imfd_zfa_zfh -mabi=lp64d"  { target { rv64 } } } */
+/* { dg-options "-march=rv32imfd_zfa_zfh -mabi=ilp32d" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Og" "-Oz"} } */
+
+/* The canonical NaN is a positive quiet NaN with zero payload.  */
+
+#define TYPE_h _Float16
+#define TYPE_s float
+#define TYPE_d double
+
+#define DECL_TYPE(TYPE_SHORT) TYPE_##TYPE_SHORT
+
+#define DECL_FUNC(TYPE_SHORT, N, VALUE)                                       \
+  DECL_TYPE (TYPE_SHORT) const_##TYPE_SHORT##_##N (void)                      \
+    {                                                                         \
+      return VALUE;                                                           \
+    }
+
+/* Non-canonical NaN.  */
+DECL_FUNC (h, 1, __builtin_nansf16 (""))
+DECL_FUNC (s, 1, __builtin_nansf (""))
+DECL_FUNC (d, 1, __builtin_nans (""))
+DECL_FUNC (h, 2, __builtin_nansf16 ("0"))
+DECL_FUNC (s, 2, __builtin_nansf ("0"))
+DECL_FUNC (d, 2, __builtin_nans ("0"))
+DECL_FUNC (h, 3, __builtin_nanf16 ("1"))
+DECL_FUNC (s, 3, __builtin_nanf ("1"))
+DECL_FUNC (d, 3, __builtin_nan ("1"))
+DECL_FUNC (h, 4, __builtin_nansf16 ("1"))
+DECL_FUNC (s, 4, __builtin_nansf ("1"))
+DECL_FUNC (d, 4, __builtin_nans ("1"))
+
+/* Canonical NaN, negated (making it non-canonical).  */
+DECL_FUNC (h, 5, -__builtin_nanf16 (""))
+DECL_FUNC (s, 5, -__builtin_nanf (""))
+DECL_FUNC (d, 5, -__builtin_nan (""))
+
+/* { dg-final { scan-assembler-not "fli\\.\[hsd]\tfa0,nan\n" } } */
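
Note on the NaN tests (zfa-fli-7.c and zfa-fli-8.c): the comments there define the canonical NaN as a positive quiet NaN with zero payload. As an illustration only, and not part of this patch, that definition can be written as a check on the raw IEEE 754 encoding. The helper names below are hypothetical; the bit patterns follow from the binary16/binary32/binary64 layouts (sign clear, exponent all ones, only the quiet bit set in the fraction).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from the patch.  Each one takes the
   raw encoding of a value and returns true only for the canonical NaN:
   sign bit clear, exponent all ones, and the most significant fraction
   bit (the quiet bit) as the only fraction bit set.  */

static inline bool
is_canonical_nan_f16 (uint16_t bits)
{
  /* 0 | 11111 | 1000000000  */
  return bits == UINT16_C (0x7e00);
}

static inline bool
is_canonical_nan_f32 (uint32_t bits)
{
  /* 0 | 11111111 | 1000...0 (23 fraction bits)  */
  return bits == UINT32_C (0x7fc00000);
}

static inline bool
is_canonical_nan_f64 (uint64_t bits)
{
  /* 0 | 11111111111 | 1000...0 (52 fraction bits)  */
  return bits == UINT64_C (0x7ff8000000000000);
}
```

Under this check, a NaN with the sign bit set (0xffc00000 in binary32), a quiet NaN with a nonzero payload (0x7fc00001), and a signaling NaN (quiet bit clear) are all rejected, which corresponds to the cases zfa-fli-8.c expects not to become "fli.* fa0,nan".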