[0/2] RISC-V: Make "prefetch.i" built-in usable

Message ID cover.1691991126.git.research_trasio@irq.a4lg.com

Message

Tsukasa OI Aug. 14, 2023, 5:32 a.m. UTC
Hello,

and... I think this might be my first *large* patch set for GCC
contribution and definitely the first one to touch the machine description.

So, please review it carefully.


Background
===========

This patch set optimizes FP constant initialization using the FLI
instructions, which are part of the 'Zfa' extension (additional
floating-point instructions).

FLI instructions ("fli.h" for binary16, "fli.s" for binary32, "fli.d" for
binary64 and "fli.q" for binary128 [which can be ignored because current
GCC for RISC-V does not natively support binary128]) provide a
load-immediate operation for the following 32 immediates.

| Binary Encoding | Immediate (and its part of binary representation) |
| --------------- | --------------------------------------------------|
|    `00000` ( 0) | -1.0          (-0b1.00 * 2^(+ 0))                 |
|    `00001` ( 1) | Minimum positive normal value                     |
|                 | sign=[0] exponent=[0..01] significand=[000..000]  |
|    `00010` ( 2) | 1.00*2^(-16)  (+0b1.00 * 2^(-16))                 |
|    `00011` ( 3) | 1.00*2^(-15)  (+0b1.00 * 2^(-15))                 |
|    `00100` ( 4) | 1.00*2^(- 8)  (+0b1.00 * 2^(- 8))                 |
|    `00101` ( 5) | 1.00*2^(- 7)  (+0b1.00 * 2^(- 7))                 |
|    `00110` ( 6) | 1.00*2^(- 4)  (+0b1.00 * 2^(- 4)) = 0.0625        |
|    `00111` ( 7) | 1.00*2^(- 3)  (+0b1.00 * 2^(- 3)) = 0.125         |
|    `01000` ( 8) | 1.00*2^(- 2)  (+0b1.00 * 2^(- 2)) = 0.25          |
|    `01001` ( 9) | 1.25*2^(- 2)  (+0b1.01 * 2^(- 2)) = 0.3125        |
|    `01010` (10) | 1.50*2^(- 2)  (+0b1.10 * 2^(- 2)) = 0.375         |
|    `01011` (11) | 1.75*2^(- 2)  (+0b1.11 * 2^(- 2)) = 0.4375        |
|    `01100` (12) | 1.00*2^(- 1)  (+0b1.00 * 2^(- 1)) = 0.5           |
|    `01101` (13) | 1.25*2^(- 1)  (+0b1.01 * 2^(- 1)) = 0.625         |
|    `01110` (14) | 1.50*2^(- 1)  (+0b1.10 * 2^(- 1)) = 0.75          |
|    `01111` (15) | 1.75*2^(- 1)  (+0b1.11 * 2^(- 1)) = 0.875         |
|    `10000` (16) | 1.00*2^(+ 0)  (+0b1.00 * 2^(+ 0)) = 1.0           |
|    `10001` (17) | 1.25*2^(+ 0)  (+0b1.01 * 2^(+ 0)) = 1.25          |
|    `10010` (18) | 1.50*2^(+ 0)  (+0b1.10 * 2^(+ 0)) = 1.5           |
|    `10011` (19) | 1.75*2^(+ 0)  (+0b1.11 * 2^(+ 0)) = 1.75          |
|    `10100` (20) | 1.00*2^(+ 1)  (+0b1.00 * 2^(+ 1)) = 2.0           |
|    `10101` (21) | 1.25*2^(+ 1)  (+0b1.01 * 2^(+ 1)) = 2.5           |
|    `10110` (22) | 1.50*2^(+ 1)  (+0b1.10 * 2^(+ 1)) = 3.0           |
|    `10111` (23) | 1.00*2^(+ 2)  (+0b1.00 * 2^(+ 2)) = 4             |
|    `11000` (24) | 1.00*2^(+ 3)  (+0b1.00 * 2^(+ 3)) = 8             |
|    `11001` (25) | 1.00*2^(+ 4)  (+0b1.00 * 2^(+ 4)) = 16            |
|    `11010` (26) | 1.00*2^(+ 7)  (+0b1.00 * 2^(+ 7)) = 128           |
|    `11011` (27) | 1.00*2^(+ 8)  (+0b1.00 * 2^(+ 8)) = 256           |
|    `11100` (28) | 1.00*2^(+15)  (+0b1.00 * 2^(+15)) = 32768         |
|    `11101` (29) | 1.00*2^(+16)  (+0b1.00 * 2^(+16)) = 65536         |
|                 | On "fli.h", this is equivalent to positive inf.   |
|    `11110` (30) | Positive infinity                                 |
|                 | sign=[0] exponent=[1..11] significand=[000..000]  |
|    `11111` (31) | Canonical NaN (positive, quiet and zero payload)  |
|                 | sign=[0] exponent=[1..11] significand=[100..000]  |
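
The table can also be rebuilt programmatically, which makes its irregular
shape easier to see.  The following is only a minimal Python sketch for
illustration (it is not part of the patch set); the names fli_table,
LOW_EXPONENTS and HIGH_EXPONENTS are hypothetical, and "inf"/"nan" are kept
as plain strings.

```python
from fractions import Fraction

# Exponents r of the isolated 1.00 * 2^r entries (encodings 2-7 and 23-29).
LOW_EXPONENTS = [-16, -15, -8, -7, -4, -3]
HIGH_EXPONENTS = [2, 3, 4, 7, 8, 15, 16]

# Exponent of the minimum positive normal value (encoding 1) per format.
MIN_NORMAL_EXPONENT = {"h": -14, "s": -126, "d": -1022}

# Largest finite binary16 value.
BINARY16_MAX = Fraction(65504)


def fli_table(fmt):
    """Return the 32 FLI immediates for fmt ("h", "s" or "d") by encoding.

    Finite values are exact Fractions; "inf" and "nan" are strings.
    """
    table = [Fraction(-1), Fraction(2) ** MIN_NORMAL_EXPONENT[fmt]]
    table += [Fraction(2) ** r for r in LOW_EXPONENTS]
    # Encodings 8-22: 0b1.xx * 2^r for -2 <= r <= 1, minus 0b1.11 * 2^1.
    for r in range(-2, 2):
        for xx in range(4):
            if (r, xx) != (1, 3):
                table.append(Fraction(4 + xx, 4) * Fraction(2) ** r)
    table += [Fraction(2) ** r for r in HIGH_EXPONENTS]
    table += ["inf", "nan"]
    if fmt == "h":
        # 65536 is out of binary16 range, so the table lists it as +inf.
        table = ["inf" if isinstance(v, Fraction) and v > BINARY16_MAX else v
                 for v in table]
    return table
```

For example, fli_table("d")[22] is 3 and the value 3.5 never appears, while
fli_table("h")[29] is "inf" as noted in the table.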

Currently, initializing an FP constant (other than zero) requires a load
from memory; FLI instructions can reduce such memory accesses.

There may be room to generate more complex constants with multiple
instructions built around FLI (as is done for long integer constants), but
as a start, this patch set only optimizes the case where one FP constant
can be initialized with one FLI instruction.  Because FP arithmetic often
has higher latency, multi-instruction FLI sequences would bring less
benefit than their integer counterparts.


FLI FP constant checking
=========================

An instruction with a similar role to RISC-V's FLI instructions is the
vmov.f32 instruction on Arm (fmov with an immediate operand on AArch64).
It provides a load-immediate operation for constants that can be
represented in the following form:

> (-1)^s * 0b1.xxxx * 2^r   (where -3 <= r <= +4; fits in 3-bits)
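
As a point of comparison, the regular form above can be tested with a few
lines of arithmetic.  This is a minimal Python sketch, not AArch64 back-end
code; the function name arm_fp_imm_representable is hypothetical and the
input is assumed to be a finite number.

```python
from fractions import Fraction


def arm_fp_imm_representable(value):
    """Check for the form (-1)^s * 0b1.xxxx * 2^r with -3 <= r <= 4."""
    magnitude = abs(Fraction(value))
    for r in range(-3, 5):
        # 0b1.xxxx equals (16 + n) / 16 for a 4-bit fraction n, so the
        # magnitude scaled by 16 / 2^r must be an integer in [16, 31].
        scaled = magnitude / Fraction(2) ** r * 16
        if scaled.denominator == 1 and 16 <= scaled.numerator <= 31:
            return True
    return False
```

Every sign, every 4-bit fraction and every exponent in the range is valid
here, so a single uniform test is enough.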

This patch is largely influenced by AArch64's handling, but handling
RISC-V's FLI FP constants is a little trickier than that:

*   FLI normally generates only values with sign bit 0; the sole exception
    is binary encoding 0 (which loads -1.0, with sign bit 1).
*   Besides finite values, FLI can also generate positive infinity and the
    canonical NaN.
*   Because FLI can generate the canonical NaN, handling NaN is preferred,
    but FLI generates only the canonical NaN.  Since we can easily create a
    non-canonical NaN with __builtin_nan ("[PAYLOAD]") and that could be a
    direct return value of a function, we must reject non-canonical NaNs
    (otherwise it'll generate "fli.d fa0,nan" where NaN is non-canonical).
*   The exponent range and mantissa constraints are a bit tricky.
    Binary encodings 8-22 look like 0b1.xx * 2^r (where -2 <= r <= 1),
    but we have to explicitly reject 0b1.11 * 2^1 (that is, 3.5) because
    the value 3.5 is not in the list.
    The other 1.00 * 2^r values have discontinuous r.
*   Binary encoding 1 (the minimum positive normal value of the
    corresponding type) depends on the type (or mode) we are in.
*   The assembler accepts three string operands: "min", "inf" and "nan".
Handling all of these the way aarch64_float_const_representable_p does
could be inefficient.  So, I implemented the riscv_get_float_fli_const
function, which returns structured information about an FLI constant
(including whether the constant is valid as an FLI constant).

This information contains:

1.  Validity
2.  Sign bit (only set for -1.0)
3.  FLI constant type ("min", "inf", "nan" or a finite number other than
    "min")
4.  Highest two bits of mantissa under the point (xx for 0b1.xx)
    on a finite value except "min".
5.  Biased exponent (yet sparse representation to make handling easier)
    on a finite value except "min".  For 0b1.xx * 2^r, (r+16) is stored.
    Valid range of this is [0, 32] (inclusive) so it requires 6 bits.

On many ABIs, this information is packed into an integer-sized bit-field.
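
To show how little space the five items need, here is a minimal Python
sketch of one possible packing.  The bit positions, the KIND_* values and
the function names are assumptions made for this illustration; they are not
the layout used in riscv-protos.h.

```python
# Hypothetical layout: bit 0 valid, bit 1 sign, bits 2-3 type,
# bits 4-5 mantissa (xx), bits 6-11 biased exponent (r + 16).
KIND_FINITE, KIND_MIN, KIND_INF, KIND_NAN = range(4)


def pack_fli_info(sign, kind, mantissa=0, exponent=0):
    """Pack a valid FLI constant description into one small integer.

    An invalid constant is represented by 0 (valid bit clear).
    """
    biased = exponent + 16
    if not (0 <= biased <= 32 and 0 <= mantissa <= 3):
        raise ValueError("not an FLI constant")
    return 1 | (sign << 1) | (kind << 2) | (mantissa << 4) | (biased << 6)


def unpack_fli_info(word):
    """Return (valid, sign, kind, mantissa, exponent) from a packed word."""
    return (bool(word & 1),
            (word >> 1) & 1,
            (word >> 2) & 3,
            (word >> 4) & 3,
            ((word >> 6) & 0x3F) - 16)
```

Under this assumed layout, 0.4375 (0b1.11 * 2^-2) is packed with
pack_fli_info(0, KIND_FINITE, mantissa=3, exponent=-2), and even the largest
exponent (r = 16, stored as 32) keeps the whole word within 12 bits.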


New Constraint: "H"
====================

According to the GCC Internals documentation, "G" and "H" are the
constraint letters defined in a machine-dependent fashion to permit
immediate floating operands in particular ranges of values.  Because "G"
is already used to represent +0.0, this patch set uses "H" for FLI-capable
FP constants.

It adds one new variant to each of the following move patterns:

*   movhf_hardfloat
*   movsf_hardfloat
*   movdf_hardfloat_rv32
*   movdf_hardfloat_rv64

Note that the 'Zfa' extension requires the 'F' extension (that is, hard
float).



Portions I'm not sure are okay
===============================

*   NaN handling (comparison with the canonical NaN)
    Due to constraints, I had to compare a NaN's binary representation
    against the known canonical NaN patterns for IEEE 754 binary16/32/64.
    Is there any better way to do this?
*   Any ICE possibility?
    For simple programs, I confirmed that no ICE occurs, but I'm not sure
    whether this holds for other programs.  If I missed some cases in the
    riscv_output_move or riscv_print_operand functions (corresponding to
    the mov instructions in riscv.md), it can easily cause an ICE.
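
For reference on the NaN question above, the canonical NaN patterns follow
directly from each format's field widths (sign 0, exponent all ones, only
the top significand bit set).  A minimal Python sketch, with hypothetical
names, that derives and compares them:

```python
# (exponent bits, fraction bits) for binary16, binary32 and binary64.
FORMATS = {"h": (5, 10), "s": (8, 23), "d": (11, 52)}


def canonical_nan_bits(fmt):
    """Return the canonical NaN bit pattern for fmt ("h", "s" or "d")."""
    exp_bits, frac_bits = FORMATS[fmt]
    exponent_all_ones = ((1 << exp_bits) - 1) << frac_bits
    quiet_bit = 1 << (frac_bits - 1)
    return exponent_all_ones | quiet_bit


def is_canonical_nan(bits, fmt):
    """Check a raw bit pattern against the canonical NaN of fmt."""
    return bits == canonical_nan_bits(fmt)
```

This yields 0x7E00, 0x7FC00000 and 0x7FF8000000000000 for binary16, binary32
and binary64 respectively; any other NaN pattern compares unequal.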


Sincerely,
Tsukasa




Tsukasa OI (2):
  RISC-V: Add support for the 'Zfa' extension
  RISC-V: Constant FP Optimization with 'Zfa'

 gcc/common/config/riscv/riscv-common.cc    |   3 +
 gcc/config/riscv/constraints.md            |   7 +
 gcc/config/riscv/riscv-opts.h              |   2 +
 gcc/config/riscv/riscv-protos.h            |  34 +++
 gcc/config/riscv/riscv.cc                  | 250 ++++++++++++++++++++-
 gcc/config/riscv/riscv.md                  |  24 +-
 gcc/testsuite/gcc.target/riscv/zfa-fli-1.c |  24 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-2.c |  24 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-3.c |  14 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-4.c | 111 +++++++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-5.c |  98 ++++++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-6.c |  61 +++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-7.c |  30 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli-8.c |  39 ++++
 14 files changed, 697 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-8.c


base-commit: 614052dd4ea083e086712809c754ffebd9361316

Comments

Tsukasa OI Aug. 14, 2023, 6:19 a.m. UTC | #1
Oh my, I forgot to change the subject of PATCH 0/2.
That should have been "RISC-V: Constant FP Optimization with 'Zfa'", the
same subject as PATCH 2/2.

Sorry for the confusion!
