From patchwork Mon Aug 14 05:32:08 2023
X-Patchwork-Submitter: Tsukasa OI
X-Patchwork-Id: 1820843
From: Tsukasa OI
To: Tsukasa OI, Kito Cheng, Palmer Dabbelt, Andrew Waterman, Jim Wilson
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH 0/2] RISC-V: Make "prefetch.i" built-in usable
Date: Mon, 14 Aug 2023 05:32:08 +0000

Hello, and... I think this might be my first *large* patch set for GCC
contribution and definitely the first one to touch the machine description.
So, please review it carefully.


Background
===========

This patch set adds an optimization to FP constant initialization using a
FLI instruction, which is part of the 'Zfa' extension (additional
floating-point instructions).

FLI instructions ("fli.h" for binary16, "fli.s" for binary32, "fli.d" for
binary64 and "fli.q" for binary128 [which can be ignored because current
GCC for RISC-V does not natively support binary128]) provide a
load-immediate operation for the following 32 immediates.

| Binary Encoding | Immediate (and its part of binary representation)  |
| --------------- | -------------------------------------------------- |
| `00000` ( 0)    | -1.0 (-0b1.00 * 2^(+ 0))                           |
| `00001` ( 1)    | Minimum positive normal value                      |
|                 | sign=[0] exponent=[0..01] significand=[000..000]   |
| `00010` ( 2)    | 1.00*2^(-16) (+0b1.00 * 2^(-16))                   |
| `00011` ( 3)    | 1.00*2^(-15) (+0b1.00 * 2^(-15))                   |
| `00100` ( 4)    | 1.00*2^(- 8) (+0b1.00 * 2^(- 8))                   |
| `00101` ( 5)    | 1.00*2^(- 7) (+0b1.00 * 2^(- 7))                   |
| `00110` ( 6)    | 1.00*2^(- 4) (+0b1.00 * 2^(- 4)) = 0.0625          |
| `00111` ( 7)    | 1.00*2^(- 3) (+0b1.00 * 2^(- 3)) = 0.125           |
| `01000` ( 8)    | 1.00*2^(- 2) (+0b1.00 * 2^(- 2)) = 0.25            |
| `01001` ( 9)    | 1.25*2^(- 2) (+0b1.01 * 2^(- 2)) = 0.3125          |
| `01010` (10)    | 1.50*2^(- 2) (+0b1.10 * 2^(- 2)) = 0.375           |
| `01011` (11)    | 1.75*2^(- 2) (+0b1.11 * 2^(- 2)) = 0.4375          |
| `01100` (12)    | 1.00*2^(- 1) (+0b1.00 * 2^(- 1)) = 0.5             |
| `01101` (13)    | 1.25*2^(- 1) (+0b1.01 * 2^(- 1)) = 0.625           |
| `01110` (14)    | 1.50*2^(- 1) (+0b1.10 * 2^(- 1)) = 0.75            |
| `01111` (15)    | 1.75*2^(- 1) (+0b1.11 * 2^(- 1)) = 0.875           |
| `10000` (16)    | 1.00*2^(+ 0) (+0b1.00 * 2^(+ 0)) = 1.0             |
| `10001` (17)    | 1.25*2^(+ 0) (+0b1.01 * 2^(+ 0)) = 1.25            |
| `10010` (18)    | 1.50*2^(+ 0) (+0b1.10 * 2^(+ 0)) = 1.5             |
| `10011` (19)    | 1.75*2^(+ 0) (+0b1.11 * 2^(+ 0)) = 1.75            |
| `10100` (20)    | 1.00*2^(+ 1) (+0b1.00 * 2^(+ 1)) = 2.0             |
| `10101` (21)    | 1.25*2^(+ 1) (+0b1.01 * 2^(+ 1)) = 2.5             |
| `10110` (22)    | 1.50*2^(+ 1) (+0b1.10 * 2^(+ 1)) = 3.0             |
| `10111` (23)    | 1.00*2^(+ 2) (+0b1.00 * 2^(+ 2)) = 4               |
| `11000` (24)    | 1.00*2^(+ 3) (+0b1.00 * 2^(+ 3)) = 8               |
| `11001` (25)    | 1.00*2^(+ 4) (+0b1.00 * 2^(+ 4)) = 16              |
| `11010` (26)    | 1.00*2^(+ 7) (+0b1.00 * 2^(+ 7)) = 128             |
| `11011` (27)    | 1.00*2^(+ 8) (+0b1.00 * 2^(+ 8)) = 256             |
| `11100` (28)    | 1.00*2^(+15) (+0b1.00 * 2^(+15)) = 32768           |
| `11101` (29)    | 1.00*2^(+16) (+0b1.00 * 2^(+16)) = 65536           |
|                 | On "fli.h", this is equivalent to positive inf.    |
| `11110` (30)    | Positive infinity                                  |
|                 | sign=[0] exponent=[1..11] significand=[000..000]   |
| `11111` (31)    | Canonical NaN (positive, quiet and zero payload)   |
|                 | sign=[0] exponent=[1..11] significand=[100..000]   |

Currently, initializing a FP constant (except zero) requires a load from
memory, and FLI instructions can reduce such memory accesses.

There may be room to generate more complex constants with multiple FLI
instructions (as is done for long integer constants), but as a start, we
can optimize one FP constant initialization with one FLI instruction (and
because FP arithmetic often has higher latency, the benefit of a multi-FLI
sequence is smaller than for integers).


FLI FP constant checking
=========================

An instruction with a similar role to RISC-V's FLI instructions is the
Arm/AArch64 vmov.f32 instruction.  It provides a load-immediate operation
for constants that can be represented in the following form:

> (-1)^s * 0b1.xxxx * 2^r (where -3 <= r <= +4; fits in 3-bits)

This patch is largely influenced by AArch64's handling, but compared to
that, handling RISC-V's FLI FP constants can be a little tricky.

*   FLI normally generates only values with sign bit 0, except for binary
    encoding 0 (which loads -1.0 with sign bit 1).
*   Besides finite values, FLI can generate positive infinity and
    canonical NaN.
*   Because FLI can generate canonical NaN, handling NaN is preferred, but
    FLI only generates canonical NaN.  Since we can easily create a
    non-canonical NaN with __builtin_nan ("[PAYLOAD]") and that could be a
    direct return value of a function, we must reject non-canonical NaNs
    (otherwise it'll generate "fli.d fa0,nan" where NaN is non-canonical).
*   The exponent range and mantissa constraints are a bit tricky.
    On binary encodings 8-22, it looks like 0b1.xx * 2^r (where
    -2 <= r <= 1), but we have to explicitly reject 0b1.11 * 2^1 (that is
    3.5) because the value 3.5 is not in the list.
    The other 1.00 * 2^r values have discontinuous r.
*   Binary encoding 1 (minimum positive normal value for the corresponding
    type) depends on the type (or mode) we are on.
*   The assembler accepts three string operands: "min", "inf" and "nan".

Handling those like aarch64_float_const_representable_p can be inefficient.
So, I implemented the riscv_get_float_fli_const function, which returns
detailed information about a FLI constant (including whether the constant
is valid as a FLI constant).

This information contains:

1.  Validity
2.  Sign bit (only set for -1.0)
3.  FLI constant type
    ("min", "inf", "nan" or a finite number other than "min")
4.  Highest two bits of the mantissa below the binary point
    (xx for 0b1.xx) on a finite value other than "min".
5.  Biased exponent (a sparse representation to make handling easier) on a
    finite value other than "min".
    For 0b1.xx * 2^r, (r+16) is stored.
    The valid range of this is [0, 32] (inclusive), so it requires 6 bits.

On many ABIs, this information is packed into an integer-sized bit-field.


New Constraint: "H"
====================

According to the GCC Internals documentation, "H" (along with "G") is
defined in a machine-dependent fashion to permit immediate floating
operands in particular ranges of values.  Because "G" is already used to
represent +0.0, this patch set uses "H" for FLI-capable FP constants.

It adds one variant per operation:

*   movhf_hardfloat
*   movsf_hardfloat
*   movdf_hardfloat_rv32
*   movdf_hardfloat_rv64

Note that the 'Zfa' extension requires the 'F' extension (that is, hard
float).


Portions that I'm not sure whether they are okay
=================================================

*   NaN handling (comparison with canonical NaN)
    Due to constraints, I had to compare the binary representation of a
    NaN against the known IEEE 754 binary16/32/64 canonical NaN patterns,
    but is there any better way to do this?
*   Any ICE possibility?
    For simple programs, I confirmed that no ICE occurs but I'm not sure
    whether this applies to other programs.
    If I miss some cases in riscv_output_move or riscv_print_operand
    functions (corresponding mov instructions in riscv.md), it can easily
    cause an ICE.

Sincerely,
Tsukasa


Tsukasa OI (2):
  RISC-V: Add support for the 'Zfa' extension
  RISC-V: Constant FP Optimization with 'Zfa'

 gcc/common/config/riscv/riscv-common.cc    |   3 +
 gcc/config/riscv/constraints.md            |   7 +
 gcc/config/riscv/riscv-opts.h              |   2 +
 gcc/config/riscv/riscv-protos.h            |  34 +++
 gcc/config/riscv/riscv.cc                  | 250 ++++++++++++++++++++-
 gcc/config/riscv/riscv.md                  |  24 +-
 gcc/testsuite/gcc.target/riscv/zfa-fli-1.c |  24 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-2.c |  24 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-3.c |  14 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-4.c | 111 +++++++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-5.c |  98 ++++++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-6.c |  61 +++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-7.c |  30 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli-8.c |  39 ++++
 14 files changed, 697 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-8.c


base-commit: 614052dd4ea083e086712809c754ffebd9361316
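
For reviewers who want to sanity-check the immediate table and the NaN
rule, here is a minimal host-side sketch in C.  It is not the code in this
patch set: riscv_get_float_fli_const works on GCC's internal
representation, while this sketch only compares raw bit patterns.  The
names double_bits, fli_d_values and fli_d_encoding are hypothetical, and
the sketch assumes the host "double" is IEEE 754 binary64.  It covers
"fli.d" only.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: raw bit pattern of a double.
   Assumes the host double is IEEE 754 binary64.  */
static uint64_t
double_bits (double v)
{
  uint64_t bits;
  assert (sizeof (double) == sizeof (uint64_t));
  memcpy (&bits, &v, sizeof bits);
  return bits;
}

/* Encodings 0..30 of the immediate table, for "fli.d".
   Encoding 1 is the minimum positive normal value (DBL_MIN here).
   Encoding 31 (canonical NaN) is matched by bit pattern below.  */
static const double fli_d_values[31] = {
  -1.0, DBL_MIN, 0x1p-16, 0x1p-15, 0x1p-8, 0x1p-7, 0.0625, 0.125,
  0.25, 0.3125, 0.375, 0.4375, 0.5, 0.625, 0.75, 0.875,
  1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 4.0,
  8.0, 16.0, 128.0, 256.0, 32768.0, 65536.0, INFINITY
};

/* Canonical NaN for binary64: sign 0, exponent all ones,
   significand 100..000.  */
#define CANONICAL_NAN_BITS_D UINT64_C (0x7ff8000000000000)

/* Return the 5-bit encoding for V, or -1 if V is not in the table.
   Comparing bit patterns (not values) rejects non-canonical NaNs and
   any negative value other than -1.0.  */
static int
fli_d_encoding (double v)
{
  uint64_t bits = double_bits (v);

  if (bits == CANONICAL_NAN_BITS_D)
    return 31;
  for (int i = 0; i < 31; i++)
    if (bits == double_bits (fli_d_values[i]))
      return i;
  return -1;
}
```

With this sketch, fli_d_encoding (1.0) returns 16 and fli_d_encoding (3.5)
returns -1, matching the point above that 0b1.11 * 2^1 must be rejected.
A linear search over bit patterns is the simplest thing that is obviously
consistent with the table.  It says nothing about how the check should be
structured inside GCC.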