
[RFC/PATCH] libgcc: sh: Use soft-fp for non-hosted SH3/SH4

Message ID 20240703100855.3855337-1-sebastien.michelland@lcis.grenoble-inp.fr
State New

Commit Message

Sébastien Michelland July 3, 2024, 9:59 a.m. UTC
libgcc's fp-bit.c is quite slow and most modern/developed architectures
have switched to using the soft-fp library. This patch does so for
free-standing/unknown-OS SH3/SH4 builds, using soft-fp's default parameters
for the most part, most notably no exceptions.

A quick run of Whetstone (built with OpenLibm) on an SH4 machine shows
about a 3x speedup (~320 -> 1050 Kwhets/s).

I'm sending this as RFC because I'm quite unsure about testing. I built
the compiler and ran the benchmark, but I don't know if GCC has a test
for soft-fp correctness and whether I can run that in my non-hosted
environment. Any advice?

Cheers,
Sébastien

libgcc/ChangeLog:

        * config.host: Use soft-fp library for non-hosted SH3/SH4
        instead of fdpbit.
        * config/sh/sfp-machine.h: New.

Signed-off-by: Sébastien Michelland <sebastien.michelland@lcis.grenoble-inp.fr>
---
 libgcc/config.host             | 10 +++-
 libgcc/config/sh/sfp-machine.h | 83 ++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/sh/sfp-machine.h

Comments

Jeff Law July 3, 2024, 3:59 p.m. UTC | #1
On 7/3/24 3:59 AM, Sébastien Michelland wrote:
> libgcc's fp-bit.c is quite slow and most modern/developed architectures
> have switched to using the soft-fp library. This patch does so for
> free-standing/unknown-OS SH3/SH4 builds, using soft-fp's default parameters
> for the most part, most notably no exceptions.
> 
> A quick run of Whetstone (built with OpenLibm) on an SH4 machine shows
> about x3 speedup (~320 -> 1050 Kwhets/s).
> 
> I'm sending this as RFC because I'm quite unsure about testing. I built
> the compiler and ran the benchmark, but I don't know if GCC has a test
> for soft-fp correctness and whether I can run that in my non-hosted
> environment. Any advice?
> 
> Cheers,
> Sébastien
> 
> libgcc/ChangeLog:
> 
>          * config.host: Use soft-fp library for non-hosted SH3/SH4
>          instead of fdpbit.
>          * config/sh/sfp-machine.h: New.
I'd really like to hear from Oleg on this, though given we're using the 
soft-fp library on other targets it seems reasonable at a high level.

As far as testing, the GCC testsuite has some FP components which would 
implicitly test soft fp on any target that doesn't have hardware 
floating point.



Jeff
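
A minimal sketch of the kind of check the FP parts of the testsuite perform implicitly, in case a quick standalone sanity test is useful before a full run. The helper names (add_d, mul_d, div_d, add_f, to_int, softfp_smoke_test) are hypothetical and not part of GCC. On a target without an FPU, each operation below is expected to be lowered to a libgcc soft-float routine, so it would exercise whichever implementation libgcc was built with.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helpers, not part of the GCC testsuite.  Routing the
   operands through volatile objects stops the compiler from folding
   the arithmetic at compile time, so each operation really executes
   at run time.  */
double add_d (double a, double b) { volatile double x = a, y = b; return x + y; }
double mul_d (double a, double b) { volatile double x = a, y = b; return x * y; }
double div_d (double a, double b) { volatile double x = a, y = b; return x / y; }
float  add_f (float a, float b)   { volatile float x = a, y = b; return x + y; }
int    to_int (double a)          { volatile double x = a; return (int) x; }

void softfp_smoke_test (void)
{
  /* All results are exactly representable, so == is safe here.  */
  assert (add_d (0.5, 0.25) == 0.75);
  assert (mul_d (1.5, 2.0) == 3.0);
  assert (div_d (1.0, 4.0) == 0.25);
  assert (add_f (1.5f, 2.25f) == 3.75f);

  /* Double-to-int conversion truncates toward zero.  */
  assert (to_int (div_d (7.0, 2.0)) == 3);
  assert (to_int (div_d (-7.0, 2.0)) == -3);

  /* 0/0 yields a NaN, which compares unequal to itself.  */
  double not_a_number = div_d (0.0, 0.0);
  assert (not_a_number != not_a_number);

  /* Halving the smallest normal double gives a nonzero subnormal.  */
  double sub = div_d (DBL_MIN, 2.0);
  assert (sub > 0.0 && sub < DBL_MIN);
}
```

Calling softfp_smoke_test() from a small main() is enough to run it. This covers only a handful of cases and does not replace the testsuite.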
Sébastien Michelland July 3, 2024, 5:28 p.m. UTC | #2
On 2024-07-03 17:59, Jeff Law wrote:
> On 7/3/24 3:59 AM, Sébastien Michelland wrote:
>> libgcc's fp-bit.c is quite slow and most modern/developed architectures
>> have switched to using the soft-fp library. This patch does so for
>> free-standing/unknown-OS SH3/SH4 builds, using soft-fp's default 
>> parameters
>> for the most part, most notably no exceptions.
>>
>> A quick run of Whetstone (built with OpenLibm) on an SH4 machine shows
>> about x3 speedup (~320 -> 1050 Kwhets/s).
>>
>> I'm sending this as RFC because I'm quite unsure about testing. I built
>> the compiler and ran the benchmark, but I don't know if GCC has a test
>> for soft-fp correctness and whether I can run that in my non-hosted
>> environment. Any advice?
>>
>> Cheers,
>> Sébastien
>>
>> libgcc/ChangeLog:
>>
>>          * config.host: Use soft-fp library for non-hosted SH3/SH4
>>          instead of fdpbit.
>>          * config/sh/sfp-machine.h: New.
> I'd really like to hear from Oleg on this, though given we're using the 
> soft-fp library on other targets it seems reasonable at a high level.
> 
> As far as testing, the GCC testsuite has some FP components which would 
> implicitly test soft fp on any target that doesn't have hardware 
> floating point.

Thank you. I went this route, following the guide [1] and the 
instructions for cross-compiling [2] before hitting "Newlib does not 
support CPU sh3eb" which I should have seen coming.

There are plenty of random ports lying around but just grabbing one 
doesn't feel right (and I don't have a canonical one to go to as I 
usually run a custom libc for... mostly bad reasons).

Deferring maybe again to the few SH users... how do you usually do it?

Sébastien

[1] https://gcc.gnu.org/install/test.html
[2] https://gcc.gnu.org/simtest-howto.html
Oleg Endo July 4, 2024, 12:21 a.m. UTC | #3
Hi!

On Wed, 2024-07-03 at 19:28 +0200, Sébastien Michelland wrote:
> On 2024-07-03 17:59, Jeff Law wrote:
> > On 7/3/24 3:59 AM, Sébastien Michelland wrote:
> > > libgcc's fp-bit.c is quite slow and most modern/developed architectures
> > > have switched to using the soft-fp library. This patch does so for
> > > free-standing/unknown-OS SH3/SH4 builds, using soft-fp's default 
> > > parameters
> > > for the most part, most notably no exceptions.
> > > 
> > > A quick run of Whetstone (built with OpenLibm) on an SH4 machine shows
> > > about x3 speedup (~320 -> 1050 Kwhets/s).
> > > 
> > > I'm sending this as RFC because I'm quite unsure about testing. I built
> > > the compiler and ran the benchmark, but I don't know if GCC has a test
> > > for soft-fp correctness and whether I can run that in my non-hosted
> > > environment. Any advice?
> > > 
> > > Cheers,
> > > Sébastien
> > > 
> > > libgcc/ChangeLog:
> > > 
> > >          * config.host: Use soft-fp library for non-hosted SH3/SH4
> > >          instead of fdpbit.
> > >          * config/sh/sfp-machine.h: New.

> > I'd really like to hear from Oleg on this, though given we're using the 
> > soft-fp library on other targets it seems reasonable at a high level.

I don't understand why this is being limited to SH3 and SH4 only?
Almost all SH4 systems out there have an FPU (unless special configurations
are used).  So I'd say if switching to soft-fp, then for SH-anything, not
just SH3/SH4.

If it yields some improvements for some users, I'm all for it.

> > As far as testing, the GCC testsuite has some FP components which would 
> > implicitly test soft fp on any target that doesn't have hardware 
> > floating point.
> 
> Thank you. I went this route, following the guide [1] and the 
> instructions for cross-compiling [2] before hitting "Newlib does not 
> support CPU sh3eb" which I should have seen coming.
> 
> There are plenty of random ports lying around but just grabbing one 
> doesn't feel right (and I don't have a canonical one to go to as I 
> usually run a custom libc for... mostly bad reasons).
> 
> Deferring maybe again to the few SH users... how do you usually do it?
> 
> 

I think it would make sense to test it using sh-sim on SH2, both big-endian
and little-endian at least, as SH2 doesn't have an FPU and hence would run
tests utilizing soft-fp.

After building the toolchain for --target=sh-elf, you can use this to run
the testsuite in the simulator:

make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb}"

(add a make -j parameter according to your needs -- it will be slow)

Let me know if you have any further questions.

Best regards,
Oleg Endo
Sébastien Michelland July 5, 2024, 7:28 a.m. UTC | #4
Hi Oleg!

> I don't understand why this is being limited to SH3 and SH4 only?
> Almost all SH4 systems out there have an FPU (unless special configurations
> are used).  So I'd say if switching to soft-fp, then for SH-anything, not
> just SH3/SH4.
> 
> If it yields some improvements for some users, I'm all for it.

Yeah I just defaulted to SH3/SH4 conservatively because that's the only 
hardware I have. (My main platform also happens to be one of these SH4 
without an FPU, the SH4AL-DSP.)

Once this is tested/validated on simulator, I'll happily simplify the 
patch to apply to all SH.

> I think it would make sense to test it using sh-sim on SH2 big-endian and
> little endian at least, as that doesn't have an FPU and hence would run
> tests utilizing soft-fp.
> 
> After building the toolchain for --target=sh-elf, you can use this to run
> the testsuite in the simulator:
> 
> make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb}"
> 
> (add a make -j parameter according to your needs -- it will be slow)

Alright, it might take a little bit.

Building the combined tree of gcc/binutils/newlib masters (again 
following [1]) gives me an ICE in libstdc++-v3/src/libbacktrace, 
irrespective of my libgcc change:

---
during RTL pass: final
elf.c: In function ‘elf_zstd_decompress’:
elf.c:4999:1: internal compiler error: in output_296, at config/sh/sh.md:8408
  4999 | }
       | ^
0x1c8765e internal_error(char const*, ...)
	../../combined/gcc/diagnostic-global-context.cc:491
0x881269 fancy_abort(char const*, int, char const*)
	../../combined/gcc/diagnostic.cc:1725
0x83b73b output_296
	../../combined/gcc/config/sh/sh.md:8408
0x83b73b output_296
	../../combined/gcc/config/sh/sh.md:8063
0xb783c2 final_scan_insn_1
	../../combined/gcc/final.cc:2773
0xb78938 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
	../../combined/gcc/final.cc:2886
0xb78b5f final_1
	../../combined/gcc/final.cc:1977
0xb796a8 rest_of_handle_final
	../../combined/gcc/final.cc:4239
0xb796a8 execute
	../../combined/gcc/final.cc:4317
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
make[9]: *** [Makefile:628: std_stacktrace-elf.lo] Error 1
make[9]: *** Waiting for unfinished jobs....
make[9]: Leaving directory '/home/el/Programs/sh-elf-gcc/build-combined2/sh-elf/m2a/libstdc++-v3/src/libbacktrace'
---

My configure, for reference (--disable-source-highlight came up from a 
configure error earlier):

../combined/configure                       \
     --prefix="$PREFIX"                      \
     --target="sh-elf"                       \
     --enable-languages="c,c++"              \
     --disable-gdb                           \
     --disable-source-highlight

The libbacktrace build in gcc (make all-libbacktrace) works without an 
issue.

I'll have to prepare a bug report (I couldn't find anything related), 
but bisecting on a triplet of repos doesn't sound very fun and I believe 
I do need the newlib to build libstdc++ in a reproducible way.

Any advice before I go on that tangent?

Sébastien

[1] https://gcc.gnu.org/simtest-howto.html

Patch

diff --git a/libgcc/config.host b/libgcc/config.host
index 9fae51d4c..fee3bf0c0 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1399,7 +1399,15 @@  s390x-ibm-tpf*)
 	md_unwind_header=s390/tpf-unwind.h
 	;;
 sh-*-elf* | sh[12346l]*-*-elf*)
-	tmake_file="$tmake_file sh/t-sh t-crtstuff-pic t-fdpbit"
+	tmake_file="$tmake_file sh/t-sh t-crtstuff-pic"
+	case ${host} in
+	sh[34]*-*-elf*)
+		tmake_file="${tmake_file} t-softfp-sfdf t-softfp"
+		;;
+	*)
+		tmake_file="${tmake_file} t-fdpbit"
+		;;
+	esac
 	extra_parts="$extra_parts crt1.o crti.o crtn.o crtbeginS.o crtendS.o \
 		libic_invalidate_array_4-100.a \
 		libic_invalidate_array_4-200.a \
diff --git a/libgcc/config/sh/sfp-machine.h b/libgcc/config/sh/sfp-machine.h
new file mode 100644
index 000000000..c1aa428c0
--- /dev/null
+++ b/libgcc/config/sh/sfp-machine.h
@@ -0,0 +1,83 @@ 
+/* Software floating-point machine description for SuperH.
+
+   Copyright (C) 2016-2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#define _FP_W_TYPE_SIZE     32
+#define _FP_W_TYPE      unsigned long
+#define _FP_WS_TYPE     signed long
+#define _FP_I_TYPE      long
+
+#define _FP_MUL_MEAT_S(R,X,Y)               \
+  _FP_MUL_MEAT_1_wide(_FP_WFRACBITS_S,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_D(R,X,Y)               \
+  _FP_MUL_MEAT_2_wide(_FP_WFRACBITS_D,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_Q(R,X,Y)               \
+  _FP_MUL_MEAT_4_wide(_FP_WFRACBITS_Q,R,X,Y,umul_ppmm)
+
+#define _FP_DIV_MEAT_S(R,X,Y)   _FP_DIV_MEAT_1_udiv_norm(S,R,X,Y)
+#define _FP_DIV_MEAT_D(R,X,Y)   _FP_DIV_MEAT_2_udiv(D,R,X,Y)
+#define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
+
+#define _FP_NANFRAC_B       _FP_QNANBIT_B
+#define _FP_NANFRAC_H       _FP_QNANBIT_H
+#define _FP_NANFRAC_S       _FP_QNANBIT_S
+#define _FP_NANFRAC_D       _FP_QNANBIT_D, 0
+#define _FP_NANFRAC_Q       _FP_QNANBIT_Q, 0, 0, 0
+
+/* The type of the result of a floating point comparison.  This must
+   match __libgcc_cmp_return__ in GCC for the target.  */
+typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
+#define CMPtype __gcc_CMPtype
+
+#define _FP_NANSIGN_B       0
+#define _FP_NANSIGN_H       0
+#define _FP_NANSIGN_S       0
+#define _FP_NANSIGN_D       0
+#define _FP_NANSIGN_Q       0
+
+#define _FP_KEEPNANFRACP 0
+#define _FP_QNANNEGATEDP 0
+
+#define _FP_CHOOSENAN(fs, wc, R, X, Y, OP)  \
+  do {                      \
+    R##_s = _FP_NANSIGN_##fs;           \
+    _FP_FRAC_SET_##wc(R,_FP_NANFRAC_##fs);  \
+    R##_c = FP_CLS_NAN;             \
+  } while (0)
+
+#define _FP_TININESS_AFTER_ROUNDING 1
+
+#define __LITTLE_ENDIAN 1234
+#define __BIG_ENDIAN    4321
+
+#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+#define __BYTE_ORDER __BIG_ENDIAN
+#else
+#define __BYTE_ORDER __LITTLE_ENDIAN
+#endif
+
+/* Define ALIASNAME as a strong alias for NAME.  */
+# define strong_alias(name, aliasname) _strong_alias(name, aliasname)
+# define _strong_alias(name, aliasname) \
+  extern __typeof (name) aliasname __attribute__ ((alias (#name)));
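
A sketch of what two of the settings in sfp-machine.h appear to amount to, not a statement of what soft-fp does internally. It assumes the IEEE 754 binary32/binary64 layouts and a GCC-compatible compiler that predefines __BYTE_ORDER__. The bit patterns are a reading of _FP_NANSIGN_* (sign 0) and _FP_NANFRAC_* (quiet bit only) as used by _FP_CHOOSENAN. The names DEFAULT_NAN_S, DEFAULT_NAN_D and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed default NaNs: sign bit 0, exponent all ones, and only the
   quiet bit set in the fraction (hypothetical names).  */
#define DEFAULT_NAN_S UINT32_C (0x7FC00000)
#define DEFAULT_NAN_D UINT64_C (0x7FF8000000000000)

/* Reinterpret a bit pattern as a floating-point value via memcpy,
   which avoids strict-aliasing problems.  */
float float_from_bits (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

double double_from_bits (uint64_t bits)
{
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}

/* Same compile-time test as the patch: big-endian only when the
   compiler says so, little-endian otherwise.  */
int configured_big_endian (void)
{
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
  return 1;
#else
  return 0;
#endif
}

/* Run-time cross-check: look at the first byte of a known word.  */
int runtime_is_big_endian (void)
{
  const uint32_t probe = UINT32_C (0x01020304);
  unsigned char first;
  memcpy (&first, &probe, 1);
  return first == 0x01;
}

void check_defaults (void)
{
  /* A NaN compares unequal to itself.  */
  assert (float_from_bits (DEFAULT_NAN_S) != float_from_bits (DEFAULT_NAN_S));
  assert (double_from_bits (DEFAULT_NAN_D) != double_from_bits (DEFAULT_NAN_D));
  /* The compile-time choice should agree with the actual byte order.  */
  assert (configured_big_endian () == runtime_is_big_endian ());
}
```

The run-time probe is there only to cross-check the preprocessor test. The header itself has to decide at compile time, which is why it relies on __BYTE_ORDER__.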