Message ID | 1326674823-13069-1-git-send-email-afaerber@suse.de |
---|---|
State | New |
Headers | show |
On 16 January 2012 00:46, Andreas Färber <afaerber@suse.de> wrote: > Based on a suggestion from Alex earlier this week, I managed to run a > simple benchmark of softfloat performance with qemu-arm, as requested by > Peter. > > I went for the Whetstone floating point benchmark: > http://en.wikipedia.org/wiki/Whetstone_%28benchmark%29 > > For a loop count of 100,000 and 5 runs I got the following results: > > current: 138.9-204.1 Whetstone-MIPS > [u]int*_t: 185.2-188.7 Whetstone-MIPS > [u]int_fast*_t: 285.7-294.1 Whetstone-MIPS > > Toshiba AC100: 833.3-909.1 Whetstone-MIPS This is much better than I was expecting when I suggested running a benchmark :-) and indicates that it's worth a bit of pain to switch to the fast* types. I've tested this series with the ARM VFP/Neon tests I have (random instruction sequence testing). I found a few extra bugs which I've sent a separate patch series for: those patches need to be incorporated into this series or otherwise applied before it. I've commented on patches 01, 05, 09; the rest are Reviewed-by: Peter Maydell <peter.maydell@linaro.org> with the following caveat. The changes from int32/uint32 to int_fast32_t/uint_fast32_t are the potentially dangerous ones in this set, since they change a type that was easily mistaken for "exactly 32 bits" and happened to be 32 bits to one that is now 64 bits (on 64 bit hosts). The other type changes are either "not a width change in practice" (int64) or "change from something that was already wider than the number in it" (int16) or "wasn't being used for things where we really cared about the type width" (int8, flag). Specifically, it's this change that revealed the latent bugs I sent a three-patch set for, and so it doesn't seem too unlikely that other platforms may have some similar bugs in their int-to-float/float-to-int code that will now be revealed on 64 bit hosts. I don't think that means we shouldn't apply this patch set but I would recommend some testing of other architectures :-) -- PMM
On 16.01.2012, at 20:02, Peter Maydell wrote: > On 16 January 2012 00:46, Andreas Färber <afaerber@suse.de> wrote: >> Based on a suggestion from Alex earlier this week, I managed to run a >> simple benchmark of softfloat performance with qemu-arm, as requested by >> Peter. >> >> I went for the Whetstone floating point benchmark: >> http://en.wikipedia.org/wiki/Whetstone_%28benchmark%29 >> >> For a loop count of 100,000 and 5 runs I got the following results: >> >> current: 138.9-204.1 Whetstone-MIPS >> [u]int*_t: 185.2-188.7 Whetstone-MIPS >> [u]int_fast*_t: 285.7-294.1 Whetstone-MIPS >> >> Toshiba AC100: 833.3-909.1 Whetstone-MIPS > > This is much better than I was expecting when I suggested running > a benchmark :-) and indicates that it's worth a bit of pain to > switch to the fast* types. > > I've tested this series with the ARM VFP/Neon tests I have (random > instruction sequence testing). I found a few extra bugs which I've > sent a separate patch series for: those patches need to be > incorporated into this series or otherwise applied before it. > > I've commented on patches 01, 05, 09; the rest are > Reviewed-by: Peter Maydell <peter.maydell@linaro.org> > > with the following caveat. > > The changes from int32/uint32 to int_fast32_t/uint_fast32_t are > the potentially dangerous ones in this set, since they change a > type that was easily mistaken for "exactly 32 bits" and happened > to be 32 bits to one that is now 64 bits (on 64 bit hosts). > The other type changes are either "not a width change in practice" > (int64) or "change from something that was already wider than the > number in it" (int16) or "wasn't being used for things where we > really cared about the type width" (int8, flag). > > Specifically, it's this change that revealed the latent bugs I > sent a three-patch set for, and so it doesn't seem too unlikely > that other platforms may have some similar bugs in their > int-to-float/float-to-int code that will now be revealed on > 64 bit hosts. > > I don't think that means we shouldn't apply this patch set but > I would recommend some testing of other architectures :-) So what if we leave uint32_t be uint32_t and make the other ones _fast_? Alex
On 16 January 2012 19:12, Alexander Graf <agraf@suse.de> wrote: > > On 16.01.2012, at 20:02, Peter Maydell wrote: >> The changes from int32/uint32 to int_fast32_t/uint_fast32_t are >> the potentially dangerous ones in this set, since they change a >> type that was easily mistaken for "exactly 32 bits" and happened >> to be 32 bits to one that is now 64 bits (on 64 bit hosts). >> The other type changes are either "not a width change in practice" >> (int64) or "change from something that was already wider than the >> number in it" (int16) or "wasn't being used for things where we >> really cared about the type width" (int8, flag). >> >> Specifically, it's this change that revealed the latent bugs I >> sent a three-patch set for, and so it doesn't seem too unlikely >> that other platforms may have some similar bugs in their >> int-to-float/float-to-int code that will now be revealed on >> 64 bit hosts. >> >> I don't think that means we shouldn't apply this patch set but >> I would recommend some testing of other architectures :-) > > So what if we leave uint32_t be uint32_t and make the other ones _fast_? We'd need to get Andreas to do another benchmark run to check that this didn't lose the perf gain. Also it's kind of aesthetically not very pleasing... -- PMM
On 16.01.2012, at 20:17, Peter Maydell wrote: > On 16 January 2012 19:12, Alexander Graf <agraf@suse.de> wrote: >> >> On 16.01.2012, at 20:02, Peter Maydell wrote: >>> The changes from int32/uint32 to int_fast32_t/uint_fast32_t are >>> the potentially dangerous ones in this set, since they change a >>> type that was easily mistaken for "exactly 32 bits" and happened >>> to be 32 bits to one that is now 64 bits (on 64 bit hosts). >>> The other type changes are either "not a width change in practice" >>> (int64) or "change from something that was already wider than the >>> number in it" (int16) or "wasn't being used for things where we >>> really cared about the type width" (int8, flag). >>> >>> Specifically, it's this change that revealed the latent bugs I >>> sent a three-patch set for, and so it doesn't seem too unlikely >>> that other platforms may have some similar bugs in their >>> int-to-float/float-to-int code that will now be revealed on >>> 64 bit hosts. >>> >>> I don't think that means we shouldn't apply this patch set but >>> I would recommend some testing of other architectures :-) >> >> So what if we leave uint32_t be uint32_t and make the other ones _fast_? > > We'd need to get Andreas to do another benchmark run to check that > this didn't lose the perf gain. Also it's kind of aesthetically > not very pleasing... Yeah, but it's safe. And I would assume that 32bit integers are better optimized for than smaller ones in today's CPUs. Alex
On 16 January 2012 19:18, Alexander Graf <agraf@suse.de> wrote: > On 16.01.2012, at 20:17, Peter Maydell wrote: >> On 16 January 2012 19:12, Alexander Graf <agraf@suse.de> wrote: >>> So what if we leave uint32_t be uint32_t and make the other ones _fast_? >> >> We'd need to get Andreas to do another benchmark run to check that >> this didn't lose the perf gain. Also it's kind of aesthetically >> not very pleasing... > > Yeah, but it's safe. Safety is for people with insufficiently comprehensive test suites :-) -- PMM
On 17.01.2012, at 00:52, Peter Maydell wrote: > On 16 January 2012 19:18, Alexander Graf <agraf@suse.de> wrote: >> On 16.01.2012, at 20:17, Peter Maydell wrote: >>> On 16 January 2012 19:12, Alexander Graf <agraf@suse.de> wrote: >>>> So what if we leave uint32_t be uint32_t and make the other ones _fast_? >>> >>> We'd need to get Andreas to do another benchmark run to check that >>> this didn't lose the perf gain. Also it's kind of aesthetically >>> not very pleasing... >> >> Yeah, but it's safe. > > Safety is for people with insufficiently comprehensive test suites :-) You mean all the non-arm targets? :) Alex
On 16 January 2012 00:46, Andreas Färber <afaerber@suse.de> wrote: > For a loop count of 100,000 and 5 runs I got the following results: > > current: 138.9-204.1 Whetstone-MIPS > [u]int*_t: 185.2-188.7 Whetstone-MIPS > [u]int_fast*_t: 285.7-294.1 Whetstone-MIPS > > Toshiba AC100: 833.3-909.1 Whetstone-MIPS > > These results seem to indicate that the "fast" POSIX types are indeed > somewhat faster, both compared to exact-size POSIX types and to the > current state. OTOH I did a run of scimark2 and got: current tree: ** ** ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark ** ** for details. (Results can be submitted to pozo@nist.gov) ** ** ** Using 2.00 seconds min time per kenel. Composite Score: 12.98 FFT Mflops: 7.66 (N=1024) SOR Mflops: 19.49 (100 x 100) MonteCarlo: Mflops: 6.12 Sparse matmult Mflops: 15.34 (N=1000, nz=5000) LU Mflops: 16.28 (M=100, N=100) with patches (yours and mine): ** ** ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark ** ** for details. (Results can be submitted to pozo@nist.gov) ** ** ** Using 2.00 seconds min time per kenel. Composite Score: 11.87 FFT Mflops: 7.12 (N=1024) SOR Mflops: 17.66 (100 x 100) MonteCarlo: Mflops: 5.75 Sparse matmult Mflops: 14.03 (N=1000, nz=5000) LU Mflops: 14.81 (M=100, N=100) Hmmm... -- PMM
Am 20.01.2012 14:05, schrieb Peter Maydell: > On 16 January 2012 00:46, Andreas Färber <afaerber@suse.de> wrote: >> For a loop count of 100,000 and 5 runs I got the following results: >> >> current: 138.9-204.1 Whetstone-MIPS >> [u]int*_t: 185.2-188.7 Whetstone-MIPS >> [u]int_fast*_t: 285.7-294.1 Whetstone-MIPS >> >> Toshiba AC100: 833.3-909.1 Whetstone-MIPS >> >> These results seem to indicate that the "fast" POSIX types are indeed >> somewhat faster, both compared to exact-size POSIX types and to the >> current state. > > OTOH I did a run of scimark2 and got: > current tree: > ** ** > ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark ** > ** for details. (Results can be submitted to pozo@nist.gov) ** > ** ** > Using 2.00 seconds min time per kenel. > Composite Score: 12.98 > FFT Mflops: 7.66 (N=1024) > SOR Mflops: 19.49 (100 x 100) > MonteCarlo: Mflops: 6.12 > Sparse matmult Mflops: 15.34 (N=1000, nz=5000) > LU Mflops: 16.28 (M=100, N=100) > > with patches (yours and mine): > ** ** > ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark ** > ** for details. (Results can be submitted to pozo@nist.gov) ** > ** ** > Using 2.00 seconds min time per kenel. > Composite Score: 11.87 > FFT Mflops: 7.12 (N=1024) > SOR Mflops: 17.66 (100 x 100) > MonteCarlo: Mflops: 5.75 > Sparse matmult Mflops: 14.03 (N=1000, nz=5000) > LU Mflops: 14.81 (M=100, N=100) One difference between our test environments comes to mind: I had tested only typedefs for the int types, whereas my series also converts flag. The fixes for lm32, sparc and qemu-tool shouldn't matter here. Your patches "degrade" some variables from fast types to exact types of course. Anyway, here's my Whetstone and CoreMark results for 520c0d8d2772ccc9f9275bd934e13ec9b15774e4 (target-sparc: Fix mixup of uint64 and uint64_t) plus our patches. The margin has shrunk for Whetstone, and for CoreMark I see a slight degradation of the max. (I also note a slight Whetstone degradation between Natty and Oneiric.) Have you tried benchmarking after our preceding patches but before the actual fast conversion for comparison? Andreas master: C Converted Double Precision Whetstones: 200.0 MIPS C Converted Double Precision Whetstones: 204.1 MIPS CoreMark 1.0 : 1287.747086 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap CoreMark 1.0 : 1336.094595 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap CoreMark 1.0 : 1339.943722 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap master + PMM + fast: C Converted Double Precision Whetstones: 204.1 MIPS C Converted Double Precision Whetstones: 208.3 MIPS CoreMark 1.0 : 1297.690112 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap CoreMark 1.0 : 1299.629606 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap CoreMark 1.0 : 1309.071868 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap CoreMark 1.0 : 1315.270288 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap CoreMark 1.0 : 1318.913216 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap CoreMark 1.0 : 1319.870653 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap CoreMark 1.0 : 1321.527686 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap AC100 (Ubuntu Oneiric): C Converted Double Precision Whetstones: 769.2 MIPS C Converted Double Precision Whetstones: 833.3 MIPS CoreMark 1.0 : 2257.506208 / GCC4.6.1 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap CoreMark 1.0 : 2303.086135 / GCC4.6.1 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap CoreMark 1.0 : 2326.122354 / GCC4.6.1 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap CoreMark 1.0 : 2349.624060 / GCC4.6.1 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap CoreMark 1.0 : 2350.360389 / GCC4.6.1 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap Pandaboard (openSUSE Factory): C Converted Double Precision Whetstones: 833.3 MIPS CoreMark 1.0 : 2304.855562 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap CoreMark 1.0 : 2305.209774 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap CoreMark 1.0 : 2306.273063 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap CoreMark 1.0 : 2306.627710 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap Results are sorted, duplicates removed. whetstone.c was compiled with -mfloat-abi=hard this time, using GCC (SUSE Linux) 4.6.2, and so was CoreMark except on the AC100 with GCC (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1. coremark.exe was run with parameters 0x0 0x0 0x66 0.
diff --git a/fpu/softfloat.h b/fpu/softfloat.h index 07c2929..6fe19d7 100644 --- a/fpu/softfloat.h +++ b/fpu/softfloat.h @@ -57,11 +57,11 @@ typedef uint8_t flag; typedef uint8_t uint8; typedef int8_t int8; #ifndef _AIX -typedef int uint16; -typedef int int16; +typedef uint16_t uint16; +typedef int16_t int16; #endif -typedef unsigned int uint32; -typedef signed int int32; +typedef uint32_t uint32; +typedef int32_t int32; typedef uint64_t uint64; typedef int64_t int64; --- [u]int_fast*_t: --- diff --git a/fpu/softfloat.h b/fpu/softfloat.h index 07c2929..43486aa 100644 --- a/fpu/softfloat.h +++ b/fpu/softfloat.h @@ -54,16 +54,16 @@ these four paragraphs for those parts of this code that are retained. | to the same as `int'. *----------------------------------------------------------------------------*/ typedef uint8_t flag; -typedef uint8_t uint8; -typedef int8_t int8; +typedef uint_fast8_t uint8; +typedef int_fast8_t int8; #ifndef _AIX -typedef int uint16; -typedef int int16; +typedef uint_fast16_t uint16; +typedef int_fast16_t int16; #endif -typedef unsigned int uint32; -typedef signed int int32; -typedef uint64_t uint64; -typedef int64_t int64; +typedef uint_fast32_t uint32; +typedef int_fast32_t int32; +typedef uint_fast64_t uint64; +typedef int_fast64_t int64; #define LIT64( a ) a##LL