Message ID | CAHso6sOn=SgKoaG1pwXHRLTXtvqpGY_RoKSjLfFaYTm313GvVQ@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | RISC-V: Use convert instructions instead of calling library functions | expand |
On 3/18/24 3:09 AM, Jivan Hakobyan wrote: > As RV has round instructions it is reasonable to use them instead of > calling the library functions. > > With my patch for the following C code: > double foo(double a) { > return ceil(a); > } > > GCC generates the following ASM code (before it was tail call) > foo: > fabs.d fa4,fa0 > lui a5,%hi(.LC0) > fld fa3,%lo(.LC0)(a5) > flt.d a5,fa4,fa3 > beq a5,zero,.L3 > fcvt.l.d a5,fa0,rup > fcvt.d.l fa4,a5 > fsgnj.d fa0,fa4,fa0 > .L3: > ret > > .LC0: > .word 0 > .word 1127219200 // 0x4330000000000000 > > > The patch I have evaluated on SPEC2017. > Counted dynamic instructions counts and got the following improvements > > 510.parest_r 262 m - > 511.povray_r 2.1 b 0.04% > 521.wrt_r 269 m - > 526.blender_r 3 b 0.1% > 527.cam4_r 15 b 0.6% > 538.imagick_r 365 b 7.6% > > Overall executed 385 billion fewer instructions which is 0.5%. A few more notes. The sequence Jivan is using is derived from LLVM. The condition in the generated code tests for those values were are supposed to pass through unaltered. The condition in the pattern ensures we do something sensible WRT FE_INEXACT and mirrors how other ports handle these insns. Our internal testing shows a benefit well beyond the 7% reduction in icounts. Presumably due to fewer calls, fewer transfers across the register files, better scheduling around the call site, etc. Obviously for Zfa we'll use the more efficient instructions for that extension. But there's no reason to not go forward with this change for gcc-15. Jeff
On Mon, 18 Mar 2024 20:50:14 PDT (-0700), jeffreyalaw@gmail.com wrote: > > > On 3/18/24 3:09 AM, Jivan Hakobyan wrote: >> As RV has round instructions it is reasonable to use them instead of >> calling the library functions. >> >> With my patch for the following C code: >> double foo(double a) { >> return ceil(a); >> } >> >> GCC generates the following ASM code (before it was tail call) >> foo: >> fabs.d fa4,fa0 >> lui a5,%hi(.LC0) >> fld fa3,%lo(.LC0)(a5) >> flt.d a5,fa4,fa3 >> beq a5,zero,.L3 >> fcvt.l.d a5,fa0,rup I'm not sure exactly what context this is in, but my reading of "according to the current rounding mode" means we'd usually use the dynamic rounding mode. >> fcvt.d.l fa4,a5 >> fsgnj.d fa0,fa4,fa0 >> .L3: >> ret >> >> .LC0: >> .word 0 >> .word 1127219200 // 0x4330000000000000 >> >> >> The patch I have evaluated on SPEC2017. >> Counted dynamic instructions counts and got the following improvements >> >> 510.parest_r 262 m - >> 511.povray_r 2.1 b 0.04% >> 521.wrt_r 269 m - >> 526.blender_r 3 b 0.1% >> 527.cam4_r 15 b 0.6% >> 538.imagick_r 365 b 7.6% >> >> Overall executed 385 billion fewer instructions which is 0.5%. > A few more notes. > > The sequence Jivan is using is derived from LLVM. The condition in the > generated code tests for those values were are supposed to pass through > unaltered. The condition in the pattern ensures we do something > sensible WRT FE_INEXACT and mirrors how other ports handle these insns. My only worry here is that when we were doing the other patterns we decided not to do rint. I can't remember exactly why, but from reading the docs we probably just skipped it due to the inexact handling and Zfa having an instruction that just does this. FP stuff is always a bit of a time sink, so at that point it probably just fell off the priority list. I'm not really an FP guy, so I usually just poke around what the other ports generate and try to figure out what's going on. arm64 has the Zfa instruction and x86 FP is complicated, so I'm not sure exactly who else to look at for this sort of stuff. From just looking at the code, though, I think there's two issues -- I'm not really an FP person, though, so take this with a grain of salt: IIUC what we've got here doesn't actually set the inexact flag for the bounds check failure case, as we're just loading up an exact constant and doing the bounds check. We're also not clamping to INT_MAX-type values, but not sure if we're supposed to. I think we could fix both of those by adjusting the expansion to something like fabs.d fa4,fa0 lui a5,%hi(.LC0) fld fa3,%lo(.LC0)(a5) flt.d a5,fa4,fa3 bne a5,zero,.L3 mv fa0, fa3 .L3: fcvt.l.d a5,fa0,rup fcvt.d.l fa4,a5 fsgnj.d fa0,fa4,fa0 ret and then adjusting the constant to be an epsilon larger than INT_MAX so it'll still trigger the clamping but also inexact. There's also a pair of changes to the ISA in 2020 that added the conversion inexact handling requirement, it was a grey area before. I don't remember exactly what happened there, but I did remember it happening. I don't think anyone cares all that much about the performance of systems that target the older ISAs, so maybe we just restrict the non-libcall expansion to ISAs that contain the new wording? commit 5890a1a702abf4157d5879717a39d8ecdae0de68 (tag: draft-20201229-5890a1a) Author: Andrew Waterman <andrew@sifive.com> Date: Mon Dec 28 18:19:44 2020 -0800 Clarify when FP conversions raise the Inexact flag diff --git a/src/f.tex b/src/f.tex index 8649d75..97c253b 100644 --- a/src/f.tex +++ b/src/f.tex @@ -594,9 +594,9 @@ instructions round according to the {\em rm} field. A floating-point register can be initialized to floating-point positive zero using FCVT.S.W {\em rd}, {\tt x0}, which will never set any exception flags. -All floating-point conversion instructions set the Inexact exception flag if the -result differs from its operand value, yet is representable in the destination -format. +All floating-point conversion instructions set the Inexact exception flag if +the rounded result differs from the operand value and the Invalid exception +flag is not set. \vspace{-0.2in} \begin{center} commit 27b40fbc798357fcb4b1deaba4553646fe677576 (tag: draft-20200229-27b40fb) Author: Andrew Waterman <andrew@sifive.com> Date: Fri Feb 28 18:12:47 2020 -0800 Clarify that FCVT instructions signal inexact diff --git a/src/f.tex b/src/f.tex index 7680347..a9022c4 100644 --- a/src/f.tex +++ b/src/f.tex @@ -582,6 +582,10 @@ instructions round according to the {\em rm} field. A floating-point register can be initialized to floating-point positive zero using FCVT.S.W {\em rd}, {\tt x0}, which will never set any exception flags. +All floating-point conversion instructions raise the Inexact exception if the +result differs from its operand value, yet is representable in the destination +format. + \vspace{-0.2in} \begin{center} \begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O} > Our internal testing shows a benefit well beyond the 7% reduction in > icounts. Presumably due to fewer calls, fewer transfers across the > register files, better scheduling around the call site, etc. > > Obviously for Zfa we'll use the more efficient instructions for that > extension. But there's no reason to not go forward with this change for > gcc-15. Ya, I think those are all pretty minor issues (If the exist at all). So it seems like the right way to go in general. > > Jeff
On Tue, Mar 19, 2024 at 9:23 AM Palmer Dabbelt <palmer@dabbelt.com> wrote: > On Mon, 18 Mar 2024 20:50:14 PDT (-0700), jeffreyalaw@gmail.com wrote: > > > > > > On 3/18/24 3:09 AM, Jivan Hakobyan wrote: > >> As RV has round instructions it is reasonable to use them instead of > >> calling the library functions. > >> > >> With my patch for the following C code: > >> double foo(double a) { > >> return ceil(a); > >> } > >> > >> GCC generates the following ASM code (before it was tail call) > >> foo: > >> fabs.d fa4,fa0 > >> lui a5,%hi(.LC0) > >> fld fa3,%lo(.LC0)(a5) > >> flt.d a5,fa4,fa3 > >> beq a5,zero,.L3 > >> fcvt.l.d a5,fa0,rup > > I'm not sure exactly what context this is in, but my reading of > "according to the current rounding mode" means we'd usually use the > dynamic rounding mode. > ceil doesn't depend on the current rounding mode; rup is correct. For rint, you'd be correct. > > >> fcvt.d.l fa4,a5 > >> fsgnj.d fa0,fa4,fa0 > >> .L3: > >> ret > >> > >> .LC0: > >> .word 0 > >> .word 1127219200 // 0x4330000000000000 > >> > >> > >> The patch I have evaluated on SPEC2017. > >> Counted dynamic instructions counts and got the following improvements > >> > >> 510.parest_r 262 m - > >> 511.povray_r 2.1 b 0.04% > >> 521.wrt_r 269 m - > >> 526.blender_r 3 b 0.1% > >> 527.cam4_r 15 b 0.6% > >> 538.imagick_r 365 b 7.6% > >> > >> Overall executed 385 billion fewer instructions which is 0.5%. > > A few more notes. > > > > The sequence Jivan is using is derived from LLVM. The condition in the > > generated code tests for those values were are supposed to pass through > > unaltered. The condition in the pattern ensures we do something > > sensible WRT FE_INEXACT and mirrors how other ports handle these insns. > > My only worry here is that when we were doing the other patterns we > decided not to do rint. I can't remember exactly why, but from reading > the docs we probably just skipped it due to the inexact handling and Zfa > having an instruction that just does this. FP stuff is always a bit of > a time sink, so at that point it probably just fell off the priority > list. > > I'm not really an FP guy, so I usually just poke around what the other > ports generate and try to figure out what's going on. arm64 has the Zfa > instruction and x86 FP is complicated, so I'm not sure exactly who else > to look at for this sort of stuff. From just looking at the code, > though, I think there's two issues -- I'm not really an FP person, > though, so take this with a grain of salt: > > IIUC what we've got here doesn't actually set the inexact flag for the > bounds check failure case I think the original code was correct. If you exceed the bounds, then by construction you know that the input was already an integer, and so not setting NX is correct. If you didn't exceed the bounds, then fcvt.l.d will set NX when appropriate. , as we're just loading up an exact constant > and doing the bounds check. We're also not clamping to INT_MAX-type > values, but not sure if we're supposed to. I think we could fix both of > those by adjusting the expansion to something like > > fabs.d fa4,fa0 > lui a5,%hi(.LC0) > fld fa3,%lo(.LC0)(a5) > flt.d a5,fa4,fa3 > bne a5,zero,.L3 > mv fa0, fa3 > .L3: > fcvt.l.d a5,fa0,rup > fcvt.d.l fa4,a5 > fsgnj.d fa0,fa4,fa0 > ret > > and then adjusting the constant to be an epsilon larger than INT_MAX so > it'll still trigger the clamping but also inexact. > ceil/rint/etc. are supposed to work for the whole range of their floating-point type; the range of the integral types isn't supposed to affect the result. The original code was correct in this regard, too. > > There's also a pair of changes to the ISA in 2020 that added the > conversion inexact handling requirement, it was a grey area before. I > don't remember exactly what happened there, but I did remember it > happening. I don't think anyone cares all that much about the > performance of systems that target the older ISAs, so maybe we just > restrict the non-libcall expansion to ISAs that contain the new wording? > > commit 5890a1a702abf4157d5879717a39d8ecdae0de68 (tag: > draft-20201229-5890a1a) > Author: Andrew Waterman <andrew@sifive.com> > Date: Mon Dec 28 18:19:44 2020 -0800 > > Clarify when FP conversions raise the Inexact flag > > diff --git a/src/f.tex b/src/f.tex > index 8649d75..97c253b 100644 > --- a/src/f.tex > +++ b/src/f.tex > @@ -594,9 +594,9 @@ instructions round according to the {\em rm} > field. A floating-point register > can be initialized to floating-point positive zero using FCVT.S.W > {\em rd}, > {\tt x0}, which will never set any exception flags. > > -All floating-point conversion instructions set the Inexact exception > flag if the > -result differs from its operand value, yet is representable in the > destination > -format. > +All floating-point conversion instructions set the Inexact exception > flag if > +the rounded result differs from the operand value and the Invalid > exception > +flag is not set. > > \vspace{-0.2in} > \begin{center} > > commit 27b40fbc798357fcb4b1deaba4553646fe677576 (tag: > draft-20200229-27b40fb) > Author: Andrew Waterman <andrew@sifive.com> > Date: Fri Feb 28 18:12:47 2020 -0800 > > Clarify that FCVT instructions signal inexact > > diff --git a/src/f.tex b/src/f.tex > index 7680347..a9022c4 100644 > --- a/src/f.tex > +++ b/src/f.tex > @@ -582,6 +582,10 @@ instructions round according to the {\em rm} > field. A floating-point register > can be initialized to floating-point positive zero using FCVT.S.W > {\em rd}, > {\tt x0}, which will never set any exception flags. > > +All floating-point conversion instructions raise the Inexact > exception if the > +result differs from its operand value, yet is representable in the > destination > +format. > + > \vspace{-0.2in} > \begin{center} > \begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O} > > > Our internal testing shows a benefit well beyond the 7% reduction in > > icounts. Presumably due to fewer calls, fewer transfers across the > > register files, better scheduling around the call site, etc. > > > > Obviously for Zfa we'll use the more efficient instructions for that > > extension. But there's no reason to not go forward with this change for > > gcc-15. > > Ya, I think those are all pretty minor issues (If the exist at all). > So it seems like the right way to go in general. > > > > > Jeff >
On Tue, 19 Mar 2024 12:58:41 PDT (-0700), Andrew Waterman wrote: > On Tue, Mar 19, 2024 at 9:23 AM Palmer Dabbelt <palmer@dabbelt.com> wrote: > >> On Mon, 18 Mar 2024 20:50:14 PDT (-0700), jeffreyalaw@gmail.com wrote: >> > >> > >> > On 3/18/24 3:09 AM, Jivan Hakobyan wrote: >> >> As RV has round instructions it is reasonable to use them instead of >> >> calling the library functions. >> >> >> >> With my patch for the following C code: >> >> double foo(double a) { >> >> return ceil(a); >> >> } >> >> >> >> GCC generates the following ASM code (before it was tail call) >> >> foo: >> >> fabs.d fa4,fa0 >> >> lui a5,%hi(.LC0) >> >> fld fa3,%lo(.LC0)(a5) >> >> flt.d a5,fa4,fa3 >> >> beq a5,zero,.L3 >> >> fcvt.l.d a5,fa0,rup >> >> I'm not sure exactly what context this is in, but my reading of >> "according to the current rounding mode" means we'd usually use the >> dynamic rounding mode. >> > > ceil doesn't depend on the current rounding mode; rup is correct. For > rint, you'd be correct. Ya, right, that's pretty obvious -- I must have just been falling asleep this morning. >> >> fcvt.d.l fa4,a5 >> >> fsgnj.d fa0,fa4,fa0 >> >> .L3: >> >> ret >> >> >> >> .LC0: >> >> .word 0 >> >> .word 1127219200 // 0x4330000000000000 >> >> >> >> >> >> The patch I have evaluated on SPEC2017. >> >> Counted dynamic instructions counts and got the following improvements >> >> >> >> 510.parest_r 262 m - >> >> 511.povray_r 2.1 b 0.04% >> >> 521.wrt_r 269 m - >> >> 526.blender_r 3 b 0.1% >> >> 527.cam4_r 15 b 0.6% >> >> 538.imagick_r 365 b 7.6% >> >> >> >> Overall executed 385 billion fewer instructions which is 0.5%. >> > A few more notes. >> > >> > The sequence Jivan is using is derived from LLVM. The condition in the >> > generated code tests for those values were are supposed to pass through >> > unaltered. The condition in the pattern ensures we do something >> > sensible WRT FE_INEXACT and mirrors how other ports handle these insns. >> >> My only worry here is that when we were doing the other patterns we >> decided not to do rint. I can't remember exactly why, but from reading >> the docs we probably just skipped it due to the inexact handling and Zfa >> having an instruction that just does this. FP stuff is always a bit of >> a time sink, so at that point it probably just fell off the priority >> list. >> >> I'm not really an FP guy, so I usually just poke around what the other >> ports generate and try to figure out what's going on. arm64 has the Zfa >> instruction and x86 FP is complicated, so I'm not sure exactly who else >> to look at for this sort of stuff. From just looking at the code, >> though, I think there's two issues -- I'm not really an FP person, >> though, so take this with a grain of salt: >> >> IIUC what we've got here doesn't actually set the inexact flag for the >> bounds check failure case > > > I think the original code was correct. If you exceed the bounds, then by > construction you know that the input was already an integer, and so not > setting NX is correct. If you didn't exceed the bounds, then fcvt.l.d will > set NX when appropriate. > > , as we're just loading up an exact constant >> and doing the bounds check. We're also not clamping to INT_MAX-type >> values, but not sure if we're supposed to. I think we could fix both of >> those by adjusting the expansion to something like >> >> fabs.d fa4,fa0 >> lui a5,%hi(.LC0) >> fld fa3,%lo(.LC0)(a5) >> flt.d a5,fa4,fa3 >> bne a5,zero,.L3 >> mv fa0, fa3 >> .L3: >> fcvt.l.d a5,fa0,rup >> fcvt.d.l fa4,a5 >> fsgnj.d fa0,fa4,fa0 >> ret >> >> and then adjusting the constant to be an epsilon larger than INT_MAX so >> it'll still trigger the clamping but also inexact. >> > > ceil/rint/etc. are supposed to work for the whole range of their > floating-point type; the range of the integral types isn't supposed to > affect the result. The original code was correct in this regard, too. Ya, sorry, I think I just misunderstood what was going on here -- it's not INT_MAX, it's just the largest FP values for which mantissas might produce non-integral values. >> There's also a pair of changes to the ISA in 2020 that added the >> conversion inexact handling requirement, it was a grey area before. I >> don't remember exactly what happened there, but I did remember it >> happening. I don't think anyone cares all that much about the >> performance of systems that target the older ISAs, so maybe we just >> restrict the non-libcall expansion to ISAs that contain the new wording? >> >> commit 5890a1a702abf4157d5879717a39d8ecdae0de68 (tag: >> draft-20201229-5890a1a) >> Author: Andrew Waterman <andrew@sifive.com> >> Date: Mon Dec 28 18:19:44 2020 -0800 >> >> Clarify when FP conversions raise the Inexact flag >> >> diff --git a/src/f.tex b/src/f.tex >> index 8649d75..97c253b 100644 >> --- a/src/f.tex >> +++ b/src/f.tex >> @@ -594,9 +594,9 @@ instructions round according to the {\em rm} >> field. A floating-point register >> can be initialized to floating-point positive zero using FCVT.S.W >> {\em rd}, >> {\tt x0}, which will never set any exception flags. >> >> -All floating-point conversion instructions set the Inexact exception >> flag if the >> -result differs from its operand value, yet is representable in the >> destination >> -format. >> +All floating-point conversion instructions set the Inexact exception >> flag if >> +the rounded result differs from the operand value and the Invalid >> exception >> +flag is not set. >> >> \vspace{-0.2in} >> \begin{center} >> >> commit 27b40fbc798357fcb4b1deaba4553646fe677576 (tag: >> draft-20200229-27b40fb) >> Author: Andrew Waterman <andrew@sifive.com> >> Date: Fri Feb 28 18:12:47 2020 -0800 >> >> Clarify that FCVT instructions signal inexact >> >> diff --git a/src/f.tex b/src/f.tex >> index 7680347..a9022c4 100644 >> --- a/src/f.tex >> +++ b/src/f.tex >> @@ -582,6 +582,10 @@ instructions round according to the {\em rm} >> field. A floating-point register >> can be initialized to floating-point positive zero using FCVT.S.W >> {\em rd}, >> {\tt x0}, which will never set any exception flags. >> >> +All floating-point conversion instructions raise the Inexact >> exception if the >> +result differs from its operand value, yet is representable in the >> destination >> +format. >> + >> \vspace{-0.2in} >> \begin{center} >> \begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O} >> >> > Our internal testing shows a benefit well beyond the 7% reduction in >> > icounts. Presumably due to fewer calls, fewer transfers across the >> > register files, better scheduling around the call site, etc. >> > >> > Obviously for Zfa we'll use the more efficient instructions for that >> > extension. But there's no reason to not go forward with this change for >> > gcc-15. >> >> Ya, I think those are all pretty minor issues (If the exist at all). >> So it seems like the right way to go in general. >> >> > >> > Jeff >>
On 3/19/24 10:23 AM, Palmer Dabbelt wrote: > On Mon, 18 Mar 2024 20:50:14 PDT (-0700), jeffreyalaw@gmail.com wrote: >> >> >> On 3/18/24 3:09 AM, Jivan Hakobyan wrote: >>> As RV has round instructions it is reasonable to use them instead of >>> calling the library functions. >>> >>> With my patch for the following C code: >>> double foo(double a) { >>> return ceil(a); >>> } >>> >>> GCC generates the following ASM code (before it was tail call) >>> foo: >>> fabs.d fa4,fa0 >>> lui a5,%hi(.LC0) >>> fld fa3,%lo(.LC0)(a5) >>> flt.d a5,fa4,fa3 >>> beq a5,zero,.L3 >>> fcvt.l.d a5,fa0,rup > > I'm not sure exactly what context this is in, but my reading of > "according to the current rounding mode" means we'd usually use the > dynamic rounding mode. As Andrew W. noted, we're dealing with ceil and thus rup is the correct rounding mode to use here. > > My only worry here is that when we were doing the other patterns we > decided not to do rint. I can't remember exactly why, but from reading > the docs we probably just skipped it due to the inexact handling and Zfa > having an instruction that just does this. FP stuff is always a bit of > a time sink, so at that point it probably just fell off the priority list. rint is supposed to raise FE_INEXACT, so it's actually a good match for RISC-V fcvt semantics as they appropriately raise FE_INEXACT. nearby* do not arise FE_INEXACT and thus would rely on the new Zfa instructions where we have ones that do not raise FE_INEXACT or they need to be conditional on flag_fp_int_builtin_inexact. One could reasonably argue that when flag_fp_int_builtin_inexact is enabled that a call to nearby* ought to be converted into a call to rint*. > > I'm not really an FP guy, so I usually just poke around what the other > ports generate and try to figure out what's going on. arm64 has the Zfa > instruction and x86 FP is complicated, so I'm not sure exactly who else > to look at for this sort of stuff. From just looking at the code, > though, I think there's two issues -- I'm not really an FP person, > though, so take this with a grain of salt: Right. And the condition under which we use the new sequence for ceil/round actually borrows from x86. Essentially we only use the new sequence when we've been told we don't care about FE_INEXACT or fp exceptions in general. > > IIUC what we've got here doesn't actually set the inexact flag for the > bounds check failure case, as we're just loading up an exact constant > and doing the bounds check. We're also not clamping to INT_MAX-type > values, but not sure if we're supposed to. I think we could fix both of > those by adjusting the expansion to something like The state of FE_INEXACT is a don't care here due the condition on the expansion code. > > fabs.d fa4,fa0 > lui a5,%hi(.LC0) > fld fa3,%lo(.LC0)(a5) > flt.d a5,fa4,fa3 > bne a5,zero,.L3 > mv fa0, fa3 > .L3: > fcvt.l.d a5,fa0,rup > fcvt.d.l fa4,a5 > fsgnj.d fa0,fa4,fa0 > ret > > and then adjusting the constant to be an epsilon larger than INT_MAX so > it'll still trigger the clamping but also inexact. I think Jivan's sequence is more correct. It's not just INT_MAX here that's concerning, there's a whole class of values that cause problems. > > There's also a pair of changes to the ISA in 2020 that added the > conversion inexact handling requirement, it was a grey area before. I > don't remember exactly what happened there, but I did remember it > happening. I don't think anyone cares all that much about the > performance of systems that target the older ISAs, so maybe we just > restrict the non-libcall expansion to ISAs that contain the new wording? I think all this got sufficiently cleaned up. The spec is explicit about when FE_INEXACT gets raised on the fcvt instructions. I referred to it repeatedly when analyzing Jivan's work. We can hash through the final items in a few weeks once the trunk re-opens for development. Jeff
On Wed, 20 Mar 2024 11:54:34 PDT (-0700), Jeff Law wrote: > > > On 3/19/24 10:23 AM, Palmer Dabbelt wrote: >> On Mon, 18 Mar 2024 20:50:14 PDT (-0700), jeffreyalaw@gmail.com wrote: >>> >>> >>> On 3/18/24 3:09 AM, Jivan Hakobyan wrote: >>>> As RV has round instructions it is reasonable to use them instead of >>>> calling the library functions. >>>> >>>> With my patch for the following C code: >>>> double foo(double a) { >>>> return ceil(a); >>>> } >>>> >>>> GCC generates the following ASM code (before it was tail call) >>>> foo: >>>> fabs.d fa4,fa0 >>>> lui a5,%hi(.LC0) >>>> fld fa3,%lo(.LC0)(a5) >>>> flt.d a5,fa4,fa3 >>>> beq a5,zero,.L3 >>>> fcvt.l.d a5,fa0,rup >> >> I'm not sure exactly what context this is in, but my reading of >> "according to the current rounding mode" means we'd usually use the >> dynamic rounding mode. > As Andrew W. noted, we're dealing with ceil and thus rup is the correct > rounding mode to use here. > > > >> >> My only worry here is that when we were doing the other patterns we >> decided not to do rint. I can't remember exactly why, but from reading >> the docs we probably just skipped it due to the inexact handling and Zfa >> having an instruction that just does this. FP stuff is always a bit of >> a time sink, so at that point it probably just fell off the priority list. > rint is supposed to raise FE_INEXACT, so it's actually a good match for > RISC-V fcvt semantics as they appropriately raise FE_INEXACT. > > nearby* do not arise FE_INEXACT and thus would rely on the new Zfa > instructions where we have ones that do not raise FE_INEXACT or they > need to be conditional on flag_fp_int_builtin_inexact. One could > reasonably argue that when flag_fp_int_builtin_inexact is enabled that a > call to nearby* ought to be converted into a call to rint*. > > >> >> I'm not really an FP guy, so I usually just poke around what the other >> ports generate and try to figure out what's going on. arm64 has the Zfa >> instruction and x86 FP is complicated, so I'm not sure exactly who else >> to look at for this sort of stuff. From just looking at the code, >> though, I think there's two issues -- I'm not really an FP person, >> though, so take this with a grain of salt: > Right. And the condition under which we use the new sequence for > ceil/round actually borrows from x86. Essentially we only use the new > sequence when we've been told we don't care about FE_INEXACT or fp > exceptions in general. > >> >> IIUC what we've got here doesn't actually set the inexact flag for the >> bounds check failure case, as we're just loading up an exact constant >> and doing the bounds check. We're also not clamping to INT_MAX-type >> values, but not sure if we're supposed to. I think we could fix both of >> those by adjusting the expansion to something like > The state of FE_INEXACT is a don't care here due the condition on the > expansion code. > > >> >> fabs.d fa4,fa0 >> lui a5,%hi(.LC0) >> fld fa3,%lo(.LC0)(a5) >> flt.d a5,fa4,fa3 >> bne a5,zero,.L3 >> mv fa0, fa3 >> .L3: >> fcvt.l.d a5,fa0,rup >> fcvt.d.l fa4,a5 >> fsgnj.d fa0,fa4,fa0 >> ret >> >> and then adjusting the constant to be an epsilon larger than INT_MAX so >> it'll still trigger the clamping but also inexact. > I think Jivan's sequence is more correct. It's not just INT_MAX here > that's concerning, there's a whole class of values that cause problems. Ya, sorry, I thought I'd replied to Andrew's email somewhere -- I'd just managed to confuse myself about how the FP stuff works, I also think Jivan's code is correct now. >> There's also a pair of changes to the ISA in 2020 that added the >> conversion inexact handling requirement, it was a grey area before. I >> don't remember exactly what happened there, but I did remember it >> happening. I don't think anyone cares all that much about the >> performance of systems that target the older ISAs, so maybe we just >> restrict the non-libcall expansion to ISAs that contain the new wording? > I think all this got sufficiently cleaned up. The spec is explicit > about when FE_INEXACT gets raised on the fcvt instructions. I referred > to it repeatedly when analyzing Jivan's work. We still have support for stuf like -misa-spec=2.2 (and some 2019 releases with clunky version numbers). Those all predate the convert/inexact wording that got added. Though if FE_INEXACT is a don't care here, then I think it doesn't matter if the wording got changed. In that case I think this is fine, so Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> > We can hash through the final items in a few weeks once the trunk > re-opens for development. I think those were all the isuses on my end ;) > > > Jeff
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md index a7694137685aee97ca249c0e720afdfc62ec33c9..75e119e407a36c273eaa6e5ffab24be42af7a8d7 100644 --- a/gcc/config/riscv/iterators.md +++ b/gcc/config/riscv/iterators.md @@ -196,6 +196,13 @@ (define_code_iterator bitmanip_rotate [rotate rotatert]) +;; These code iterators allow the signed and unsigned fix operations to use +;; the same template. +(define_code_iterator fix_ops [fix unsigned_fix]) + +(define_code_attr fix_uns [(fix "fix") (unsigned_fix "fixuns")]) + + ;; ------------------------------------------------------------------- ;; Code Attributes ;; ------------------------------------------------------------------- @@ -312,11 +319,6 @@ ;; Int Iterators. ;; ------------------------------------------------------------------- -;; Iterator and attributes for floating-point rounding instructions. -(define_int_iterator RINT [UNSPEC_LRINT UNSPEC_LROUND]) -(define_int_attr rint_pattern [(UNSPEC_LRINT "rint") (UNSPEC_LROUND "round")]) -(define_int_attr rint_rm [(UNSPEC_LRINT "dyn") (UNSPEC_LROUND "rmm")]) - ;; Iterator and attributes for quiet comparisons. (define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET UNSPEC_FLE_QUIET]) (define_int_attr quiet_pattern [(UNSPEC_FLT_QUIET "lt") (UNSPEC_FLE_QUIET "le")]) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index b87355938052a3a0ca9107774bb3a683c85b74d9..03486f4c4e3dab733e48a702c4ffbb5865f1884d 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -709,6 +709,7 @@ bool gather_scatter_valid_offset_p (machine_mode); HOST_WIDE_INT estimated_poly_value (poly_int64, unsigned int); bool whole_reg_to_reg_move_p (rtx *, machine_mode, int); bool splat_to_scalar_move_p (rtx *); +rtx get_fp_rounding_coefficient (machine_mode); } /* We classify builtin types into two classes: diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 967f4e382875dfeee7d5de5ab05a2b537766844e..95a233dea44168dec2a28c9ffff4a46004a40e18 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -4494,7 +4494,7 @@ vls_mode_valid_p (machine_mode vls_mode) All double floating point will be unchanged for ceil if it is greater than and equal to 4503599627370496. */ -static rtx +rtx get_fp_rounding_coefficient (machine_mode inner_mode) { REAL_VALUE_TYPE real; diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index b16ed97909c04456ce4fe5234a82c5597549b67d..d4eb440d7baeb3d71c7e58291ce4136da6852246 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -64,7 +64,6 @@ UNSPEC_ROUNDEVEN UNSPEC_NEARBYINT UNSPEC_LRINT - UNSPEC_LROUND UNSPEC_FMIN UNSPEC_FMAX UNSPEC_FMINM @@ -1967,21 +1966,48 @@ ;; ;; .................... -(define_insn "fix_trunc<ANYF:mode><GPR:mode>2" - [(set (match_operand:GPR 0 "register_operand" "=r") - (fix:GPR +(define_expand "<fix_uns>_trunc<ANYF:mode>si2" + [(set (match_operand:SI 0 "register_operand" "=r") + (fix_ops:SI (match_operand:ANYF 1 "register_operand" " f")))] "TARGET_HARD_FLOAT || TARGET_ZFINX" - "fcvt.<GPR:ifmt>.<ANYF:fmt> %0,%1,rtz" +{ + if (TARGET_64BIT) + { + rtx t = gen_reg_rtx (DImode); + emit_insn (gen_<fix_uns>_trunc<ANYF:mode>si2_sext (t, operands[1])); + t = gen_lowpart (SImode, t); + SUBREG_PROMOTED_VAR_P (t) = 1; + SUBREG_PROMOTED_SET (t, SRP_SIGNED); + emit_move_insn (operands[0], t); + DONE; + } +}) + +(define_insn "*<fix_uns>_trunc<ANYF:mode>si2" + [(set (match_operand:SI 0 "register_operand" "=r") + (fix_ops:SI + (match_operand:ANYF 1 "register_operand" " f")))] + "TARGET_HARD_FLOAT || TARGET_ZFINX" + "fcvt.w<u>.<ANYF:fmt> %0,%1,rtz" + [(set_attr "type" "fcvt_f2i") + (set_attr "mode" "<ANYF:MODE>")]) + +(define_insn "<fix_uns>_trunc<ANYF:mode>si2_sext" + [(set (match_operand:DI 0 "register_operand" "=r") + (sign_extend:DI (fix_ops:SI + (match_operand:ANYF 1 "register_operand" " f"))))] + "TARGET_64BIT && (TARGET_HARD_FLOAT || TARGET_ZFINX)" + "fcvt.w<u>.<ANYF:fmt> %0,%1,rtz" [(set_attr "type" "fcvt_f2i") (set_attr "mode" "<ANYF:MODE>")]) -(define_insn "fixuns_trunc<ANYF:mode><GPR:mode>2" - [(set (match_operand:GPR 0 "register_operand" "=r") - (unsigned_fix:GPR +(define_insn "<fix_uns>_trunc<ANYF:mode>di2" + [(set (match_operand:DI 0 "register_operand" "=r") + (fix_ops:DI (match_operand:ANYF 1 "register_operand" " f")))] - "TARGET_HARD_FLOAT || TARGET_ZFINX" - "fcvt.<GPR:ifmt>u.<ANYF:fmt> %0,%1,rtz" + "TARGET_64BIT && (TARGET_HARD_FLOAT || TARGET_ZFINX)" + "fcvt.l<u>.<ANYF:fmt> %0,%1,rtz" [(set_attr "type" "fcvt_f2i") (set_attr "mode" "<ANYF:MODE>")]) @@ -2003,17 +2029,170 @@ [(set_attr "type" "fcvt_i2f") (set_attr "mode" "<ANYF:MODE>")]) -(define_insn "l<rint_pattern><ANYF:mode><GPR:mode>2" - [(set (match_operand:GPR 0 "register_operand" "=r") - (unspec:GPR +(define_expand "lrint<ANYF:mode>si2" + [(set (match_operand:SI 0 "register_operand" "=r") + (unspec:SI [(match_operand:ANYF 1 "register_operand" " f")] - RINT))] + UNSPEC_LRINT))] "TARGET_HARD_FLOAT || TARGET_ZFINX" - "fcvt.<GPR:ifmt>.<ANYF:fmt> %0,%1,<rint_rm>" +{ + if (TARGET_64BIT) + { + rtx t = gen_reg_rtx (DImode); + emit_insn (gen_lrint<ANYF:mode>si2_sext (t, operands[1])); + t = gen_lowpart (SImode, t); + SUBREG_PROMOTED_VAR_P (t) = 1; + SUBREG_PROMOTED_SET (t, SRP_SIGNED); + emit_move_insn (operands[0], t); + DONE; + } +}) + +(define_insn "*lrint<ANYF:mode>si2" + [(set (match_operand:SI 0 "register_operand" "=r") + (unspec:SI + [(match_operand:ANYF 1 "register_operand" " f")] + UNSPEC_LRINT))] + "TARGET_HARD_FLOAT || TARGET_ZFINX" + "fcvt.w.<ANYF:fmt> %0,%1,dyn" [(set_attr "type" "fcvt_f2i") (set_attr "mode" "<ANYF:MODE>")]) -(define_insn "<round_pattern><ANYF:mode>2" +(define_insn "lrint<ANYF:mode>si2_sext" + [(set (match_operand:DI 0 "register_operand" "=r") + (sign_extend:DI (unspec:SI + [(match_operand:ANYF 1 "register_operand" " f")] + UNSPEC_LRINT)))] + "TARGET_64BIT && (TARGET_HARD_FLOAT || TARGET_ZFINX)" + "fcvt.w.<ANYF:fmt> %0,%1,dyn" + [(set_attr "type" "fcvt_f2i") + (set_attr "mode" "<ANYF:MODE>")]) + +(define_insn "lrint<ANYF:mode>di2" + [(set (match_operand:DI 0 "register_operand" "=r") + (unspec:DI + [(match_operand:ANYF 1 "register_operand" " f")] + UNSPEC_LRINT))] + "TARGET_64BIT && (TARGET_HARD_FLOAT || TARGET_ZFINX)" + "fcvt.l.<ANYF:fmt> %0,%1,dyn" + [(set_attr "type" "fcvt_f2i") + (set_attr "mode" "<ANYF:MODE>")]) + +(define_expand "l<round_pattern><ANYF:mode>si2" + [(set (match_operand:SI 0 "register_operand" "=r") + (unspec:SI + [(match_operand:ANYF 1 "register_operand" " f")] + ROUND))] + "TARGET_HARD_FLOAT || TARGET_ZFINX" +{ + if (TARGET_64BIT) + { + rtx t = gen_reg_rtx (DImode); + emit_insn (gen_l<round_pattern><ANYF:mode>si2_sext (t, operands[1])); + t = gen_lowpart (SImode, t); + SUBREG_PROMOTED_VAR_P (t) = 1; + SUBREG_PROMOTED_SET (t, SRP_SIGNED); + emit_move_insn (operands[0], t); + DONE; + } +}) + +(define_insn "*l<round_pattern><ANYF:mode>si2" + [(set (match_operand:SI 0 "register_operand" "=r") + (unspec:SI + [(match_operand:ANYF 1 "register_operand" " f")] + ROUND))] + "TARGET_HARD_FLOAT || TARGET_ZFINX" + "fcvt.w.<ANYF:fmt> %0,%1,<round_rm>" + [(set_attr "type" "fcvt_f2i") + (set_attr "mode" "<ANYF:MODE>")]) + +(define_insn "l<round_pattern><ANYF:mode>si2_sext" + [(set (match_operand:DI 0 "register_operand" "=r") + (sign_extend:DI (unspec:SI + [(match_operand:ANYF 1 "register_operand" " f")] + ROUND)))] + "TARGET_64BIT && (TARGET_HARD_FLOAT || TARGET_ZFINX)" + "fcvt.w.<ANYF:fmt> %0,%1,<round_rm>" + [(set_attr "type" "fcvt_f2i") + (set_attr "mode" "<ANYF:MODE>")]) + +(define_insn "l<round_pattern><ANYF:mode>di2" + [(set (match_operand:DI 0 "register_operand" "=r") + (unspec:DI + [(match_operand:ANYF 1 "register_operand" " f")] + ROUND))] + "TARGET_64BIT && (TARGET_HARD_FLOAT || TARGET_ZFINX)" + "fcvt.l.<ANYF:fmt> %0,%1,<round_rm>" + [(set_attr "type" "fcvt_f2i") + (set_attr "mode" "<ANYF:MODE>")]) + +(define_expand "<round_pattern><ANYF:mode>2" + [(set (match_operand:ANYF 0 "register_operand" "=f") + (unspec:ANYF + [(match_operand:ANYF 1 "register_operand" " f")] + ROUND))] + "TARGET_HARD_FLOAT && (TARGET_ZFA + || flag_fp_int_builtin_inexact || !flag_trapping_math)" +{ + if (TARGET_ZFA) + emit_insn (gen_<round_pattern><ANYF:mode>_zfa2 (operands[0], + operands[1])); + else + { + rtx reg; + rtx label = gen_label_rtx (); + rtx end_label = gen_label_rtx (); + rtx abs_reg = gen_reg_rtx (<ANYF:MODE>mode); + rtx coeff_reg = gen_reg_rtx (<ANYF:MODE>mode); + rtx tmp_reg = gen_reg_rtx (<ANYF:MODE>mode); + + riscv_emit_move (tmp_reg, operands[1]); + riscv_emit_move (coeff_reg, + riscv_vector::get_fp_rounding_coefficient (<ANYF:MODE>mode)); + emit_insn (gen_abs<ANYF:mode>2 (abs_reg, operands[1])); + + riscv_expand_conditional_branch (label, LT, abs_reg, coeff_reg); + + emit_jump_insn (gen_jump (end_label)); + emit_barrier (); + + emit_label (label); + switch (<ANYF:MODE>mode) + { + case SFmode: + reg = gen_reg_rtx (SImode); + emit_insn (gen_l<round_pattern>sfsi2 (reg, operands[1])); + emit_insn (gen_floatsisf2 (abs_reg, reg)); + break; + case DFmode: + if (TARGET_64BIT) + { + reg = gen_reg_rtx (DImode); + emit_insn (gen_l<round_pattern>dfdi2 (reg, operands[1])); + emit_insn (gen_floatdidf2 (abs_reg, reg)); + } + else + { + reg = gen_reg_rtx (SImode); + emit_insn (gen_l<round_pattern>dfsi2 (reg, operands[1])); + emit_insn (gen_floatsidf2 (abs_reg, reg)); + } + break; + default: + gcc_unreachable (); + } + + emit_insn (gen_copysign<ANYF:mode>3 (tmp_reg, abs_reg, operands[1])); + + emit_label (end_label); + riscv_emit_move (operands[0], tmp_reg); + } + + DONE; +}) + +(define_insn "<round_pattern><ANYF:mode>_zfa2" [(set (match_operand:ANYF 0 "register_operand" "=f") (unspec:ANYF [(match_operand:ANYF 1 "register_operand" " f")] diff --git a/gcc/testsuite/gcc.target/riscv/fix.c b/gcc/testsuite/gcc.target/riscv/fix.c new file mode 100644 index 0000000000000000000000000000000000000000..265a7da1fc5d49cfbe8f3459407c7fa787b747dd --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/fix.c @@ -0,0 +1,34 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gc -mabi=lp64d" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +int +foo (double n) +{ + return n; +} + +int +foo_1 (float n) +{ + return n; +} + +unsigned int +foo_2 (double n) +{ + return n; +} + +unsigned int +foo_3 (float n) +{ + return n; +} + +/* { dg-final { scan-assembler-times {\mfcvt.w.d} 1 } } */ +/* { dg-final { scan-assembler-times {\mfcvt.w.s} 1 } } */ +/* { dg-final { scan-assembler-times {\mfcvt.wu.d} 1 } } */ +/* { dg-final { scan-assembler-times {\mfcvt.wu.s} 1 } } */ +/* { dg-final { scan-assembler-not "\\ssext.w\\s" } } */ + diff --git a/gcc/testsuite/gcc.target/riscv/round.c b/gcc/testsuite/gcc.target/riscv/round.c new file mode 100644 index 0000000000000000000000000000000000000000..decfc82a390e8f89bdc839cd00b0cb451030e4c8 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/round.c @@ -0,0 +1,144 @@ +#include <math.h> + +extern void abort (void); +extern void exit (int); + +#define NEQ(a, b) (fabs((a) - (b)) > 0.000001) + +#define DECL_FUNC(TYPE1, TYPE2, ROUND) \ + __attribute__ ((noinline, noclone)) TYPE2 \ + convert_##TYPE1##_to_##TYPE2##_##ROUND (TYPE1 N) \ + { \ + return ROUND (N); \ + } + +#define DECL_ALL_ROUNDS_FOR(ROUND_FUNC) \ + DECL_FUNC(float, float, ROUND_FUNC) \ + DECL_FUNC(double, double, ROUND_FUNC) \ + DECL_FUNC(double, int, ROUND_FUNC) \ + DECL_FUNC(double, long, ROUND_FUNC) \ + DECL_FUNC(float, int, ROUND_FUNC) \ + DECL_FUNC(float, long, ROUND_FUNC) + + +DECL_ALL_ROUNDS_FOR(round) +DECL_ALL_ROUNDS_FOR(ceil) +DECL_ALL_ROUNDS_FOR(floor) +DECL_ALL_ROUNDS_FOR(trunc) +DECL_ALL_ROUNDS_FOR(nearbyint) + +#define TEST_ROUND(TYPE1, TYPE2, N, N_R, ROUND) \ + if (NEQ (convert_##TYPE1##_to_##TYPE2##_##ROUND (N), N_R)) \ + abort (); + + +int main () { + + /* Round */ + TEST_ROUND(double, double, -4.8, -5.0, round); + TEST_ROUND(double, double, -4.2, -4.0, round); + TEST_ROUND(double, double, 4.8, 5.0, round); + TEST_ROUND(double, double, 4.2, 4.0, round); + + TEST_ROUND(double, int, -4.8, -5, round); + TEST_ROUND(double, int, -4.2, -4, round); + TEST_ROUND(double, int, 4.8, 5, round); + TEST_ROUND(double, int, 4.2, 4, round); + + TEST_ROUND(double, long, -4.8, -5, round); + TEST_ROUND(double, long, -4.2, -4, round); + TEST_ROUND(double, long, 4.8, 5, round); + TEST_ROUND(double, long, 4.2, 4, round); + + TEST_ROUND(float, long, -4.8, -5, round); + TEST_ROUND(float, long, -4.2, -4, round); + TEST_ROUND(float, long, 4.8, 5, round); + TEST_ROUND(float, long, 4.2, 4, round); + + /* Ceil */ + TEST_ROUND(double, double, -4.8, -4.0, ceil); + TEST_ROUND(double, double, -4.2, -4.0, ceil); + TEST_ROUND(double, double, 4.8, 5.0, ceil); + TEST_ROUND(double, double, 4.2, 5.0, ceil); + + TEST_ROUND(double, int, -4.8, -4, ceil); + TEST_ROUND(double, int, -4.2, -4, ceil); + TEST_ROUND(double, int, 4.8, 5, ceil); + TEST_ROUND(double, int, 4.2, 5, ceil); + + TEST_ROUND(double, long, -4.8, -4, ceil); + TEST_ROUND(double, long, -4.2, -4, ceil); + TEST_ROUND(double, long, 4.8, 5, ceil); + TEST_ROUND(double, long, 4.2, 5, ceil); + + TEST_ROUND(float, long, -4.8, -4, ceil); + TEST_ROUND(float, long, -4.2, -4, ceil); + TEST_ROUND(float, long, 4.8, 5, ceil); + TEST_ROUND(float, long, 4.2, 5, ceil); + + /* Floor */ + TEST_ROUND(double, double, -4.8, -5.0, floor); + TEST_ROUND(double, double, -4.2, -5.0, floor); + TEST_ROUND(double, double, 4.8, 4.0, floor); + TEST_ROUND(double, double, 4.2, 4.0, floor); + + TEST_ROUND(double, int, -4.8, -5, floor); + TEST_ROUND(double, int, -4.2, -5, floor); + TEST_ROUND(double, int, 4.8, 4, floor); + TEST_ROUND(double, int, 4.2, 4, floor); + + TEST_ROUND(double, long, -4.8, -5, floor); + TEST_ROUND(double, long, -4.2, -5, floor); + TEST_ROUND(double, long, 4.8, 4, floor); + TEST_ROUND(double, long, 4.2, 4, floor); + + TEST_ROUND(float, long, -4.8, -5, floor); + TEST_ROUND(float, long, -4.2, -5, floor); + TEST_ROUND(float, long, 4.8, 4, floor); + TEST_ROUND(float, long, 4.2, 4, floor); + + /* Trunc */ + TEST_ROUND(double, double, -4.8, -4.0, trunc); + TEST_ROUND(double, double, -4.2, -4.0, trunc); + TEST_ROUND(double, double, 4.8, 4.0, trunc); + TEST_ROUND(double, double, 4.2, 4.0, trunc); + + TEST_ROUND(double, int, -4.8, -4, trunc); + TEST_ROUND(double, int, -4.2, -4, trunc); + TEST_ROUND(double, int, 4.8, 4, trunc); + TEST_ROUND(double, int, 4.2, 4, trunc); + + TEST_ROUND(double, long, -4.8, -4, trunc); + TEST_ROUND(double, long, -4.2, -4, trunc); + TEST_ROUND(double, long, 4.8, 4, trunc); + TEST_ROUND(double, long, 4.2, 4, trunc); + + TEST_ROUND(float, long, -4.8, -4, trunc); + TEST_ROUND(float, long, -4.2, -4, trunc); + TEST_ROUND(float, long, 4.8, 4, trunc); + TEST_ROUND(float, long, 4.2, 4, trunc); + + /* Nearbyint */ + TEST_ROUND(double, double, -4.8, -5.0, nearbyint); + TEST_ROUND(double, double, -4.2, -4.0, nearbyint); + TEST_ROUND(double, double, 4.8, 5.0, nearbyint); + TEST_ROUND(double, double, 4.2, 4.0, nearbyint); + + TEST_ROUND(double, int, -4.8, -5, nearbyint); + TEST_ROUND(double, int, -4.2, -4, nearbyint); + TEST_ROUND(double, int, 4.8, 5, nearbyint); + TEST_ROUND(double, int, 4.2, 4, nearbyint); + + TEST_ROUND(double, long, -4.8, -5, nearbyint); + TEST_ROUND(double, long, -4.2, -4, nearbyint); + TEST_ROUND(double, long, 4.8, 5, nearbyint); + TEST_ROUND(double, long, 4.2, 4, nearbyint); + + TEST_ROUND(float, long, -4.8, -5, nearbyint); + TEST_ROUND(float, long, -4.2, -4, nearbyint); + TEST_ROUND(float, long, 4.8, 5, nearbyint); + TEST_ROUND(float, long, 4.2, 4, nearbyint); + + exit(0); +} + diff --git a/gcc/testsuite/gcc.target/riscv/round_32.c b/gcc/testsuite/gcc.target/riscv/round_32.c new file mode 100644 index 0000000000000000000000000000000000000000..f9fea70ad5523c76c98bd2c20ce4d17db98b8b79 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/round_32.c @@ -0,0 +1,22 @@ +/* { dg-do compile { target { riscv32*-*-* } } } */ +/* { dg-options "-march=rv32gc -mabi=ilp32d -fno-math-errno -funsafe-math-optimizations -fno-inline" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */ + +#include "round.c" + +/* { dg-final { scan-assembler-times {\mfcvt.w.s} 15 } } */ +/* { dg-final { scan-assembler-times {\mfcvt.s.w} 5 } } */ +/* { dg-final { scan-assembler-times {\mfcvt.d.w} 65 } } */ +/* { dg-final { scan-assembler-times {\mfcvt.w.d} 15 } } */ +/* { dg-final { scan-assembler-times {,rup} 6 } } */ +/* { dg-final { scan-assembler-times {,rmm} 6 } } */ +/* { dg-final { scan-assembler-times {,rdn} 6 } } */ +/* { dg-final { scan-assembler-times {,rtz} 6 } } */ +/* { dg-final { scan-assembler-not {\mfcvt.l.d} } } */ +/* { dg-final { scan-assembler-not {\mfcvt.d.l} } } */ +/* { dg-final { scan-assembler-not "\\sceil\\s" } } */ +/* { dg-final { scan-assembler-not "\\sfloor\\s" } } */ +/* { dg-final { scan-assembler-not "\\sround\\s" } } */ +/* { dg-final { scan-assembler-not "\\snearbyint\\s" } } */ +/* { dg-final { scan-assembler-not "\\srint\\s" } } */ +/* { dg-final { scan-assembler-not "\\stail\\s" } } */ diff --git a/gcc/testsuite/gcc.target/riscv/round_64.c b/gcc/testsuite/gcc.target/riscv/round_64.c new file mode 100644 index 0000000000000000000000000000000000000000..e79690979a50663c1c3518724cc23466d9e20490 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/round_64.c @@ -0,0 +1,23 @@ +/* { dg-do compile { target { riscv64*-*-* } } } */ +/* { dg-options "-march=rv64gc -mabi=lp64d -fno-math-errno -funsafe-math-optimizations -fno-inline" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */ + +#include "round.c" + +/* { dg-final { scan-assembler-times {\mfcvt.w.s} 10 } } */ +/* { dg-final { scan-assembler-times {\mfcvt.s.w} 5 } } */ +/* { dg-final { scan-assembler-times {\mfcvt.l.d} 10 } } */ +/* { dg-final { scan-assembler-times {\mfcvt.d.l} 45 } } */ +/* { dg-final { scan-assembler-times {\mfcvt.w.d} 5 } } */ +/* { dg-final { scan-assembler-times {,rup} 6 } } */ +/* { dg-final { scan-assembler-times {,rmm} 6 } } */ +/* { dg-final { scan-assembler-times {,rdn} 6 } } */ +/* { dg-final { scan-assembler-times {,rtz} 6 } } */ +/* { dg-final { scan-assembler-not "\\sceil\\s" } } */ +/* { dg-final { scan-assembler-not "\\sfloor\\s" } } */ +/* { dg-final { scan-assembler-not "\\sround\\s" } } */ +/* { dg-final { scan-assembler-not "\\snearbyint\\s" } } */ +/* { dg-final { scan-assembler-not "\\srint\\s" } } */ +/* { dg-final { scan-assembler-not "\\stail\\s" } } */ +/* { dg-final { scan-assembler-not "\\ssext.w\\s" } } */ +