Message ID | 1490043778-30228-1-git-send-email-tuliom@linux.vnet.ibm.com |
---|---|
State | New |
Headers | show |
On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote: > 1. Fix the results for negative subnormals by ignoring the signal when > normalizing the value. > 2. Fix the output when the high part is a power of 2 and the low part > is a nonzero number with opposite sign. This fix is based on commit > 380bd0fd2418f8988217de950f8b8ff18af0cb2b. It is not a blocker, but based on the complexity of the change and the optimization real case usage, I wonder if it would be better to just remove it and use the default implementation instead.
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes: > On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote: >> 1. Fix the results for negative subnormals by ignoring the signal when >> normalizing the value. >> 2. Fix the output when the high part is a power of 2 and the low part >> is a nonzero number with opposite sign. This fix is based on commit >> 380bd0fd2418f8988217de950f8b8ff18af0cb2b. > > It is not a blocker, but based on the complexity of the change and the > optimization real case usage, I wonder if it would be better to just remove > it and use the default implementation instead. I tried to do that before fixing these, but the performance difference is still around 2x.
On 21/03/2017 17:09, Tulio Magno Quites Machado Filho wrote: > Adhemerval Zanella <adhemerval.zanella@linaro.org> writes: > >> On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote: >>> 1. Fix the results for negative subnormals by ignoring the signal when >>> normalizing the value. >>> 2. Fix the output when the high part is a power of 2 and the low part >>> is a nonzero number with opposite sign. This fix is based on commit >>> 380bd0fd2418f8988217de950f8b8ff18af0cb2b. >> >> It is not a blocker, but based on the complexity of the change and the >> optimization real case usage, I wonder if it would be better to just remove >> it and use the default implementation instead. > > I tried to do that before fixing these, but the performance difference is > still around 2x. I would expect it, but the question is do we really care about the performance of logbl-ibm128 to maintain a separated implementation for power7+ only?
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes: > On 21/03/2017 17:09, Tulio Magno Quites Machado Filho wrote: >> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes: >> >>> On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote: >>>> 1. Fix the results for negative subnormals by ignoring the signal when >>>> normalizing the value. >>>> 2. Fix the output when the high part is a power of 2 and the low part >>>> is a nonzero number with opposite sign. This fix is based on commit >>>> 380bd0fd2418f8988217de950f8b8ff18af0cb2b. >>> >>> It is not a blocker, but based on the complexity of the change and the >>> optimization real case usage, I wonder if it would be better to just remove >>> it and use the default implementation instead. >> >> I tried to do that before fixing these, but the performance difference is >> still around 2x. > > I would expect it, but the question is do we really care about the performance > of logbl-ibm128 to maintain a separated implementation for power7+ only? Yes, we still need it. Considering your proposal was not a blocker, I plan to merge this fix. Anyway, I agree in re-evaluating the removal of this optimization in a few releases.
On Tue, 2017-03-21 at 17:18 -0300, Adhemerval Zanella wrote: > > On 21/03/2017 17:09, Tulio Magno Quites Machado Filho wrote: > > Adhemerval Zanella <adhemerval.zanella@linaro.org> writes: > > > >> On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote: > >>> 1. Fix the results for negative subnormals by ignoring the signal when > >>> normalizing the value. > >>> 2. Fix the output when the high part is a power of 2 and the low part > >>> is a nonzero number with opposite sign. This fix is based on commit > >>> 380bd0fd2418f8988217de950f8b8ff18af0cb2b. > >> > >> It is not a blocker, but based on the complexity of the change and the > >> optimization real case usage, I wonder if it would be better to just remove > >> it and use the default implementation instead. > > > > I tried to do that before fixing these, but the performance difference is > > still around 2x. > > I would expect it, but the question is do we really care about the performance > of logbl-ibm128 to maintain a separated implementation for power7+ only? > Yes we still care. The power7 optimization avoids LHS by performing the mask and convert in VSX. and applies to power8, power9 for IBM long double.. It will be some time before _float128 is fully deployed and until then customers will notice, if this IBM long double optimization went missing.
On 28/04/2017 15:26, Steven Munroe wrote: > On Tue, 2017-03-21 at 17:18 -0300, Adhemerval Zanella wrote: >> >> On 21/03/2017 17:09, Tulio Magno Quites Machado Filho wrote: >>> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes: >>> >>>> On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote: >>>>> 1. Fix the results for negative subnormals by ignoring the signal when >>>>> normalizing the value. >>>>> 2. Fix the output when the high part is a power of 2 and the low part >>>>> is a nonzero number with opposite sign. This fix is based on commit >>>>> 380bd0fd2418f8988217de950f8b8ff18af0cb2b. >>>> >>>> It is not a blocker, but based on the complexity of the change and the >>>> optimization real case usage, I wonder if it would be better to just remove >>>> it and use the default implementation instead. >>> >>> I tried to do that before fixing these, but the performance difference is >>> still around 2x. >> >> I would expect it, but the question is do we really care about the performance >> of logbl-ibm128 to maintain a separated implementation for power7+ only? >> > Yes we still care. The power7 optimization avoids LHS by performing the > mask and convert in VSX. and applies to power8, power9 for IBM long > double.. Yeah I am aware of it, I was the one that code it ;) > > It will be some time before _float128 is fully deployed and until then > customers will notice, if this IBM long double optimization went > missing. > It is not a blocker, just a realization if it worth the maintenance of keep a separate implementation for powerpc for this specific symbol. I usually prefer to simplify the code and try to use the default implementation where possible, however if the powerpc arch maintainer sees that it is still worth the trouble I am ok with it.
diff --git a/sysdeps/powerpc/power7/fpu/s_logbl.c b/sysdeps/powerpc/power7/fpu/s_logbl.c index f7ecbd1..3ae383a 100644 --- a/sysdeps/powerpc/power7/fpu/s_logbl.c +++ b/sysdeps/powerpc/power7/fpu/s_logbl.c @@ -35,14 +35,16 @@ static const union { long double __logbl (long double x) { - double xh; + double xh, xl; double ret; + int64_t hx; if (__builtin_expect (x == 0.0L, 0)) /* Raise FE_DIVBYZERO and return -HUGE_VAL[LF]. */ return -1.0L / __builtin_fabsl (x); - xh = ldbl_high (x); + ldbl_unpack (x, &xh, &xl); + EXTRACT_WORDS64 (hx, xh); /* ret = x & 0x7ff0000000000000; */ asm ( "xxland %x0,%x1,%x2\n" @@ -58,10 +60,20 @@ __logbl (long double x) { /* POSIX specifies that denormal number is treated as though it were normalized. */ - int64_t hx; - - EXTRACT_WORDS64 (hx, xh); - return (long double) (-1023 - (__builtin_clzll (hx) - 12)); + return (long double) (- (__builtin_clzll (hx & 0x7fffffffffffffffLL) \ + - 12) - 1023); + } + else if ((hx & 0x000fffffffffffffLL) == 0) + { + /* If the high part is a power of 2, and the low part is nonzero + with the opposite sign, the low part affects the + exponent. */ + int64_t lx, rhx; + EXTRACT_WORDS64 (lx, xl); + rhx = (hx & 0x7ff0000000000000LL) >> 52; + if ((hx ^ lx) < 0 && (lx & 0x7fffffffffffffffLL) != 0) + rhx--; + return (long double) (rhx - 1023); } /* Test to avoid logb_downward (0.0) == -0.0. */ return ret == -0.0 ? 0.0 : ret;