diff mbox series

[v2] Inline IBM long double __gcc_qsub

Message ID CAGWvny=toN2XQSN_XjArYcn4KYY3Lxkuk3ate5CTkhEtVQ4MQQ@mail.gmail.com
State New
Headers show
Series [v2] Inline IBM long double __gcc_qsub | expand

Commit Message

David Edelsohn Aug. 26, 2021, 6:57 p.m. UTC
While performing some tests of IEEE 128 float for PPC64LE, Michael
    Meissner noticed that __gcc_qsub is substantially slower than
    __gcc_qadd.  __gcc_qsub calls __gcc_add with the second operand
    negated.  Because the functions normally are invoked through
    libgcc shared object, the extra PLT overhead has a large impact
    on the overall time of the function.  This patch converts
    __gcc_qadd to a static inline function invoked by __gcc_qadd
    and __gcc_qsub.

    libgcc/ChangeLog:

            * config/rs6000/ibm-ldouble.c (ldouble_qadd_internal): Rename from
            __gcc_qadd.
            (__gcc_qadd): Call ldouble_qadd_internal.
            (__gcc_qsub): Call ldouble_qadd_internal with second long double
            argument negated.

Comments

Segher Boessenkool Aug. 26, 2021, 10:51 p.m. UTC | #1
Hi!

On Thu, Aug 26, 2021 at 02:57:35PM -0400, David Edelsohn wrote:
>             * config/rs6000/ibm-ldouble.c (ldouble_qadd_internal): Rename from
>             __gcc_qadd.
>             (__gcc_qadd): Call ldouble_qadd_internal.
>             (__gcc_qsub): Call ldouble_qadd_internal with second long double
>             argument negated.

Still looks good, please commit.  Thanks :-)

> +static inline IBM128_TYPE
> +ldouble_qadd_internal (double a, double aa, double c, double cc)

Does it end up actually inlined, or as one static function that both
__gcc_qadd and __gcc_qsub use?  This is fine for complexity, it is just
a simple tail-call jump, just wondering what the compiler thinks is best
here (it matters in other cases, if the inline function has conditional
branches for example).


Segher
David Edelsohn Aug. 26, 2021, 11:07 p.m. UTC | #2
On Thu, Aug 26, 2021 at 6:53 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Thu, Aug 26, 2021 at 02:57:35PM -0400, David Edelsohn wrote:
> >             * config/rs6000/ibm-ldouble.c (ldouble_qadd_internal): Rename from
> >             __gcc_qadd.
> >             (__gcc_qadd): Call ldouble_qadd_internal.
> >             (__gcc_qsub): Call ldouble_qadd_internal with second long double
> >             argument negated.
>
> Still looks good, please commit.  Thanks :-)
>
> > +static inline IBM128_TYPE
> > +ldouble_qadd_internal (double a, double aa, double c, double cc)
>
> Does it end up actually inlined, or as one static function that both
> __gcc_qadd and __gcc_qsub use?  This is fine for complexity, it is just
> a simple tail-call jump, just wondering what the compiler thinks is best
> here (it matters in other cases, if the inline function has conditional
> branches for example).

I confirmed that the implementation is inlined in both functions when
compiled with optimization, and the negation is propagated into qsub.

Thanks, David
diff mbox series

Patch

diff --git a/libgcc/config/rs6000/ibm-ldouble.c
b/libgcc/config/rs6000/ibm-ldouble.c
index 4c13453f975..0b385aa940b 100644
--- a/libgcc/config/rs6000/ibm-ldouble.c
+++ b/libgcc/config/rs6000/ibm-ldouble.c
@@ -118,8 +118,8 @@  pack_ldouble (double dh, double dl)
 }

 /* Add two 'IBM128_TYPE' values and return the result. */
-IBM128_TYPE
-__gcc_qadd (double a, double aa, double c, double cc)
+static inline IBM128_TYPE
+ldouble_qadd_internal (double a, double aa, double c, double cc)
 {
   double xh, xl, z, q, zz;

@@ -158,9 +158,15 @@  __gcc_qadd (double a, double aa, double c, double cc)
 }

 IBM128_TYPE
-__gcc_qsub (double a, double b, double c, double d)
+__gcc_qadd (double a, double aa, double c, double cc)
+{
+  return ldouble_qadd_internal (a, aa, c, cc);
+}
+
+IBM128_TYPE
+__gcc_qsub (double a, double aa, double c, double cc)
 {
-  return __gcc_qadd (a, b, -c, -d);
+  return ldouble_qadd_internal (a, aa, -c, -cc);
 }

 #ifdef __NO_FPRS__