diff mbox

remove nested function hack_digit

Message ID CAGQ9bdyxCW-_3rLy6uLg4Vc2FPx+gUL7PChaXA4i6aKmnjGVZg@mail.gmail.com
State New
Headers show

Commit Message

Konstantin Serebryany Sept. 16, 2014, 1:53 a.m. UTC
Yep. I've realized that I've interpreted the test results incorrectly
(missed the failing tests).
The following patch passes tests.

On Mon, Sep 15, 2014 at 4:50 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> Konstantin Serebryany <konstantin.s.serebryany@gmail.com> writes:
>
>> +  if (expsign != 0 && type == 'f' && exponent-- > 0)
>
>> +       fracsize = scalesize;
>
>> +         --fracsize;
>
>> +           fracsize = 1;
>
>> +     frac[fracsize++] = _cy;
>
> All these side effects won't work any more.
>
> Andreas.
>
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."

Comments

Andreas Schwab Sept. 16, 2014, 7:44 a.m. UTC | #1
Konstantin Serebryany <konstantin.s.serebryany@gmail.com> writes:

> Yep. I've realized that I've interpreted the test results incorrectly
> (missed the failing tests).
> The following patch passes tests.

How does it affect code generation?

Andreas.
Konstantin Serebryany Sept. 16, 2014, 3:58 p.m. UTC | #2
Are you asking about the results from my particular compiler  (GCC
4.8.2) or in theoretical sense?
With 4.8.2 in both cases the calls to hack_digit are not inlined
(which I think is correct).

W/o the change the calls look like this:

  4b362:       4c 8b 95 30 ff ff ff    mov    -0xd0(%rbp),%r10
   4b369:       89 95 38 ff ff ff       mov    %edx,-0xc8(%rbp)
   4b36f:       e8 1c f0 ff ff          callq  4a390 <hack_digit.13608>

W/ this change the call takes more instructions to execute:
   4b1e4:       4c 89 ad 50 ff ff ff    mov    %r13,-0xb0(%rbp)
   4b1eb:       4c 89 bd 00 ff ff ff    mov    %r15,-0x100(%rbp)
   4b1f2:       4c 8b a5 40 ff ff ff    mov    -0xc0(%rbp),%r12
   4b1f9:       4c 8b ad f8 fe ff ff    mov    -0x108(%rbp),%r13
   4b200:       31 db                   xor    %ebx,%ebx
   4b202:       4c 8b b5 70 ff ff ff    mov    -0x90(%rbp),%r14
   4b209:       4c 8b bd 48 ff ff ff    mov    -0xb8(%rbp),%r15
   4b210:       48 89 85 58 ff ff ff    mov    %rax,-0xa8(%rbp)
   4b217:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
   4b21e:       00 00
   4b220:       4c 8b 8d 78 ff ff ff    mov    -0x88(%rbp),%r9
   4b227:       4c 8b 85 58 ff ff ff    mov    -0xa8(%rbp),%r8
   4b22e:       4c 89 fa                mov    %r15,%rdx
   4b231:       48 8b 4d 80             mov    -0x80(%rbp),%rcx
   4b235:       8b b5 68 ff ff ff       mov    -0x98(%rbp),%esi
   4b23b:       8b bd 60 ff ff ff       mov    -0xa0(%rbp),%edi
   4b241:       4c 89 74 24 08          mov    %r14,0x8(%rsp)
   4b246:       4c 89 24 24             mov    %r12,(%rsp)
   4b24a:       e8 41 f1 ff ff          callq  4a390 <hack_digit>

It doesn't mean the new code is worse or better.
There is no miracle in the nested function call,
you still need to pass all the parameters and this is just done with a
different calling convention.

--kcc

On Tue, Sep 16, 2014 at 12:44 AM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> Konstantin Serebryany <konstantin.s.serebryany@gmail.com> writes:
>
>> Yep. I've realized that I've interpreted the test results incorrectly
>> (missed the failing tests).
>> The following patch passes tests.
>
> How does it affect code generation?
>
> Andreas.
>
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."
Andreas Schwab Sept. 16, 2014, 5:31 p.m. UTC | #3
Konstantin Serebryany <konstantin.s.serebryany@gmail.com> writes:

> It doesn't mean the new code is worse or better.

That's something we should find out.

Andreas.
Konstantin Serebryany Sept. 16, 2014, 6:03 p.m. UTC | #4
Suggestions on how to do that?

On Tue, Sep 16, 2014 at 10:31 AM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> Konstantin Serebryany <konstantin.s.serebryany@gmail.com> writes:
>
>> It doesn't mean the new code is worse or better.
>
> That's something we should find out.
>
> Andreas.
>
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."
Roland McGrath Sept. 22, 2014, 9:43 p.m. UTC | #5
> Are you asking about the results from my particular compiler  (GCC
> 4.8.2) or in theoretical sense?

When we're concerned with generated code quality, in general it is
sufficient to demonstrate the effects just with the most-recent or a
pretty-recent GCC version.  4.8.2 is a fine reference version today.

> It doesn't mean the new code is worse or better.

What you showed does indeed mean that the new code is worse: more code to
do the same thing.  You did not show the code of hack_digit itself.  It's
possible that things improved inside the function such that the total
effect of your change is in fact an improvement or a wash, but you have to
actually analyze all the code and make a real judgment to say one or the
other.

> There is no miracle in the nested function call,
> you still need to pass all the parameters and this is just done with a
> different calling convention.

This is not really true, or at least presented in a rather misleading way.
A call to a nested function does not pass "all the parameters" at all.
Semantically, you could say that it "passes" them all by reference.  But
what it actually does is pass a single pointer parameter (the static chain)
that is used inside the nested function for indirect access to the
containing function's local variables.  So the closest approximation of
what the existing code is doing is to put all the referenced parent locals
together into a local struct and pass a pointer to that.

In this particular case, that might be worth doing or it might be just as
good (or better) from a source maintenance perspective and/or a generated
code quality perspective to pass individual variables by reference as your
last patch does.  The onus is on you to propose a particular change,
demonstrate with thorough analysis that it improves or at least does not
harm generated code quality, and make the aesthetic case that improves or
does not substantially harm source maintenance.
Konstantin Serebryany Sept. 22, 2014, 9:57 p.m. UTC | #6
On Mon, Sep 22, 2014 at 2:43 PM, Roland McGrath <roland@hack.frob.com> wrote:
>> Are you asking about the results from my particular compiler  (GCC
>> 4.8.2) or in theoretical sense?
>
> When we're concerned with generated code quality, in general it is
> sufficient to demonstrate the effects just with the most-recent or a
> pretty-recent GCC version.  4.8.2 is a fine reference version today.
>
>> It doesn't mean the new code is worse or better.
>
> What you showed does indeed mean that the new code is worse: more code to
> do the same thing.  You did not show the code of hack_digit itself.

hack_digit becomes longer too due to longer function prologue epilogue:
Was:
000000000004a390 <hack_digit.13608>:
   4a390:       55                      push   %rbp
   4a391:       53                      push   %rbx
   4a392:       4c 89 d3                mov    %r10,%rbx
   4a395:       48 83 ec 08             sub    $0x8,%rsp
   4a399:       41 8b 42 30             mov    0x30(%r10),%eax
   4a39d:       85 c0                   test   %eax,%eax

Now:
000000000004a390 <hack_digit>:
   4a390:       41 55                   push   %r13
   4a392:       41 54                   push   %r12
   4a394:       55                      push   %rbp
   4a395:       4c 89 c5                mov    %r8,%rbp
   4a398:       4d 89 c8                mov    %r9,%r8
   4a39b:       53                      push   %rbx
   4a39c:       48 89 cb                mov    %rcx,%rbx
   4a39f:       48 83 ec 08             sub    $0x8,%rsp
   4a3a3:       85 ff                   test   %edi,%edi

(similar for multiple epilogues)


> It's
> possible that things improved inside the function such that the total
> effect of your change is in fact an improvement or a wash, but you have to
> actually analyze all the code and make a real judgment to say one or the
> other.
>
>> There is no miracle in the nested function call,
>> you still need to pass all the parameters and this is just done with a
>> different calling convention.
>
> This is not really true, or at least presented in a rather misleading way.
> A call to a nested function does not pass "all the parameters" at all.
> Semantically, you could say that it "passes" them all by reference.  But
> what it actually does is pass a single pointer parameter (the static chain)
> that is used inside the nested function for indirect access to the
> containing function's local variables.  So the closest approximation of
> what the existing code is doing is to put all the referenced parent locals
> together into a local struct and pass a pointer to that.

I considered doing such a patch but it turned out a huge textual
change that will make the code much less readable.
Still, let me do it and send it here anyway, unless you tell me no to.

--kcc

>
> In this particular case, that might be worth doing or it might be just as
> good (or better) from a source maintenance perspective and/or a generated
> code quality perspective to pass individual variables by reference as your
> last patch does.  The onus is on you to propose a particular change,
> demonstrate with thorough analysis that it improves or at least does not
> harm generated code quality, and make the aesthetic case that improves or
> does not substantially harm source maintenance.
Roland McGrath Sept. 22, 2014, 10:45 p.m. UTC | #7
> hack_digit becomes longer too due to longer function prologue epilogue:

Clearly that cannot be the only difference.  Was the function's code itself
actually all nearly identical, modulo trivial differences like different
register allocation choices?  Each access to one of the parent's locals
surely looks different, and how different that code looks is probably where
the most important differences are.

> I considered doing such a patch but it turned out a huge textual
> change that will make the code much less readable.
> Still, let me do it and send it here anyway, unless you tell me no to.

Of course readability is very subjective, so there really is no substitute
for each interested person just seeing how things look and giving their
opinion.  The most trivial mechanical change might harm readability in ways
that can be improved with a little thought.
Richard Henderson Sept. 23, 2014, 2:39 p.m. UTC | #8
On 09/22/2014 02:43 PM, Roland McGrath wrote:
> So the closest approximation of
> what the existing code is doing is to put all the referenced parent locals
> together into a local struct and pass a pointer to that.

That description is no longer the closest approximation, that is *exactly* what
happens.

See the block comment at the start of gcc/tree-nested.c which explains how
insane the gcc 2.x scheme really was.


r~
diff mbox

Patch

diff --git a/stdio-common/printf_fp.c b/stdio-common/printf_fp.c
index 9cd4b4b..679cb70 100644
--- a/stdio-common/printf_fp.c
+++ b/stdio-common/printf_fp.c
@@ -148,6 +148,49 @@  static wchar_t *group_number (wchar_t *buf, wchar_t *bufend,
 			      wchar_t thousands_sep, int ngroups)
      internal_function;
 
+static wchar_t hack_digit (int expsign, int type, int *exponent,
+			   mp_limb_t *frac, mp_size_t *fracsize,
+			   mp_limb_t *scale, mp_size_t scalesize,
+			   mp_limb_t *tmp)
+{
+  mp_limb_t hi;
+
+  if (expsign != 0 && type == 'f' && (*exponent)-- > 0)
+    hi = 0;
+  else if (scalesize == 0)
+    {
+      hi = frac[*fracsize - 1];
+      frac[*fracsize - 1] = __mpn_mul_1 (frac, frac, *fracsize - 1, 10);
+    }
+  else
+    {
+      if (*fracsize < scalesize)
+	hi = 0;
+      else
+	{
+	  hi = mpn_divmod (tmp, frac, *fracsize, scale, scalesize);
+	  tmp[*fracsize - scalesize] = hi;
+	  hi = tmp[0];
+
+	  *fracsize = scalesize;
+	  while (*fracsize != 0 && frac[*fracsize - 1] == 0)
+	    --(*fracsize);
+	  if (*fracsize == 0)
+	    {
+	      /* We're not prepared for an mpn variable with zero
+		 limbs.  */
+	      *fracsize = 1;
+	      return L'0' + hi;
+	    }
+	}
+
+      mp_limb_t _cy = __mpn_mul_1 (frac, frac, *fracsize, 10);
+      if (_cy != 0)
+	frac[(*fracsize)++] = _cy;
+    }
+
+  return L'0' + hi;
+}
 
 int
 ___printf_fp (FILE *fp,
@@ -213,50 +256,6 @@  ___printf_fp (FILE *fp,
   /* Flag whether wbuffer is malloc'ed or not.  */
   int buffer_malloced = 0;
 
-  auto wchar_t hack_digit (void);
-
-  wchar_t hack_digit (void)
-    {
-      mp_limb_t hi;
-
-      if (expsign != 0 && type == 'f' && exponent-- > 0)
-	hi = 0;
-      else if (scalesize == 0)
-	{
-	  hi = frac[fracsize - 1];
-	  frac[fracsize - 1] = __mpn_mul_1 (frac, frac, fracsize - 1, 10);
-	}
-      else
-	{
-	  if (fracsize < scalesize)
-	    hi = 0;
-	  else
-	    {
-	      hi = mpn_divmod (tmp, frac, fracsize, scale, scalesize);
-	      tmp[fracsize - scalesize] = hi;
-	      hi = tmp[0];
-
-	      fracsize = scalesize;
-	      while (fracsize != 0 && frac[fracsize - 1] == 0)
-		--fracsize;
-	      if (fracsize == 0)
-		{
-		  /* We're not prepared for an mpn variable with zero
-		     limbs.  */
-		  fracsize = 1;
-		  return L'0' + hi;
-		}
-	    }
-
-	  mp_limb_t _cy = __mpn_mul_1 (frac, frac, fracsize, 10);
-	  if (_cy != 0)
-	    frac[fracsize++] = _cy;
-	}
-
-      return L'0' + hi;
-    }
-
-
   /* Figure out the decimal point character.  */
   if (info->extra == 0)
     {
@@ -914,7 +913,8 @@  ___printf_fp (FILE *fp,
 	while (intdig_no < intdig_max)
 	  {
 	    ++intdig_no;
-	    *wcp++ = hack_digit ();
+	    *wcp++ = hack_digit (expsign, type, &exponent, frac, &fracsize,
+				 scale, scalesize, tmp);
 	  }
 	significant = 1;
 	if (info->alt
@@ -938,7 +938,8 @@  ___printf_fp (FILE *fp,
 	   || (fracdig_no < fracdig_max && (fracsize > 1 || frac[0] != 0)))
       {
 	++fracdig_no;
-	*wcp = hack_digit ();
+	*wcp = hack_digit (expsign, type, &exponent, frac, &fracsize,
+			   scale, scalesize, tmp);
 	if (*wcp++ != L'0')
 	  significant = 1;
 	else if (significant == 0)
@@ -951,7 +952,8 @@  ___printf_fp (FILE *fp,
 
     /* Do rounding.  */
     wchar_t last_digit = wcp[-1] != decimalwc ? wcp[-1] : wcp[-2];
-    wchar_t next_digit = hack_digit ();
+    wchar_t next_digit = hack_digit (expsign, type, &exponent, frac, &fracsize,
+                                     scale, scalesize, tmp);
     bool more_bits;
     if (next_digit != L'0' && next_digit != L'5')
       more_bits = true;