diff mbox series

Small improvements to coverage info (4/n)

Message ID 2735580.mzTH3LooGU@polaris
State New
Headers show
Series Small improvements to coverage info (4/n) | expand

Commit Message

Eric Botcazou July 10, 2019, 9:16 p.m. UTC
Hi,

the returns are a little problematic for coverage info because they are 
commonalized in GIMPLE, i.e. turned into gotos to a representative return.
This means that you need to clear the location of the representative return, 
see lower_gimple_return:

	  /* Remove the line number from the representative return statement.
	     It now fills in for many such returns.  Failure to remove this
	     will result in incorrect results for coverage analysis.  */
	  gimple_set_location (tmp_rs.stmt, UNKNOWN_LOCATION);

otherwise the coverage info is blatantly wrong.  But this in turn means that 
such representative returns are vulneable to inheriting random source location 
information in the debug info generated by the compiler.

I have attached an Ada testcase that demonstrates the problem at -O0:

	.loc 1 14 10 discriminator 2
	movl	$0, %r13d
.L12:
	movl	%r13d, %eax
	jmp	.L23

.L12 is what's left at the RTL level of a representative return in GIMPLE and 
it inherits the location directive, which gives wrong coverage info (that's 
not visible with gcov because the instrumentation done by -fprofile-arcs 
papers over the issue, but for example with callgrind).

The proposed fix is attached: it sets the current location to the function's 
end locus when expanding such a representative return to RTL.  That's a bit on 
the kludgy side, but doing something in GIMPLE, e.g. in lower_gimple_return, 
leads to various regressions in terms of quality of diagnostics.

Tested on x86_64-suse-linux, both GCC and GDB, OK for mainline?


2019-07-10  Eric Botcazou  <ebotcazou@adacore.com>

	* cfgexpand.c (expand_gimple_stmt_1) <GIMPLE_RETURN>: If the statement
	doesn't have a location, set the current location to the function's end.

Comments

Richard Biener July 11, 2019, 9:07 a.m. UTC | #1
On Wed, Jul 10, 2019 at 11:16 PM Eric Botcazou <ebotcazou@adacore.com> wrote:
>
> Hi,
>
> the returns are a little problematic for coverage info because they are
> commonalized in GIMPLE, i.e. turned into gotos to a representative return.
> This means that you need to clear the location of the representative return,
> see lower_gimple_return:
>
>           /* Remove the line number from the representative return statement.
>              It now fills in for many such returns.  Failure to remove this
>              will result in incorrect results for coverage analysis.  */
>           gimple_set_location (tmp_rs.stmt, UNKNOWN_LOCATION);
>
> otherwise the coverage info is blatantly wrong.  But this in turn means that
> such representative returns are vulneable to inheriting random source location
> information in the debug info generated by the compiler.
>
> I have attached an Ada testcase that demonstrates the problem at -O0:
>
>         .loc 1 14 10 discriminator 2
>         movl    $0, %r13d
> .L12:
>         movl    %r13d, %eax
>         jmp     .L23
>
> .L12 is what's left at the RTL level of a representative return in GIMPLE and
> it inherits the location directive, which gives wrong coverage info (that's
> not visible with gcov because the instrumentation done by -fprofile-arcs
> papers over the issue, but for example with callgrind).
>
> The proposed fix is attached: it sets the current location to the function's
> end locus when expanding such a representative return to RTL.  That's a bit on
> the kludgy side, but doing something in GIMPLE, e.g. in lower_gimple_return,
> leads to various regressions in terms of quality of diagnostics.
>
> Tested on x86_64-suse-linux, both GCC and GDB, OK for mainline?

Hmm.  So for

int baz;
int foo()
{
  int i;
  goto skip;
done:
  return i;
skip:
  i = 1;
  if (baz)
    return baz;
  /* ... */
  goto done;
} /* (*) */

we'd assign (*) to the return?  It might be better to record
(in struct function?) the location of any actually existing
return location and use that?  In fact, similar kludgy would
be to simply not clear the GIMPLE_RETURN location
but kludge it up right away?
Can you explain how diagnostics regress when doing that?

Does coverage somehow treat the function end locus specially
so you chose that?  Will it show the alternate returns as
covered at all?  I presume stuff like cross-jumping or
GIMPLE tail-merging also wrecks coverage info in similar
ways (well, not by injecting random locations but by
not covering taken paths).

Thanks,
Richard.

>
>
> 2019-07-10  Eric Botcazou  <ebotcazou@adacore.com>
>
>         * cfgexpand.c (expand_gimple_stmt_1) <GIMPLE_RETURN>: If the statement
>         doesn't have a location, set the current location to the function's end.
>
> --
> Eric Botcazou
Eric Botcazou July 11, 2019, 10:52 a.m. UTC | #2
> Hmm.  So for
> 
> int baz;
> int foo()
> {
>   int i;
>   goto skip;
> done:
>   return i;
> skip:
>   i = 1;
>   if (baz)
>     return baz;
>   /* ... */
>   goto done;
> } /* (*) */
> 
> we'd assign (*) to the return?  It might be better to record
> (in struct function?) the location of any actually existing
> return location and use that?  In fact, similar kludgy would
> be to simply not clear the GIMPLE_RETURN location
> but kludge it up right away?

As I mentioned, this leads to various diagnostics regressions.

> Can you explain how diagnostics regress when doing that?

For example gcc.dg/uninit-17.c:

/* { dg-do compile } */
/* { dg-options "-O -Wuninitialized" } */

typedef _Complex float C;
C foo(int cond)
{
  C f;
  __imag__ f = 0;
  if (cond)
    {
      __real__ f = 1;
      return f;
    }
  return f;     /* { dg-warning "may be used" "unconditional" } */
}

The warning is not emitted on the expected line, but on the final } line.
There are a couple of other similar cases in the C and C++ testsuites.

> Does coverage somehow treat the function end locus specially
> so you chose that?  Will it show the alternate returns as
> covered at all?  I presume stuff like cross-jumping or
> GIMPLE tail-merging also wrecks coverage info in similar
> ways (well, not by injecting random locations but by
> not covering taken paths).

Simple things first please. :-)  The function's end locus is sort of a kitchen 
sink, you cannot have wrong coverage info when you use it, but only a possibly 
incomplete info.
Richard Biener July 11, 2019, 1:23 p.m. UTC | #3
On Thu, Jul 11, 2019 at 12:52 PM Eric Botcazou <ebotcazou@adacore.com> wrote:
>
> > Hmm.  So for
> >
> > int baz;
> > int foo()
> > {
> >   int i;
> >   goto skip;
> > done:
> >   return i;
> > skip:
> >   i = 1;
> >   if (baz)
> >     return baz;
> >   /* ... */
> >   goto done;
> > } /* (*) */
> >
> > we'd assign (*) to the return?  It might be better to record
> > (in struct function?) the location of any actually existing
> > return location and use that?  In fact, similar kludgy would
> > be to simply not clear the GIMPLE_RETURN location
> > but kludge it up right away?
>
> As I mentioned, this leads to various diagnostics regressions.
>
> > Can you explain how diagnostics regress when doing that?
>
> For example gcc.dg/uninit-17.c:
>
> /* { dg-do compile } */
> /* { dg-options "-O -Wuninitialized" } */
>
> typedef _Complex float C;
> C foo(int cond)
> {
>   C f;
>   __imag__ f = 0;
>   if (cond)
>     {
>       __real__ f = 1;
>       return f;
>     }
>   return f;     /* { dg-warning "may be used" "unconditional" } */
> }
>
> The warning is not emitted on the expected line, but on the final } line.

It's odd that we pick up this location w/o a location on the return stmt ...
and probably luck on which one we warn.

> There are a couple of other similar cases in the C and C++ testsuites.
>
> > Does coverage somehow treat the function end locus specially
> > so you chose that?  Will it show the alternate returns as
> > covered at all?  I presume stuff like cross-jumping or
> > GIMPLE tail-merging also wrecks coverage info in similar
> > ways (well, not by injecting random locations but by
> > not covering taken paths).
>
> Simple things first please. :-)  The function's end locus is sort of a kitchen
> sink, you cannot have wrong coverage info when you use it, but only a possibly
> incomplete info.

After your patch does behavior change when trying to break on a line
with a return stmt inside a debugger?

Thanks,
Richard.

> --
> Eric Botcazou
Eric Botcazou July 11, 2019, 5:04 p.m. UTC | #4
> After your patch does behavior change when trying to break on a line
> with a return stmt inside a debugger?

Note that every patch in the series was tested against GDB too, so hopefully 
this would have been caught...  But the answer is no, see lower_gimple_return:

  /* Generate a goto statement and remove the return statement.  */
 found:
  /* When not optimizing, make sure user returns are preserved.  */
  if (!optimize && gimple_has_location (stmt))
    DECL_ARTIFICIAL (tmp_rs.label) = 0;
  t = gimple_build_goto (tmp_rs.label);
  /* location includes block.  */
  gimple_set_location (t, gimple_location (stmt));
  gsi_insert_before (gsi, t, GSI_SAME_STMT);
  gsi_remove (gsi, false);

So the label, the goto and its location are all preserved:

(gdb) b ops.adb:11
Breakpoint 1 at 0x40308e: file ops.adb, line 11.
(gdb) run
Starting program: /home/eric/gnat/test_ops_okovb 

Breakpoint 1, ops.both_ok (a=..., b=...) at ops.adb:11
11               return True;      -- # ok

(gdb) b ops.adb:14
Breakpoint 1 at 0x4030c1: file ops.adb, line 14.
(gdb) run
Starting program: /home/eric/gnat/test_ops_okovb 

Breakpoint 1, ops.both_ok (a=..., b=...) at ops.adb:14
14               return False;     -- # ko

(gdb) b ops.adb:19
Breakpoint 1 at 0x403115: file ops.adb, line 19.
(gdb) run
Starting program: /home/eric/gnat/test_ops_okovb 

Breakpoint 1, ops.both_ok (a=..., b=...) at ops.adb:19
19               return False; -- # ov
Richard Biener July 12, 2019, 9:31 a.m. UTC | #5
On Thu, Jul 11, 2019 at 7:04 PM Eric Botcazou <ebotcazou@adacore.com> wrote:
>
> > After your patch does behavior change when trying to break on a line
> > with a return stmt inside a debugger?
>
> Note that every patch in the series was tested against GDB too, so hopefully
> this would have been caught...  But the answer is no, see lower_gimple_return:
>
>   /* Generate a goto statement and remove the return statement.  */
>  found:
>   /* When not optimizing, make sure user returns are preserved.  */
>   if (!optimize && gimple_has_location (stmt))
>     DECL_ARTIFICIAL (tmp_rs.label) = 0;
>   t = gimple_build_goto (tmp_rs.label);
>   /* location includes block.  */
>   gimple_set_location (t, gimple_location (stmt));
>   gsi_insert_before (gsi, t, GSI_SAME_STMT);
>   gsi_remove (gsi, false);
>
> So the label, the goto and its location are all preserved:
>
> (gdb) b ops.adb:11
> Breakpoint 1 at 0x40308e: file ops.adb, line 11.
> (gdb) run
> Starting program: /home/eric/gnat/test_ops_okovb
>
> Breakpoint 1, ops.both_ok (a=..., b=...) at ops.adb:11
> 11               return True;      -- # ok
>
> (gdb) b ops.adb:14
> Breakpoint 1 at 0x4030c1: file ops.adb, line 14.
> (gdb) run
> Starting program: /home/eric/gnat/test_ops_okovb
>
> Breakpoint 1, ops.both_ok (a=..., b=...) at ops.adb:14
> 14               return False;     -- # ko
>
> (gdb) b ops.adb:19
> Breakpoint 1 at 0x403115: file ops.adb, line 19.
> (gdb) run
> Starting program: /home/eric/gnat/test_ops_okovb
>
> Breakpoint 1, ops.both_ok (a=..., b=...) at ops.adb:19
> 19               return False; -- # ov

I see.  The patch is OK then.

Thanks,
Richard.

> --
> Eric Botcazou
diff mbox series

Patch

Index: cfgexpand.c
===================================================================
--- cfgexpand.c	(revision 273294)
+++ cfgexpand.c	(working copy)
@@ -3712,6 +3712,12 @@  expand_gimple_stmt_1 (gimple *stmt)
       {
 	op0 = gimple_return_retval (as_a <greturn *> (stmt));
 
+	/* If a return doesn't have a location, it very likely represents
+	   multiple user returns so we cannot let it inherit the location
+	   of the last statement of the previous basic block in RTL.  */
+	if (!gimple_has_location (stmt))
+	  set_curr_insn_location (cfun->function_end_locus);
+
 	if (op0 && op0 != error_mark_node)
 	  {
 	    tree result = DECL_RESULT (current_function_decl);