Message ID: 20101107215423.GA3607@laptop-hbl.localdomain
State: New
Holger Blasum <hbl@sysgo.com> writes:

> Rationale: gcov is not robust in a threaded environment; the patch
> makes it more robust.
>
> In particular, in threaded environments one can observe two effects:
> (1) underreporting, when coverage counters are overwritten by another
> instance of a thread after a thread switch;
> (2) generation of arbitrary (even negative) values by subtraction when
> reconstructing the whole tree from the counters used by the spanning
> tree optimization mechanism. (Recall that subtraction is messy
> numerically.)
>
> With the current gcov framework (I think) one cannot do anything
> really generic about (1).

A better fix would be to make all the gcov data per-thread data.

This is supported on most modern OSes in a straightforward way with a
storage class modifier. This would fix all the races and always give
accurate information.

Another advantage is that it would speed up data collection with
multiple threads: no frequent cache-line bouncing on the counters when
multiple CPUs modify them in parallel.

The only change needed would be to sum the counters up explicitly on
thread destruction into a global copy that is accessed atomically.

-Andi
Hi,

> Holger Blasum <hbl@sysgo.com> writes:
>
> > Rationale: gcov is not robust in a threaded environment; the patch
> > makes it more robust.
> >
> > In particular, in threaded environments one can observe two effects:
> > (1) underreporting, when coverage counters are overwritten by another
> > instance of a thread after a thread switch;
> > (2) generation of arbitrary (even negative) values by subtraction when
> > reconstructing the whole tree from the counters used by the spanning
> > tree optimization mechanism. (Recall that subtraction is messy
> > numerically.)
> >
> > With the current gcov framework (I think) one cannot do anything
> > really generic about (1).
>
> A better fix would be to make all the gcov data per-thread data.
>
> This is supported on most modern OSes in a straightforward way with a
> storage class modifier. This would fix all the races and always give
> accurate information.
>
> Another advantage is that it would speed up data collection with
> multiple threads: no frequent cache-line bouncing on the counters when
> multiple CPUs modify them in parallel.
>
> The only change needed would be to sum the counters up explicitly on
> thread destruction into a global copy that is accessed atomically.

I had a patch implementing this a few years back:

http://gcc.gnu.org/ml/gcc-patches/2002-01/msg00195.html

which however did not make it to mainline (IIRC, there were objections
against the extra memory and runtime costs: if you have a program with
hundreds of thousands of basic blocks, you do not really want to use
the corresponding megabytes of memory per thread for profiling, and
spend time initializing them on thread start and summing them on
thread end).

Probably a better approach (using atomic primitives for manipulating
the profile) was proposed in

http://gcc.gnu.org/ml/gcc-patches/2007-03/msg00950.html

which also never got merged, it seems.

Zdenek
Zdenek Dvorak <rakdver@kam.mff.cuni.cz> writes:

> I had a patch implementing this a few years back:
>
> http://gcc.gnu.org/ml/gcc-patches/2002-01/msg00195.html
>
> which however did not make it to mainline (IIRC, there were objections
> against the extra memory and runtime costs: if you have a program with
> hundreds of thousands of basic blocks, you do not really want to use
> the corresponding megabytes of memory per thread for profiling, and
> spend time initializing them on thread start and summing them on
> thread end).

Doesn't seem like a big problem to me on modern systems. I guess one
could make it optional for the case of someone running on a
memory-constrained target.

> Probably a better approach (using atomic primitives for manipulating
> the profile) was proposed in
>
> http://gcc.gnu.org/ml/gcc-patches/2007-03/msg00950.html
>
> which also never got merged, it seems.

Atomics are not a better approach. Atomics will never scale well and
can experience catastrophic performance degradation on larger systems
due to excessive communication overhead when lots of CPUs try to
manipulate the same counters in parallel.

On many NUMA systems there is also unfair memory, which makes them even
worse.

-Andi
On Mon, Nov 8, 2010 at 12:17 PM, Andi Kleen <andi@firstfloor.org> wrote:
> Zdenek Dvorak <rakdver@kam.mff.cuni.cz> writes:
>>
>> I had a patch implementing this a few years back:
>>
>> http://gcc.gnu.org/ml/gcc-patches/2002-01/msg00195.html
>>
>> which however did not make it to mainline (IIRC, there were objections
>> against the extra memory and runtime costs: if you have a program with
>> hundreds of thousands of basic blocks, you do not really want to use
>> the corresponding megabytes of memory per thread for profiling, and
>> spend time initializing them on thread start and summing them on
>> thread end).
>
> Doesn't seem like a big problem to me on modern systems. I guess one
> could make it optional for the case of someone running on a
> memory-constrained target.
>
>> Probably a better approach (using atomic primitives for manipulating
>> the profile) was proposed in
>>
>> http://gcc.gnu.org/ml/gcc-patches/2007-03/msg00950.html
>>
>> which also never got merged, it seems.
>
> Atomics are not a better approach. Atomics will never scale well and
> can experience catastrophic performance degradation on larger systems
> due to excessive communication overhead when lots of CPUs try to
> manipulate the same counters in parallel.
>
> On many NUMA systems there is also unfair memory, which makes them
> even worse.

The initial copying and setup could be avoided by using a common
zero-initialized COW mapping for the counters (can we have COW TLS
mappings?). Won't solve the thread destruction time issue though (but
that could be done via a separate thread).

Richard.

> -Andi
>
> --
> ak@linux.intel.com -- Speaking for myself only.
Index: gcc/opts.c
===================================================================
--- gcc/opts.c	(revision 166172)
+++ gcc/opts.c	(working copy)
@@ -1973,6 +1973,10 @@
       profile_data_prefix = xstrdup (arg);
       break;

+    case OPT_fprofile_whole_graph:
+      flag_profile_whole_graph = 1;
+      break;
+
     case OPT_fprofile_use_:
       profile_data_prefix = xstrdup (arg);
       flag_profile_use = true;
Index: gcc/profile.c
===================================================================
--- gcc/profile.c	(revision 166172)
+++ gcc/profile.c	(working copy)
@@ -1019,9 +1019,10 @@
   /* Create spanning tree from basic block graph, mark each edge
      that is on the spanning tree.  We insert as many abnormal and
      critical edges as possible to minimize number of edge splits
      necessary.  */
+
+  if (!flag_profile_whole_graph)
+    find_spanning_tree (el);
-  find_spanning_tree (el);
-
   /* Fake edges that are not on the tree will not be instrumented, so
      mark them ignored.  */
   for (num_instrumented = i = 0; i < num_edges; i++)
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 166172)
+++ gcc/common.opt	(working copy)
@@ -1279,6 +1279,10 @@
 Common Joined RejectNegative
 Enable common options for generating profile info for profile feedback directed optimizations, and set -fprofile-dir=

+fprofile-whole-graph
+Common Var(flag_profile_whole_graph, 0) RejectNegative
+Use whole basic block graph for profiling, do not use spanning tree optimization (more memory usage, but better conservative approximation for multithreaded execution).
+
 fprofile-use
 Common Var(flag_profile_use)
 Enable common options for performing profile feedback directed optimizations