diff mbox

Output DIEs for outlined OpenMP functions in correct lexical scope

Message ID 20170504174551.27e1ec17@pinnacle.lan
State New
Headers show

Commit Message

Kevin Buettner May 5, 2017, 12:45 a.m. UTC
Consider the following OpenMP program:

    void foo (int a1) {}

    int
    main (void)
    {
      static int s1 = -41;
      int i1 = 11, i2;

      for (i2 = 1; i2 <= 2; i2++)
        {
          int pass = i2;
    #pragma omp parallel num_threads (2) firstprivate (i1)
          {
            foo (i1);
          }
          foo(pass);
        }
      foo (s1); foo (i2);
    }

At the moment, when debugging such a program, GDB is not able to find
and print the values of s1, i2, and pass.

My changes to omp-low.c and omp-expand.c, in conjunction with several
other patches, allow GDB to find and print these values.

This is the current behavior when debugging in GDB:

    (gdb) b 14
    Breakpoint 1 at 0x400617: file ex3.c, line 14.
    (gdb) run
    Starting program: /mesquite2/.ironwood2/omp-tests/k/ex3-trunk
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    [New Thread 0x7ffff73ca700 (LWP 32628)]

    Thread 1 "ex3-trunk" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
    14              foo (i1);
    (gdb) p s1
    No symbol "s1" in current context.
    (gdb) p i1
    $1 = 11
    (gdb) p i2
    No symbol "i2" in current context.
    (gdb) p pass
    No symbol "pass" in current context.
    (gdb) c
    Continuing.
    [Switching to Thread 0x7ffff73ca700 (LWP 32628)]

    Thread 2 "ex3-trunk" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
    14              foo (i1);
    (gdb) p s1
    No symbol "s1" in current context.
    (gdb) p i1
    $2 = 11
    (gdb) p i2
    No symbol "i2" in current context.
    (gdb) p pass
    No symbol "pass" in current context.
    (gdb) bt
    #0  main._omp_fn.0 () at ex3.c:14
    #1  0x00007ffff7bc4926 in gomp_thread_start (xdata=<optimized out>)
    at gcc/libgomp/team.c:122
    #2  0x00007ffff799761a in start_thread () from /lib64/libpthread.so.0
    #3  0x00007ffff76d159d in clone () from /lib64/libc.so.6

Note that GDB is unable to find s1, i2, or pass for either thread.

I show the backtrace for thread 2 because it's the more difficult case
to handle due to the stack trace stopping at clone().  The stack frame
for main(), which is where the variables of interest reside, is not a
part of this stack.

When we run this example using the patches associated with this change
along with several other patches, GDB's behavior looks like this:

    (gdb) b 14
    Breakpoint 1 at 0x400617: file ex3.c, line 14.
    (gdb) run
    Starting program: /mesquite2/.ironwood2/omp-tests/k/ex3-new
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    [New Thread 0x7ffff73ca700 (LWP 32643)]

    Thread 1 "ex3-new" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
    14              foo (i1);
    (gdb) p s1
    $1 = -41
    (gdb) p i1
    $2 = 11
    (gdb) p i2
    $3 = 1
    (gdb) p pass
    $4 = 1
    (gdb) c
    Continuing.
    [Switching to Thread 0x7ffff73ca700 (LWP 32643)]

    Thread 2 "ex3-new" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
    14              foo (i1);
    (gdb) p s1
    $5 = -41
    (gdb) p i1
    $6 = 11
    (gdb) p i2
    $7 = 1
    (gdb) p pass
    $8 = 1

I didn't show the stack here.  It's the same as before.  (I would,
however, like to be able to make GDB display a unified stack.)

Note that GDB is now able to find and print values for s1, i2, and
pass.

GCC constructs a new function for executing the parallel code.
The debugging information entry for this function is presently
placed at the same level as that of the function in which the
"#pragma omp parallel" directive appeared.

This is partial output from "readelf -v" for the (non-working) example
shown above:

 <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
    <2e>   DW_AT_external    : 1
    <2e>   DW_AT_name        : (indirect string, offset: 0x80): main
    <32>   DW_AT_decl_file   : 1
    <33>   DW_AT_decl_line   : 4
    <34>   DW_AT_prototyped  : 1
    <34>   DW_AT_type        : <0x9d>
    <38>   DW_AT_low_pc      : 0x400591
    <40>   DW_AT_high_pc     : 0x71
    <48>   DW_AT_frame_base  : 1 byte block: 9c         (DW_OP_call_frame_cfa)
    <4a>   DW_AT_GNU_all_tail_call_sites: 1
    <4a>   DW_AT_sibling     : <0x9d>
 <2><4e>: Abbrev Number: 3 (DW_TAG_variable)
    <4f>   DW_AT_name        : s1
    <52>   DW_AT_decl_file   : 1
    <53>   DW_AT_decl_line   : 6
    <54>   DW_AT_type        : <0x9d>
    <58>   DW_AT_location    : 9 byte block: 3 30 10 60 0 0 0 0 0       (DW_OP_addr: 601030)
 ...
 <2><7c>: Abbrev Number: 4 (DW_TAG_lexical_block)
    <7d>   DW_AT_low_pc      : 0x4005a9
    <85>   DW_AT_high_pc     : 0x31
 <3><8d>: Abbrev Number: 5 (DW_TAG_variable)
    <8e>   DW_AT_name        : (indirect string, offset: 0x55): pass
    <92>   DW_AT_decl_file   : 1
    <93>   DW_AT_decl_line   : 11
    <94>   DW_AT_type        : <0x9d>
    <98>   DW_AT_location    : 2 byte block: 91 64      (DW_OP_fbreg: -28)
 ...
 <1><a4>: Abbrev Number: 7 (DW_TAG_subprogram)
    <a5>   DW_AT_name        : (indirect string, offset: 0x5a): main._omp_fn.0
    <a9>   DW_AT_prototyped  : 1
    <a9>   DW_AT_artificial  : 1
    <a9>   DW_AT_low_pc      : 0x400602
    <b1>   DW_AT_high_pc     : 0x21
    <b9>   DW_AT_frame_base  : 1 byte block: 9c         (DW_OP_call_frame_cfa)
    <bb>   DW_AT_GNU_all_tail_call_sites: 1
    <bb>   DW_AT_sibling     : <0xd5>
 <2><bf>: Abbrev Number: 8 (DW_TAG_formal_parameter)
    <c0>   DW_AT_type        : <0xed>
    <c4>   DW_AT_artificial  : 1
    <c4>   DW_AT_location    : 2 byte block: 91 58      (DW_OP_fbreg: -40)
 ...

The nesting level is indicated by the leading number in angle
brackets.  This shows that the DW_TAG_VARIABLE DIE for s1 is nested
within the DIE for main's DW_TAG_subprogram DIE.  Likewise, the
lexical block corresponding to the body of the for-loop is also nested
within main().  However, the DIES for main and main._omp_fun.0 are
both at level 1.

With my patches for omp-low.c and omp-expand.c, the readelf -w output
looks like this instead:

 <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
    <2e>   DW_AT_external    : 1
    <2e>   DW_AT_name        : (indirect string, offset: 0x80): main
    <32>   DW_AT_decl_file   : 1
    <33>   DW_AT_decl_line   : 4
    <34>   DW_AT_prototyped  : 1
    <34>   DW_AT_type        : <0xca>
    <38>   DW_AT_low_pc      : 0x400591
    <40>   DW_AT_high_pc     : 0x71
    <48>   DW_AT_frame_base  : 1 byte block: 9c         (DW_OP_call_frame_cfa)
    <4a>   DW_AT_GNU_all_tail_call_sites: 1
    <4a>   DW_AT_sibling     : <0xca>
 <2><4e>: Abbrev Number: 3 (DW_TAG_variable)
    <4f>   DW_AT_name        : s1
    <52>   DW_AT_decl_file   : 1
    <53>   DW_AT_decl_line   : 6
    <54>   DW_AT_type        : <0xca>
    <58>   DW_AT_location    : 9 byte block: 3 30 10 60 0 0 0 0 0       (DW_OP_addr: 601030)
 <2><62>: Abbrev Number: 3 (DW_TAG_variable)
    <63>   DW_AT_name        : i1
    <66>   DW_AT_decl_file   : 1
    <67>   DW_AT_decl_line   : 7
    <68>   DW_AT_type        : <0xca>
    <6c>   DW_AT_location    : 2 byte block: 91 68      (DW_OP_fbreg: -24)
 <2><6f>: Abbrev Number: 3 (DW_TAG_variable)
    <70>   DW_AT_name        : i2
    <73>   DW_AT_decl_file   : 1
    <74>   DW_AT_decl_line   : 7
    <75>   DW_AT_type        : <0xca>
    <79>   DW_AT_location    : 2 byte block: 91 6c      (DW_OP_fbreg: -20)
 <2><7c>: Abbrev Number: 4 (DW_TAG_lexical_block)
    <7d>   DW_AT_low_pc      : 0x4005a9
    <85>   DW_AT_high_pc     : 0x31
 <3><8d>: Abbrev Number: 5 (DW_TAG_variable)
    <8e>   DW_AT_name        : (indirect string, offset: 0x55): pass
    <92>   DW_AT_decl_file   : 1
    <93>   DW_AT_decl_line   : 11
    <94>   DW_AT_type        : <0xca>
    <98>   DW_AT_location    : 2 byte block: 91 64      (DW_OP_fbreg: -28)
 <3><9b>: Abbrev Number: 6 (DW_TAG_subprogram)
    <9c>   DW_AT_name        : (indirect string, offset: 0x5a): main._omp_fn.0
    <a0>   DW_AT_prototyped  : 1
    <a0>   DW_AT_artificial  : 1
    <a0>   DW_AT_low_pc      : 0x400602
    <a8>   DW_AT_high_pc     : 0x21
    <b0>   DW_AT_frame_base  : 1 byte block: 9c         (DW_OP_call_frame_cfa)
    <b2>   DW_AT_GNU_all_tail_call_sites: 1
 <4><b2>: Abbrev Number: 7 (DW_TAG_formal_parameter)
    <b3>   DW_AT_type        : <0xe9>
    <b7>   DW_AT_artificial  : 1
    <b7>   DW_AT_location    : 2 byte block: 91 58      (DW_OP_fbreg: -40)
 <4><ba>: Abbrev Number: 3 (DW_TAG_variable)
    <bb>   DW_AT_name        : i1
    <be>   DW_AT_decl_file   : 1
    <bf>   DW_AT_decl_line   : 7
    <c0>   DW_AT_type        : <0xca>
    <c4>   DW_AT_location    : 2 byte block: 91 6c      (DW_OP_fbreg: -20)

This time, note that the lexical block for the for-loop is still nested
within main().  The DIE for main._omp_fn.0 is nested within that lexical
block.

This nesting enables GDB to find variables which ought to be in scope.

(Obviously, there's some extra work required for finding variables
which reside on the stack in a different thread.  That's handled in
patches to GDB and to libgomp.)

gcc/ChangeLog:
    
        * omp-low.c (create_omp_child_function): Set DECL_CONTEXT
        of child function to that of source function.
        * omp-expand.c (expand_parallel_call): Add child function
        to var chain in block from which child function originated.

Comments

Kevin Buettner May 5, 2017, 1:51 a.m. UTC | #1
Ahem...  I forgot to note that:

I have bootstrapped and regression tested my patch on x86_64-pc-linux-gnu.

Kevin

On Thu, 4 May 2017 17:45:51 -0700
Kevin Buettner <kevinb@redhat.com> wrote:

> Consider the following OpenMP program:
> 
>     void foo (int a1) {}
> 
>     int
>     main (void)
>     {
>       static int s1 = -41;
>       int i1 = 11, i2;
> 
>       for (i2 = 1; i2 <= 2; i2++)
>         {
>           int pass = i2;
>     #pragma omp parallel num_threads (2) firstprivate (i1)
>           {
>             foo (i1);
>           }
>           foo(pass);
>         }
>       foo (s1); foo (i2);
>     }
> 
> At the moment, when debugging such a program, GDB is not able to find
> and print the values of s1, i2, and pass.
> 
> My changes to omp-low.c and omp-expand.c, in conjunction with several
> other patches, allow GDB to find and print these values.
> 
> This is the current behavior when debugging in GDB:
> 
>     (gdb) b 14
>     Breakpoint 1 at 0x400617: file ex3.c, line 14.
>     (gdb) run
>     Starting program: /mesquite2/.ironwood2/omp-tests/k/ex3-trunk
>     [Thread debugging using libthread_db enabled]
>     Using host libthread_db library "/lib64/libthread_db.so.1".
>     [New Thread 0x7ffff73ca700 (LWP 32628)]
> 
>     Thread 1 "ex3-trunk" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
>     14              foo (i1);
>     (gdb) p s1
>     No symbol "s1" in current context.
>     (gdb) p i1
>     $1 = 11
>     (gdb) p i2
>     No symbol "i2" in current context.
>     (gdb) p pass
>     No symbol "pass" in current context.
>     (gdb) c
>     Continuing.
>     [Switching to Thread 0x7ffff73ca700 (LWP 32628)]
> 
>     Thread 2 "ex3-trunk" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
>     14              foo (i1);
>     (gdb) p s1
>     No symbol "s1" in current context.
>     (gdb) p i1
>     $2 = 11
>     (gdb) p i2
>     No symbol "i2" in current context.
>     (gdb) p pass
>     No symbol "pass" in current context.
>     (gdb) bt
>     #0  main._omp_fn.0 () at ex3.c:14
>     #1  0x00007ffff7bc4926 in gomp_thread_start (xdata=<optimized out>)
>     at gcc/libgomp/team.c:122
>     #2  0x00007ffff799761a in start_thread () from /lib64/libpthread.so.0
>     #3  0x00007ffff76d159d in clone () from /lib64/libc.so.6
> 
> Note that GDB is unable to find s1, i2, or pass for either thread.
> 
> I show the backtrace for thread 2 because it's the more difficult case
> to handle due to the stack trace stopping at clone().  The stack frame
> for main(), which is where the variables of interest reside, is not a
> part of this stack.
> 
> When we run this example using the patches associated with this change
> along with several other patches, GDB's behavior looks like this:
> 
>     (gdb) b 14
>     Breakpoint 1 at 0x400617: file ex3.c, line 14.
>     (gdb) run
>     Starting program: /mesquite2/.ironwood2/omp-tests/k/ex3-new
>     [Thread debugging using libthread_db enabled]
>     Using host libthread_db library "/lib64/libthread_db.so.1".
>     [New Thread 0x7ffff73ca700 (LWP 32643)]
> 
>     Thread 1 "ex3-new" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
>     14              foo (i1);
>     (gdb) p s1
>     $1 = -41
>     (gdb) p i1
>     $2 = 11
>     (gdb) p i2
>     $3 = 1
>     (gdb) p pass
>     $4 = 1
>     (gdb) c
>     Continuing.
>     [Switching to Thread 0x7ffff73ca700 (LWP 32643)]
> 
>     Thread 2 "ex3-new" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
>     14              foo (i1);
>     (gdb) p s1
>     $5 = -41
>     (gdb) p i1
>     $6 = 11
>     (gdb) p i2
>     $7 = 1
>     (gdb) p pass
>     $8 = 1
> 
> I didn't show the stack here.  It's the same as before.  (I would,
> however, like to be able to make GDB display a unified stack.)
> 
> Note that GDB is now able to find and print values for s1, i2, and
> pass.
> 
> GCC constructs a new function for executing the parallel code.
> The debugging information entry for this function is presently
> placed at the same level as that of the function in which the
> "#pragma omp parallel" directive appeared.
> 
> This is partial output from "readelf -v" for the (non-working) example
> shown above:
> 
>  <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
>     <2e>   DW_AT_external    : 1
>     <2e>   DW_AT_name        : (indirect string, offset: 0x80): main
>     <32>   DW_AT_decl_file   : 1
>     <33>   DW_AT_decl_line   : 4
>     <34>   DW_AT_prototyped  : 1
>     <34>   DW_AT_type        : <0x9d>
>     <38>   DW_AT_low_pc      : 0x400591
>     <40>   DW_AT_high_pc     : 0x71
>     <48>   DW_AT_frame_base  : 1 byte block: 9c         (DW_OP_call_frame_cfa)
>     <4a>   DW_AT_GNU_all_tail_call_sites: 1
>     <4a>   DW_AT_sibling     : <0x9d>
>  <2><4e>: Abbrev Number: 3 (DW_TAG_variable)
>     <4f>   DW_AT_name        : s1
>     <52>   DW_AT_decl_file   : 1
>     <53>   DW_AT_decl_line   : 6
>     <54>   DW_AT_type        : <0x9d>
>     <58>   DW_AT_location    : 9 byte block: 3 30 10 60 0 0 0 0 0       (DW_OP_addr: 601030)
>  ...
>  <2><7c>: Abbrev Number: 4 (DW_TAG_lexical_block)
>     <7d>   DW_AT_low_pc      : 0x4005a9
>     <85>   DW_AT_high_pc     : 0x31
>  <3><8d>: Abbrev Number: 5 (DW_TAG_variable)
>     <8e>   DW_AT_name        : (indirect string, offset: 0x55): pass
>     <92>   DW_AT_decl_file   : 1
>     <93>   DW_AT_decl_line   : 11
>     <94>   DW_AT_type        : <0x9d>
>     <98>   DW_AT_location    : 2 byte block: 91 64      (DW_OP_fbreg: -28)
>  ...
>  <1><a4>: Abbrev Number: 7 (DW_TAG_subprogram)
>     <a5>   DW_AT_name        : (indirect string, offset: 0x5a): main._omp_fn.0
>     <a9>   DW_AT_prototyped  : 1
>     <a9>   DW_AT_artificial  : 1
>     <a9>   DW_AT_low_pc      : 0x400602
>     <b1>   DW_AT_high_pc     : 0x21
>     <b9>   DW_AT_frame_base  : 1 byte block: 9c         (DW_OP_call_frame_cfa)
>     <bb>   DW_AT_GNU_all_tail_call_sites: 1
>     <bb>   DW_AT_sibling     : <0xd5>
>  <2><bf>: Abbrev Number: 8 (DW_TAG_formal_parameter)
>     <c0>   DW_AT_type        : <0xed>
>     <c4>   DW_AT_artificial  : 1
>     <c4>   DW_AT_location    : 2 byte block: 91 58      (DW_OP_fbreg: -40)
>  ...
> 
> The nesting level is indicated by the leading number in angle
> brackets.  This shows that the DW_TAG_VARIABLE DIE for s1 is nested
> within the DIE for main's DW_TAG_subprogram DIE.  Likewise, the
> lexical block corresponding to the body of the for-loop is also nested
> within main().  However, the DIES for main and main._omp_fun.0 are
> both at level 1.
> 
> With my patches for omp-low.c and omp-expand.c, the readelf -w output
> looks like this instead:
> 
>  <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
>     <2e>   DW_AT_external    : 1
>     <2e>   DW_AT_name        : (indirect string, offset: 0x80): main
>     <32>   DW_AT_decl_file   : 1
>     <33>   DW_AT_decl_line   : 4
>     <34>   DW_AT_prototyped  : 1
>     <34>   DW_AT_type        : <0xca>
>     <38>   DW_AT_low_pc      : 0x400591
>     <40>   DW_AT_high_pc     : 0x71
>     <48>   DW_AT_frame_base  : 1 byte block: 9c         (DW_OP_call_frame_cfa)
>     <4a>   DW_AT_GNU_all_tail_call_sites: 1
>     <4a>   DW_AT_sibling     : <0xca>
>  <2><4e>: Abbrev Number: 3 (DW_TAG_variable)
>     <4f>   DW_AT_name        : s1
>     <52>   DW_AT_decl_file   : 1
>     <53>   DW_AT_decl_line   : 6
>     <54>   DW_AT_type        : <0xca>
>     <58>   DW_AT_location    : 9 byte block: 3 30 10 60 0 0 0 0 0       (DW_OP_addr: 601030)
>  <2><62>: Abbrev Number: 3 (DW_TAG_variable)
>     <63>   DW_AT_name        : i1
>     <66>   DW_AT_decl_file   : 1
>     <67>   DW_AT_decl_line   : 7
>     <68>   DW_AT_type        : <0xca>
>     <6c>   DW_AT_location    : 2 byte block: 91 68      (DW_OP_fbreg: -24)
>  <2><6f>: Abbrev Number: 3 (DW_TAG_variable)
>     <70>   DW_AT_name        : i2
>     <73>   DW_AT_decl_file   : 1
>     <74>   DW_AT_decl_line   : 7
>     <75>   DW_AT_type        : <0xca>
>     <79>   DW_AT_location    : 2 byte block: 91 6c      (DW_OP_fbreg: -20)
>  <2><7c>: Abbrev Number: 4 (DW_TAG_lexical_block)
>     <7d>   DW_AT_low_pc      : 0x4005a9
>     <85>   DW_AT_high_pc     : 0x31
>  <3><8d>: Abbrev Number: 5 (DW_TAG_variable)
>     <8e>   DW_AT_name        : (indirect string, offset: 0x55): pass
>     <92>   DW_AT_decl_file   : 1
>     <93>   DW_AT_decl_line   : 11
>     <94>   DW_AT_type        : <0xca>
>     <98>   DW_AT_location    : 2 byte block: 91 64      (DW_OP_fbreg: -28)
>  <3><9b>: Abbrev Number: 6 (DW_TAG_subprogram)
>     <9c>   DW_AT_name        : (indirect string, offset: 0x5a): main._omp_fn.0
>     <a0>   DW_AT_prototyped  : 1
>     <a0>   DW_AT_artificial  : 1
>     <a0>   DW_AT_low_pc      : 0x400602
>     <a8>   DW_AT_high_pc     : 0x21
>     <b0>   DW_AT_frame_base  : 1 byte block: 9c         (DW_OP_call_frame_cfa)
>     <b2>   DW_AT_GNU_all_tail_call_sites: 1
>  <4><b2>: Abbrev Number: 7 (DW_TAG_formal_parameter)
>     <b3>   DW_AT_type        : <0xe9>
>     <b7>   DW_AT_artificial  : 1
>     <b7>   DW_AT_location    : 2 byte block: 91 58      (DW_OP_fbreg: -40)
>  <4><ba>: Abbrev Number: 3 (DW_TAG_variable)
>     <bb>   DW_AT_name        : i1
>     <be>   DW_AT_decl_file   : 1
>     <bf>   DW_AT_decl_line   : 7
>     <c0>   DW_AT_type        : <0xca>
>     <c4>   DW_AT_location    : 2 byte block: 91 6c      (DW_OP_fbreg: -20)
> 
> This time, note that the lexical block for the for-loop is still nested
> within main().  The DIE for main._omp_fn.0 is nested within that lexical
> block.
> 
> This nesting enables GDB to find variables which ought to be in scope.
> 
> (Obviously, there's some extra work required for finding variables
> which reside on the stack in a different thread.  That's handled in
> patches to GDB and to libgomp.)
> 
> gcc/ChangeLog:
>     
>         * omp-low.c (create_omp_child_function): Set DECL_CONTEXT
>         of child function to that of source function.
>         * omp-expand.c (expand_parallel_call): Add child function
>         to var chain in block from which child function originated.
> 
> diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
> index 5c48b78..7029951 100644
> --- a/gcc/omp-expand.c
> +++ b/gcc/omp-expand.c
> @@ -667,6 +667,25 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
>    tree child_fndecl = gimple_omp_parallel_child_fn (entry_stmt);
>    t2 = build_fold_addr_expr (child_fndecl);
>  
> +  if (gimple_block (entry_stmt) != NULL_TREE
> +      && TREE_CODE (gimple_block (entry_stmt)) == BLOCK)
> +    {
> +      tree b = BLOCK_SUPERCONTEXT (gimple_block (entry_stmt));
> +
> +      /* Add child_fndecl to var chain of the supercontext of the
> +        block corresponding to entry_stmt.  This ensures that debug
> +        info for the outlined function will be emitted for the correct
> +        lexical scope.  */
> +      if (b != NULL_TREE && TREE_CODE (b) == BLOCK)
> +       {
> +         tree *tp;
> +
> +         for (tp = &BLOCK_VARS (b); *tp; tp = &DECL_CHAIN (*tp))
> +           ;
> +         *tp = child_fndecl;
> +        }
> +    }
> +
>    vec_alloc (args, 4 + vec_safe_length (ws_args));
>    args->quick_push (t2);
>    args->quick_push (t1);
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index 9cc2996..601d1b4 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -1626,7 +1626,7 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
>    TREE_PUBLIC (decl) = 0;
>    DECL_UNINLINABLE (decl) = 1;
>    DECL_EXTERNAL (decl) = 0;
> -  DECL_CONTEXT (decl) = NULL_TREE;
> +  DECL_CONTEXT (decl) = ctx->cb.src_fn;
>    DECL_INITIAL (decl) = make_node (BLOCK);
>    BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl;
>    if (omp_maybe_offloaded_ctx (ctx))
>
Alexander Monakov May 5, 2017, 11:23 a.m. UTC | #2
On Thu, 4 May 2017, Kevin Buettner wrote:
> diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
> index 5c48b78..7029951 100644
> --- a/gcc/omp-expand.c
> +++ b/gcc/omp-expand.c
> @@ -667,6 +667,25 @@ expand_parallel_call (struct omp_region *region, basic_block bb,

Outlined functions are also used for 'omp task' and 'omp target' regions, but
here only 'omp parallel' is handled. Will this code need to be duplicated for
those region types?

>    tree child_fndecl = gimple_omp_parallel_child_fn (entry_stmt);
>    t2 = build_fold_addr_expr (child_fndecl);
>  
> +  if (gimple_block (entry_stmt) != NULL_TREE
> +      && TREE_CODE (gimple_block (entry_stmt)) == BLOCK)

Here and also below, ...

> +    {
> +      tree b = BLOCK_SUPERCONTEXT (gimple_block (entry_stmt));
> +
> +      /* Add child_fndecl to var chain of the supercontext of the
> +        block corresponding to entry_stmt.  This ensures that debug
> +        info for the outlined function will be emitted for the correct
> +        lexical scope.  */
> +      if (b != NULL_TREE && TREE_CODE (b) == BLOCK)

... here, I'm curious why the conditionals are necessary -- I don't see why the
conditions can be sometimes true and sometimes false.  Sorry if I'm missing
something obvious.

Thanks.
Alexander
Kevin Buettner May 5, 2017, 5:23 p.m. UTC | #3
On Fri, 5 May 2017 14:23:14 +0300 (MSK)
Alexander Monakov <amonakov@ispras.ru> wrote:

> On Thu, 4 May 2017, Kevin Buettner wrote:
> > diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
> > index 5c48b78..7029951 100644
> > --- a/gcc/omp-expand.c
> > +++ b/gcc/omp-expand.c
> > @@ -667,6 +667,25 @@ expand_parallel_call (struct omp_region *region, basic_block bb,  
> 
> Outlined functions are also used for 'omp task' and 'omp target' regions, but
> here only 'omp parallel' is handled. Will this code need to be duplicated for
> those region types?

For 'omp task' and 'omp target', I think it's possible or even likely
that the original context which started these parallel tasks will no
longer exist.  So, it might not make sense to do something equivalent
for 'task' and 'target'.

That said, I have not yet given the matter much study.  There may be
cases where having scoped debug info might still prove useful.

The short answer is, "I don't know."

> >    tree child_fndecl = gimple_omp_parallel_child_fn (entry_stmt);
> >    t2 = build_fold_addr_expr (child_fndecl);
> >  
> > +  if (gimple_block (entry_stmt) != NULL_TREE
> > +      && TREE_CODE (gimple_block (entry_stmt)) == BLOCK)  
> 
> Here and also below, ...
> 
> > +    {
> > +      tree b = BLOCK_SUPERCONTEXT (gimple_block (entry_stmt));
> > +
> > +      /* Add child_fndecl to var chain of the supercontext of the
> > +        block corresponding to entry_stmt.  This ensures that debug
> > +        info for the outlined function will be emitted for the correct
> > +        lexical scope.  */
> > +      if (b != NULL_TREE && TREE_CODE (b) == BLOCK)  
> 
> ... here, I'm curious why the conditionals are necessary -- I don't see why the
> conditions can be sometimes true and sometimes false.  Sorry if I'm missing
> something obvious.

I'm not especially knowledgeable about gcc internals.  It may be the
case these conditionals that you noted are unnecessary and might
perhaps better be handled via the use of an assert.

I will note that when I originally coded it, I had fewer tests.  The code
still worked for the cases that I tried.  Later, when I reviewed it
for posting here, I decided to add some more checks.

I'll explain my reasoning for each of them...

> > +  if (gimple_block (entry_stmt) != NULL_TREE

If we have NULL_TREE here, dereferencing gimple_block (entry_stmt) further
won't work.

> > +      && TREE_CODE (gimple_block (entry_stmt)) == BLOCK)  

It seemed to me that a having a BLOCK is necessary in order to later
use BLOCK_SUPERCONTEXT.

> > +      if (b != NULL_TREE && TREE_CODE (b) == BLOCK)  

I check to make sure that b is a block so that I can later refer to
BLOCK_VARS (b).

Again, it may be the case that these should always evaluate to true.
If so, then use of an assert might be better here.

Kevin
Jakub Jelinek May 10, 2017, 3:24 p.m. UTC | #4
On Fri, May 05, 2017 at 10:23:59AM -0700, Kevin Buettner wrote:
> On Fri, 5 May 2017 14:23:14 +0300 (MSK)
> Alexander Monakov <amonakov@ispras.ru> wrote:
> 
> > On Thu, 4 May 2017, Kevin Buettner wrote:
> > > diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
> > > index 5c48b78..7029951 100644
> > > --- a/gcc/omp-expand.c
> > > +++ b/gcc/omp-expand.c
> > > @@ -667,6 +667,25 @@ expand_parallel_call (struct omp_region *region, basic_block bb,  
> > 
> > Outlined functions are also used for 'omp task' and 'omp target' regions, but
> > here only 'omp parallel' is handled. Will this code need to be duplicated for
> > those region types?
> 
> For 'omp task' and 'omp target', I think it's possible or even likely
> that the original context which started these parallel tasks will no
> longer exist.  So, it might not make sense to do something equivalent
> for 'task' and 'target'.

It depends.  E.g. for #pragma omp taskloop without nogroup clause, it acts the
same as #pragma omp parallel in the nesting regard, the GOMP_taskloop* function will
not return until all the tasks finished.  Or if you have #pragma omp task and 
#pragma omp taskwait on the next line, or #pragma omp taskgroup
around it, or #pragma omp target without nowait clause, it will behave the same.
Then there are cases where the encountering function will still be around,
but already not all the lexical scopes (or inline functions), e.g. if there
is #pragma omp taskwait or taskgroup etc. outside of the innermost lexical
scope(s), but still somewhere in the function.  What the debugger should do
in that case is that it should figure out that the spot the task has been
created in has passed, so not show vars in the lexical scopes already left,
but still show others?  Then of course if there is nothing waiting for the
task or async target in the current function, the function's frame could be
left, perhaps multiple callers too.

> > >    tree child_fndecl = gimple_omp_parallel_child_fn (entry_stmt);
> > >    t2 = build_fold_addr_expr (child_fndecl);
> > >  
> > > +  if (gimple_block (entry_stmt) != NULL_TREE
> > > +      && TREE_CODE (gimple_block (entry_stmt)) == BLOCK)  
> > 
> > Here and also below, ...
> > 
> > > +    {
> > > +      tree b = BLOCK_SUPERCONTEXT (gimple_block (entry_stmt));
> > > +
> > > +      /* Add child_fndecl to var chain of the supercontext of the
> > > +        block corresponding to entry_stmt.  This ensures that debug
> > > +        info for the outlined function will be emitted for the correct
> > > +        lexical scope.  */
> > > +      if (b != NULL_TREE && TREE_CODE (b) == BLOCK)  
> > 
> > ... here, I'm curious why the conditionals are necessary -- I don't see why the
> > conditions can be sometimes true and sometimes false.  Sorry if I'm missing
> > something obvious.

gimple_block can be NULL.  And, most calls of gimple_block that want to
ensure it is a BLOCK actually do verify it is a BLOCK, while it is unlikely
and it is usually just LTO that screws things up, I'd keep it.

> > > +      if (b != NULL_TREE && TREE_CODE (b) == BLOCK)  
> 
> I check to make sure that b is a block so that I can later refer to
> BLOCK_VARS (b).

I believe BLOCK_SUPERCONTEXT of a BLOCK should always be non-NULL, either
another BLOCK, or FUNCTION_DECL.  Thus I think b != NULL_TREE && is
redundant here.

What I don't like is that the patch is inconsistent, it sets DECL_CONTEXT
of the child function for all kinds of outlined functions, but then you just
choose one of the many places and add it into the BLOCK tree.  Any reason
why the DECL_CONTEXT change can't be done in a helper function together
with all the changes you've added into omp-expand.c, and then call it from
expand_omp_parallel (with the child_fn and entry_stmt arguments) so that
you can call it easily also for other constructs, either now or later on?
Also, is there any rationale on appending the FUNCTION_DECL to BLOCK_VARS
instead of prepending it there (which is cheaper)?  Does the debugger
care about relative order of those artificial functions vs. other
variables in the lexical scope?

	Jakub
Richard Biener May 11, 2017, 7:31 a.m. UTC | #5
On Wed, May 10, 2017 at 5:24 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, May 05, 2017 at 10:23:59AM -0700, Kevin Buettner wrote:
>> On Fri, 5 May 2017 14:23:14 +0300 (MSK)
>> Alexander Monakov <amonakov@ispras.ru> wrote:
>>
>> > On Thu, 4 May 2017, Kevin Buettner wrote:
>> > > diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
>> > > index 5c48b78..7029951 100644
>> > > --- a/gcc/omp-expand.c
>> > > +++ b/gcc/omp-expand.c
>> > > @@ -667,6 +667,25 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
>> >
>> > Outlined functions are also used for 'omp task' and 'omp target' regions, but
>> > here only 'omp parallel' is handled. Will this code need to be duplicated for
>> > those region types?
>>
>> For 'omp task' and 'omp target', I think it's possible or even likely
>> that the original context which started these parallel tasks will no
>> longer exist.  So, it might not make sense to do something equivalent
>> for 'task' and 'target'.
>
> It depends.  E.g. for #pragma omp taskloop without nogroup clause, it acts the
> same as #pragma omp parallel in the nesting regard, the GOMP_taskloop* function will
> not return until all the tasks finished.  Or if you have #pragma omp task and
> #pragma omp taskwait on the next line, or #pragma omp taskgroup
> around it, or #pragma omp target without nowait clause, it will behave the same.
> Then there are cases where the encountering function will still be around,
> but already not all the lexical scopes (or inline functions), e.g. if there
> is #pragma omp taskwait or taskgroup etc. outside of the innermost lexical
> scope(s), but still somewhere in the function.  What the debugger should do
> in that case is that it should figure out that the spot the task has been
> created in has passed, so not show vars in the lexical scopes already left,
> but still show others?  Then of course if there is nothing waiting for the
> task or async target in the current function, the function's frame could be
> left, perhaps multiple callers too.
>
>> > >    tree child_fndecl = gimple_omp_parallel_child_fn (entry_stmt);
>> > >    t2 = build_fold_addr_expr (child_fndecl);
>> > >
>> > > +  if (gimple_block (entry_stmt) != NULL_TREE
>> > > +      && TREE_CODE (gimple_block (entry_stmt)) == BLOCK)
>> >
>> > Here and also below, ...
>> >
>> > > +    {
>> > > +      tree b = BLOCK_SUPERCONTEXT (gimple_block (entry_stmt));
>> > > +
>> > > +      /* Add child_fndecl to var chain of the supercontext of the
>> > > +        block corresponding to entry_stmt.  This ensures that debug
>> > > +        info for the outlined function will be emitted for the correct
>> > > +        lexical scope.  */
>> > > +      if (b != NULL_TREE && TREE_CODE (b) == BLOCK)
>> >
>> > ... here, I'm curious why the conditionals are necessary -- I don't see why the
>> > conditions can be sometimes true and sometimes false.  Sorry if I'm missing
>> > something obvious.
>
> gimple_block can be NULL.  And, most calls of gimple_block that want to
> ensure it is a BLOCK actually do verify it is a BLOCK, while it is unlikely
> and it is usually just LTO that screws things up, I'd keep it.

gimple_block should always be either NULL or a BLOCK.  It's *_CONTEXT that
can be types, decls or blocks.

>> > > +      if (b != NULL_TREE && TREE_CODE (b) == BLOCK)
>>
>> I check to make sure that b is a block so that I can later refer to
>> BLOCK_VARS (b).
>
> I believe BLOCK_SUPERCONTEXT of a BLOCK should always be non-NULL, either
> another BLOCK, or FUNCTION_DECL.  Thus I think b != NULL_TREE && is
> redundant here.

Yes.

>
> What I don't like is that the patch is inconsistent, it sets DECL_CONTEXT
> of the child function for all kinds of outlined functions, but then you just
> choose one of the many places and add it into the BLOCK tree.  Any reason
> why the DECL_CONTEXT change can't be done in a helper function together
> with all the changes you've added into omp-expand.c, and then call it from
> expand_omp_parallel (with the child_fn and entry_stmt arguments) so that
> you can call it easily also for other constructs, either now or later on?
> Also, is there any rationale on appending the FUNCTION_DECL to BLOCK_VARS
> instead of prepending it there (which is cheaper)?  Does the debugger
> care about relative order of those artificial functions vs. other
> variables in the lexical scope?
>
>         Jakub
diff mbox

Patch

diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 5c48b78..7029951 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -667,6 +667,25 @@  expand_parallel_call (struct omp_region *region, basic_block bb,
   tree child_fndecl = gimple_omp_parallel_child_fn (entry_stmt);
   t2 = build_fold_addr_expr (child_fndecl);
 
+  if (gimple_block (entry_stmt) != NULL_TREE
+      && TREE_CODE (gimple_block (entry_stmt)) == BLOCK)
+    {
+      tree b = BLOCK_SUPERCONTEXT (gimple_block (entry_stmt));
+
+      /* Add child_fndecl to var chain of the supercontext of the
+        block corresponding to entry_stmt.  This ensures that debug
+        info for the outlined function will be emitted for the correct
+        lexical scope.  */
+      if (b != NULL_TREE && TREE_CODE (b) == BLOCK)
+       {
+         tree *tp;
+
+         for (tp = &BLOCK_VARS (b); *tp; tp = &DECL_CHAIN (*tp))
+           ;
+         *tp = child_fndecl;
+        }
+    }
+
   vec_alloc (args, 4 + vec_safe_length (ws_args));
   args->quick_push (t2);
   args->quick_push (t1);
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 9cc2996..601d1b4 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1626,7 +1626,7 @@  create_omp_child_function (omp_context *ctx, bool task_copy)
   TREE_PUBLIC (decl) = 0;
   DECL_UNINLINABLE (decl) = 1;
   DECL_EXTERNAL (decl) = 0;
-  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_CONTEXT (decl) = ctx->cb.src_fn;
   DECL_INITIAL (decl) = make_node (BLOCK);
   BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl;
   if (omp_maybe_offloaded_ctx (ctx))