[fortran] Load scalar intent-in variables at the beginning of procedures
diff mbox series

Message ID 48286910-ebbb-10e4-488b-8c96e505375c@tkoenig.net
State New
Headers show
Series
  • [fortran] Load scalar intent-in variables at the beginning of procedures
Related show

Commit Message

Thomas König Nov. 11, 2019, 9:55 p.m. UTC
Hello world,

the attached patch loads scalar INTENT(IN) variables to a local
variable at the start of a procedure, as suggested in PR 67202, in
order to aid optimization.  This is controlled by front-end
optimization so it is easier to catch if any bugs should turn up :-)

This is done to make optimization by the middle-end easier.

I left in the parts for debugging that I added for this patch.
Seeing the difference between EXEC_INIT_ASSIGN and EXEC_ASSIGN was
particularly instructive.

Regression-tested. OK for trunk?

Regards

	Thomas

Comments

Thomas Koenig Nov. 11, 2019, 9:56 p.m. UTC | #1
Am 11.11.19 um 22:55 schrieb Thomas König:
> Regression-tested. OK for trunk?

Of course, better with a ChangeLog entry.

2019-11-11  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/67202
	* dump-parse-tree.c (debug): Add for gfc_namespace.
	(show_code_node): Add INIT_ on dumping EXEC_INIT_ASSIGN.
	* frontent-passes.c (replace_intent_in): Add prototype.  New function.
	(optimize_namespace): Call it.
	(sym_replacement): New struct.
	(replace_symbol_in_expr): New function.

2019-11-11  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/67202
	* gfortran.dg/intent_optimize_3.f90: New test.
Janne Blomqvist Nov. 11, 2019, 10:41 p.m. UTC | #2
On Mon, Nov 11, 2019 at 11:56 PM Thomas König <tk@tkoenig.net> wrote:
>
> Hello world,
>
> the attached patch loads scalar INTENT(IN) variables to a local
> variable at the start of a procedure, as suggested in PR 67202, in
> order to aid optimization.  This is controlled by front-end
> optimization so it is easier to catch if any bugs should turn up :-)
>
> This is done to make optimization by the middle-end easier.
>
> I left in the parts for debugging that I added for this patch.
> Seeing the difference between EXEC_INIT_ASSIGN and EXEC_ASSIGN was
> particularly instructive.
>
> Regression-tested. OK for trunk?

Wouldn't it be even better to pass scalar intent(in) variables by
value? The obvious objection of course is ABI, but for procedures with
an explicit interface we're not following any particular ABI anyways?
Thomas König Nov. 11, 2019, 10:53 p.m. UTC | #3
Hi Janne,

> Wouldn't it be even better to pass scalar intent(in) variables by
> value? The obvious objection of course is ABI, but for procedures with
> an explicit interface we're not following any particular ABI anyways?

The problem with that is that we don't know when we compile a procedure
if it will be called with an explicit interface or not.

The user can always add an interface block for a stand-alone procedure.

Regards

	Thomas
Janne Blomqvist Nov. 12, 2019, 7:46 a.m. UTC | #4
On Tue, Nov 12, 2019 at 12:53 AM Thomas König <tk@tkoenig.net> wrote:
>
> Hi Janne,
>
> > Wouldn't it be even better to pass scalar intent(in) variables by
> > value? The obvious objection of course is ABI, but for procedures with
> > an explicit interface we're not following any particular ABI anyways?
>
> The problem with that is that we don't know when we compile a procedure
> if it will be called with an explicit interface or not.
>
> The user can always add an interface block for a stand-alone procedure.

Ah, of course. I should have said module procedures. Or even module
procedures without bind(C)?

That being said, I've seen examples where people have figured out the
symbol mangling and are calling module procedures directly from C, so
will breaking such code (even if not officially supported) be an
issue?
Thomas König Nov. 12, 2019, 12:42 p.m. UTC | #5
Hi Janne,

> Ah, of course. I should have said module procedures. Or even module
> procedures without bind(C)?

It would probably be the latter.

The change would actually be rather small: If conditions are met, just add attr.value for INTENT(IN).

This is something we should probably do when we are forced into doing an ABI change by other circumstances.

Regards, Thomad
Tobias Burnus Nov. 12, 2019, 2:21 p.m. UTC | #6
Hi Thomas,

On 11/12/19 1:42 PM, Thomas König wrote:
>> Ah, of course. I should have said module procedures. Or even module procedures without bind(C)?
> It would probably be the latter. The change would actually be rather small: If conditions are met, just add attr.value for INTENT(IN). This is something we should probably do when we are forced into doing an ABI change by other circumstances.

Will this still work if one does:

module m
contains
integer function val(y)
   integer, intent(in) :: y
   val = 2*y
end function val
end module m

use m
interface
   integer function proc(z)
     integer, intent(in) :: z
   end function proc
end interface
procedure(proc), pointer :: ff
ff => val
print *, ff(10)
end

Tobias
Thomas König Nov. 12, 2019, 5:04 p.m. UTC | #7
Hi Tobias,

> On 11/12/19 1:42 PM, Thomas König wrote:
>>> Ah, of course. I should have said module procedures. Or even module procedures without bind(C)?
>> It would probably be the latter. The change would actually be rather small: If conditions are met, just add attr.value for INTENT(IN). This is something we should probably do when we are forced into doing an ABI change by other circumstances.
> 
> Will this still work if one does:
> 
> module m
> contains
> integer function val(y)
>  integer, intent(in) :: y
>  val = 2*y
> end function val
> end module m
> 
> use m
> interface
>  integer function proc(z)
>    integer, intent(in) :: z
>  end function proc
> end interface
> procedure(proc), pointer :: ff
> ff => val
> print *, ff(10)
> end

You are right, it would not work. So, scratch that idea. Maybe we should commit this as a test case so nobody gets funny ideas in two year‘s time 😉

So, I think we can then discuss the original patch.

Regards

Thomas
Tobias Burnus Nov. 15, 2019, 7:30 a.m. UTC | #8
Hi Thomas,

On 11/11/19 10:55 PM, Thomas König wrote:
> the attached patch loads scalar INTENT(IN) variables to a local
> variable at the start of a procedure, as suggested in PR 67202, in
> order to aid optimization.  This is controlled by front-end
> optimization so it is easier to catch if any bugs should turn up :-)

> +      if (f->sym == NULL || f->sym->attr.dimension || f->sym->attr.allocatable
> +	  || f->sym->attr.optional || f->sym->attr.pointer
> +	  || f->sym->attr.codimension || f->sym->attr.value
> +	  || f->sym->attr.proc_pointer || f->sym->attr.target
> +	  || f->sym->attr.asynchronous
> +	  || f->sym->ts.type == BT_CHARACTER || f->sym->ts.type == BT_DERIVED
> +	  || f->sym->ts.type == BT_CLASS)
> +	continue;
I think you need to add at least VOLATILE to this list – otherwise, I 
have not thought much about corner cases nor have studied the patch, sorry.

Cheers,

Tobias
Thomas König Nov. 15, 2019, 6:06 p.m. UTC | #9
Hi Tobias,

> I think you need to add at least VOLATILE to this list

Yes, I'll do that.

Any other comments?

Regards

	Thomas
Thomas Koenig Nov. 16, 2019, 8:33 p.m. UTC | #10
Hello world,

here is an update to the patch.

I have now included variables where the user did not specify INTENT(IN)
by checking that the dummy variables to be replaced by temporaries
are not, indeed, assigned a value. This also includes being passed
as an actual argument to a non-INTENT(IN) dummy argument.

Extending this led to being able to catch a few more bugs.

I have addes one test case to check where the new temporaries are added.

Regression-tested. The only change I see in the testsuite now is

XPASS: gfortran.dg/goacc/kernels-loop-n.f95   -O   scan-tree-dump-times 
parloops1 "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc 
function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1

So, OK for trunk?

Regards

	Thomas

2019-11-11  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/67202
	* dump-parse-tree.c (debug): Add for gfc_namespace.
	(show_code_node): Add INIT_ on dumping EXEC_INIT_ASSIGN.
	* frontent-passes.c (replace_intent_in): Add prototype.  New
	function.
	(optimize_namespace): Call it.
	(sym_replacement): New struct.
	(defined_code_callback): New function.
	(defined_expr_callback): New function.
	(replace_symbol_in_expr): New function.

2019-11-11  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/67202
	* gfortran.dg/intent_optimize_3.f90: New test.
	* gfortran.dg/intent_optimize_4.f90: New test.
	* gfortran.dg/pr26246_2.f90: Add -fno-frontend-optimize to flags.
Bernhard Reutner-Fischer Nov. 19, 2019, 10:39 a.m. UTC | #11
On 16 November 2019 21:33:58 CET, Thomas Koenig <tkoenig@netcologne.de> wrote:
>Hello world,
>
>here is an update to the patch.

+	  char name[GFC_MAX_SYMBOL_LEN + 1];
+	  snprintf (name, GFC_MAX_SYMBOL_LEN, "__dummy_%d_%s", var_num++,
+		    f->sym->name);
+
+	  if (gfc_get_sym_tree (name, ns, &symtree, false) != 0)

(I) you should + sizeof(__dummy__) + 10 for unsigned long %d or the like.
(II) s/__dummy/__intent_in/ for clarity?

thanks,
Thomas Koenig Nov. 19, 2019, 10:54 p.m. UTC | #12
Am 19.11.19 um 11:39 schrieb Bernhard Reutner-Fischer:
> +	  char name[GFC_MAX_SYMBOL_LEN + 1];
> +	  snprintf (name, GFC_MAX_SYMBOL_LEN, "__dummy_%d_%s", var_num++,
> +		    f->sym->name);
> +
> +	  if (gfc_get_sym_tree (name, ns, &symtree, false) != 0)
> 
> (I) you should + sizeof(__dummy__) + 10 for unsigned long %d or the like.

GFC_MAX_SYMBOL_LEN is the maximum length of a gfortran symbol. AFAIK, it
is not possible to use a longer symbol name than that, so it needs to be
truncated. Uniqueness of the variable name is guaranteed by the var_num
variable.

If the user puts dummy arguments Supercalifragilisticexpialidociousa and
Supercalifragilisticexpialidociousb into the argument list of a
procedure, he will have to look at the numbers to differentiate them :-)

> (II) s/__dummy/__intent_in/ for clarity?

It's moved away a bit from INTENT(IN) now, because an argument which
cannot be modified (even by passing to a procedure with a dummy argument
with unknown intent) is now also handled.

Regards

	Thomas
Bernhard Reutner-Fischer Nov. 20, 2019, 5:59 p.m. UTC | #13
On 19 November 2019 23:54:55 CET, Thomas Koenig <tkoenig@netcologne.de> wrote:
>Am 19.11.19 um 11:39 schrieb Bernhard Reutner-Fischer:
>> +	  char name[GFC_MAX_SYMBOL_LEN + 1];
>> +	  snprintf (name, GFC_MAX_SYMBOL_LEN, "__dummy_%d_%s", var_num++,
>> +		    f->sym->name);
>> +
>> +	  if (gfc_get_sym_tree (name, ns, &symtree, false) != 0)
>> 
>> (I) you should + sizeof(__dummy__) + 10 for unsigned long %d or the
>like.
>
>GFC_MAX_SYMBOL_LEN is the maximum length of a gfortran symbol. AFAIK,

This is only true for user-provided names in the source code.

Think e.g. class names as can be seen in the dumps..

>it
>is not possible to use a longer symbol name than that, so it needs to
>be
>truncated. Uniqueness of the variable name is guaranteed by the var_num
>variable.
>
>If the user puts dummy arguments Supercalifragilisticexpialidociousa
>and
>Supercalifragilisticexpialidociousb into the argument list of a
>procedure, he will have to look at the numbers to differentiate them
>:-)
>
>> (II) s/__dummy/__intent_in/ for clarity?
>
>It's moved away a bit from INTENT(IN) now, because an argument which
>cannot be modified (even by passing to a procedure with a dummy
>argument
>with unknown intent) is now also handled.

So maybe __readonly_ , __rdonly_, __rd_or the like? dummy is really misleading a name in the dumps..

thanks,
Janne Blomqvist Nov. 20, 2019, 8:38 p.m. UTC | #14
On Wed, Nov 20, 2019 at 8:00 PM Bernhard Reutner-Fischer
<rep.dot.nop@gmail.com> wrote:
>
> On 19 November 2019 23:54:55 CET, Thomas Koenig <tkoenig@netcologne.de> wrote:
> >Am 19.11.19 um 11:39 schrieb Bernhard Reutner-Fischer:
> >> +      char name[GFC_MAX_SYMBOL_LEN + 1];
> >> +      snprintf (name, GFC_MAX_SYMBOL_LEN, "__dummy_%d_%s", var_num++,
> >> +                f->sym->name);
> >> +
> >> +      if (gfc_get_sym_tree (name, ns, &symtree, false) != 0)
> >>
> >> (I) you should + sizeof(__dummy__) + 10 for unsigned long %d or the
> >like.
> >
> >GFC_MAX_SYMBOL_LEN is the maximum length of a gfortran symbol. AFAIK,
>
> This is only true for user-provided names in the source code.
>
> Think e.g. class names as can be seen in the dumps..

We have GFC_MAX_MANGLED_SYMBOL_LEN for that. *Insert my standard pet
peeve rant that we should use heap allocated unlimited length strings
for these rather than copying stack allocated strings around, or
preferable a symbol table*

> >it
> >is not possible to use a longer symbol name than that, so it needs to
> >be
> >truncated. Uniqueness of the variable name is guaranteed by the var_num
> >variable.
> >
> >If the user puts dummy arguments Supercalifragilisticexpialidociousa
> >and
> >Supercalifragilisticexpialidociousb into the argument list of a
> >procedure, he will have to look at the numbers to differentiate them
> >:-)
> >
> >> (II) s/__dummy/__intent_in/ for clarity?
> >
> >It's moved away a bit from INTENT(IN) now, because an argument which
> >cannot be modified (even by passing to a procedure with a dummy
> >argument
> >with unknown intent) is now also handled.
>
> So maybe __readonly_ , __rdonly_, __rd_or the like? dummy is really misleading a name in the dumps..

Well, dummy is a term with a precise definition in the Fortran
standard, so in a way it's good so one realizes it has something to do
with a dummy argument. But yes, it's a bit misleading because it's not
the dummy argument itself but rather a dereferenced copy of it. I
suggest __readonly_dereferenced_dummy_copy_yes_this_is_a_really_long_name_.
:)
Janne Blomqvist Nov. 20, 2019, 8:45 p.m. UTC | #15
On Sat, Nov 16, 2019 at 10:34 PM Thomas Koenig <tkoenig@netcologne.de> wrote:
>
> Hello world,
>
> here is an update to the patch.
>
> I have now included variables where the user did not specify INTENT(IN)
> by checking that the dummy variables to be replaced by temporaries
> are not, indeed, assigned a value. This also includes being passed
> as an actual argument to a non-INTENT(IN) dummy argument.
>
> Extending this led to being able to catch a few more bugs.
>
> I have addes one test case to check where the new temporaries are added.
>
> Regression-tested. The only change I see in the testsuite now is
>
> XPASS: gfortran.dg/goacc/kernels-loop-n.f95   -O   scan-tree-dump-times
> parloops1 "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc
> function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1

BTW, since this is done for the purpose of optimization, have you done
testing on some suitable benchmark suite such as polyhedron, whether
it a) generates any different code b) does it make it go faster?

Is there a risk of performance regressions due to higher register pressure?
Steve Kargl Nov. 20, 2019, 9:03 p.m. UTC | #16
On Wed, Nov 20, 2019 at 10:38:30PM +0200, Janne Blomqvist wrote:
> On Wed, Nov 20, 2019 at 8:00 PM Bernhard Reutner-Fischer
> <rep.dot.nop@gmail.com> wrote:
> >
> > On 19 November 2019 23:54:55 CET, Thomas Koenig <tkoenig@netcologne.de> wrote:
> > >Am 19.11.19 um 11:39 schrieb Bernhard Reutner-Fischer:
> > >> +      char name[GFC_MAX_SYMBOL_LEN + 1];
> > >> +      snprintf (name, GFC_MAX_SYMBOL_LEN, "__dummy_%d_%s", var_num++,
> > >> +                f->sym->name);
> > >> +
> > >> +      if (gfc_get_sym_tree (name, ns, &symtree, false) != 0)
> > >>
> > >> (I) you should + sizeof(__dummy__) + 10 for unsigned long %d or the
> > >like.
> > >
> > >GFC_MAX_SYMBOL_LEN is the maximum length of a gfortran symbol. AFAIK,
> >
> > This is only true for user-provided names in the source code.
> >
> > Think e.g. class names as can be seen in the dumps..
> 
> We have GFC_MAX_MANGLED_SYMBOL_LEN for that. *Insert my standard pet
> peeve rant that we should use heap allocated unlimited length strings
> for these rather than copying stack allocated strings around, or
> preferable a symbol table*
> 

I agree with Janne.  This seems like a very good candidate
for someone that would like to contribute to gfortran, but
does not know where to start.  Any lurkers on the mailing list
care to give it shot?
Bernhard Reutner-Fischer Nov. 20, 2019, 9:32 p.m. UTC | #17
On Wed, 20 Nov 2019 22:38:30 +0200
Janne Blomqvist <blomqvist.janne@gmail.com> wrote:

> On Wed, Nov 20, 2019 at 8:00 PM Bernhard Reutner-Fischer
> <rep.dot.nop@gmail.com> wrote:
> >
> > On 19 November 2019 23:54:55 CET, Thomas Koenig <tkoenig@netcologne.de> wrote:  
> > >Am 19.11.19 um 11:39 schrieb Bernhard Reutner-Fischer:  
> > >> +      char name[GFC_MAX_SYMBOL_LEN + 1];
> > >> +      snprintf (name, GFC_MAX_SYMBOL_LEN, "__dummy_%d_%s", var_num++,
> > >> +                f->sym->name);
> > >> +
> > >> +      if (gfc_get_sym_tree (name, ns, &symtree, false) != 0)
> > >>
> > >> (I) you should + sizeof(__dummy__) + 10 for unsigned long %d or the  
> > >like.
> > >
> > >GFC_MAX_SYMBOL_LEN is the maximum length of a gfortran symbol. AFAIK,  
> >
> > This is only true for user-provided names in the source code.
> >
> > Think e.g. class names as can be seen in the dumps..  
> 
> We have GFC_MAX_MANGLED_SYMBOL_LEN for that. *Insert my standard pet
> peeve rant that we should use heap allocated unlimited length strings
> for these rather than copying stack allocated strings around, or
> preferable a symbol table*

yea, which i started to lay grounds to address that in
https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/aldot/fortran-fe-stringpool
about a year ago ;) Reminds me: i had to change the symbol names that
are persisted in module-files to make it work; Still not sure if that's
acceptable so if somebody would be willing to lend me a hand for
sanity-checking that aspect of the series i'd be glad. Would certainly
help to trick me into continuing the thing now, during winter.

Looks like i've another memory leak plug lying around on that tree that
i didn't try to push yet; It's the hunk in gfc_release_symbol() in the
attached brain-dump i think, don't remember and should revisit to
have it fixed for good i suppose..

> 
> > >it
> > >is not possible to use a longer symbol name than that, so it needs to
> > >be
> > >truncated. Uniqueness of the variable name is guaranteed by the var_num
> > >variable.
> > >
> > >If the user puts dummy arguments Supercalifragilisticexpialidociousa
> > >and
> > >Supercalifragilisticexpialidociousb into the argument list of a
> > >procedure, he will have to look at the numbers to differentiate them
> > >:-)
> > >  
> > >> (II) s/__dummy/__intent_in/ for clarity?  
> > >
> > >It's moved away a bit from INTENT(IN) now, because an argument which
> > >cannot be modified (even by passing to a procedure with a dummy
> > >argument
> > >with unknown intent) is now also handled.  
> >
> > So maybe __readonly_ , __rdonly_, __rd_or the like? dummy is really misleading a name in the dumps..  
> 
> Well, dummy is a term with a precise definition in the Fortran
> standard, so in a way it's good so one realizes it has something to do
> with a dummy argument. But yes, it's a bit misleading because it's not
> the dummy argument itself but rather a dereferenced copy of it. I
> suggest __readonly_dereferenced_dummy_copy_yes_this_is_a_really_long_name_.
> :)

:) __rodummy_ then?

but bikeshedding either way, so, Thomas, please go for __dummy_ short of
sensible alternatives.

cheers,
Thomas König Nov. 20, 2019, 9:35 p.m. UTC | #18
Am 20.11.19 um 21:45 schrieb Janne Blomqvist:
> BTW, since this is done for the purpose of optimization, have you done
> testing on some suitable benchmark suite such as polyhedron, whether
> it a) generates any different code b) does it make it go faster?

I haven't run any actual benchmarks.

However, there is a simple example which shows its advantages.
Consider

       subroutine foo(n,m)
       m = 0
       do 100 i=1,100
         call bar
         m = m + n
  100  continue
       end

(I used old-style DO loops just because :-)

Without the optimization, the inner loop is translated to

.L2:
         xorl    %eax, %eax
         call    bar_
         movl    (%r12), %eax
         addl    %eax, 0(%rbp)
         subl    $1, %ebx
         jne     .L2

and with the optimization to

.L2:
         xorl    %eax, %eax
         call    bar_
         addl    %r12d, 0(%rbp)
         subl    $1, %ebx
         jne     .L2

so the load of the address is missing.  (Why do we zero %eax
before each call? It should not be a variadic call right?)

Of course, Fortran language rules specify that the call to bar
cannot do anything to n, but apparently we do not tell the gcc
middle end that this is the case, or maybe that there is
no way to really specify this.

(Actually, I just tried out

       subroutine foo(n,m)
       integer :: dummy_n, dummy_m
       dummy_n = n
       dummy_m = 0
       do 100 i=1,100
          call bar
          dummy_m = dummy_m + dummy_n
  100  continue
       m = dummy_m
       end

This is optimized even further:

.L2:
         xorl    %eax, %eax
         call    bar_
         subl    $1, %ebx
         jne     .L2
         imull   $100, %r12d, %r12d

So, a copy in / copy out for variables where we can not be sure that
no value is assigned?  Does anybody see a downside for that?)

> Is there a risk of performance regressions due to higher register pressure?

I don't think so. Either the compiler realizes that it can
keep the variable in a register (then it makes no difference),
or it has to load it fresh from its address (then there is
one additional register needed).

Regards

	Thomas
Janne Blomqvist Nov. 20, 2019, 10:18 p.m. UTC | #19
On Wed, Nov 20, 2019 at 11:35 PM Thomas König <tk@tkoenig.net> wrote:
>
> Am 20.11.19 um 21:45 schrieb Janne Blomqvist:
> > BTW, since this is done for the purpose of optimization, have you done
> > testing on some suitable benchmark suite such as polyhedron, whether
> > it a) generates any different code b) does it make it go faster?
>
> I haven't run any actual benchmarks.
>
> However, there is a simple example which shows its advantages.
> Consider
>
>        subroutine foo(n,m)
>        m = 0
>        do 100 i=1,100
>          call bar
>          m = m + n
>   100  continue
>        end
>
> (I used old-style DO loops just because :-)
>
> Without the optimization, the inner loop is translated to
>
> .L2:
>          xorl    %eax, %eax
>          call    bar_
>          movl    (%r12), %eax
>          addl    %eax, 0(%rbp)
>          subl    $1, %ebx
>          jne     .L2
>
> and with the optimization to
>
> .L2:
>          xorl    %eax, %eax
>          call    bar_
>          addl    %r12d, 0(%rbp)
>          subl    $1, %ebx
>          jne     .L2
>
> so the load of the address is missing.  (Why do we zero %eax
> before each call? It should not be a variadic call right?)

Not sure. Maybe some belt and suspenders thing? I guess someone better
versed in ABI minutiae knows better. It's not Fortran-specific though,
the C frontend does the same when calling a void function.

AFAIK on reasonably current OoO CPU's xor'ing a register with itself
is handled by the renamer and doesn't consume an execute slot, so it's
in effect a zero-cycle instruction. Still bloats the code slightly,
though.

> Of course, Fortran language rules specify that the call to bar
> cannot do anything to n

Hmm, does it? What about the following modification to your testcase:

module nmod
  integer :: n
end module nmod

subroutine foo(n,m)
  m = 0
  do 100 i=1,100
     call bar
     m = m + n
100  continue
end subroutine foo

subroutine bar()
  use nmod
  n = 0
end subroutine bar

program main
  use nmod
  implicit none
  integer :: m
  n = 1
  m = 0
  call foo(n, m)
  print *, m
end program main


> So, a copy in / copy out for variables where we can not be sure that
> no value is assigned?  Does anybody see a downside for that?)

In principle sounds good, unless my concerns above are real and affect
this case too.

> > Is there a risk of performance regressions due to higher register pressure?
>
> I don't think so. Either the compiler realizes that it can
> keep the variable in a register (then it makes no difference),
> or it has to load it fresh from its address (then there is
> one additional register needed).

Yes, true. Good point.
Tobias Burnus Nov. 20, 2019, 10:19 p.m. UTC | #20
On 11/20/19 10:35 PM, Thomas König wrote:

> I haven't run any actual benchmarks.
I think it would be interesting – I think there can be quite some 
advantages in some cases, while in most cases, it is not really noticable.
> Of course, Fortran language rules specify that the call to bar cannot 
> do anything to n, but apparently we do not tell the gcc middle end 
> that this is the case, or maybe that there is no way to really specify 
> this.

To my knowledge, the ME does not support the Fortran semantics but 
assumes that once a pointer escapes in one procedure call, any other 
procedure might have access to it and modify it as well. This matches 
(effectively) the Fortran semantics of asynchronous. See also PR 49733.

The only thing we can do currently is to specify the intent via 'fn 
spec' which helps a bit – but not if the pointer has already escaped.

Maybe we should think again about handling this properly in the ME.

Cheers,

Tobias
Janne Blomqvist Nov. 20, 2019, 10:31 p.m. UTC | #21
On Thu, Nov 21, 2019 at 12:18 AM Janne Blomqvist
<blomqvist.janne@gmail.com> wrote:
>
> On Wed, Nov 20, 2019 at 11:35 PM Thomas König <tk@tkoenig.net> wrote:
> > (Why do we zero %eax
> > before each call? It should not be a variadic call right?)
>
> Not sure. Maybe some belt and suspenders thing? I guess someone better
> versed in ABI minutiae knows better. It's not Fortran-specific though,
> the C frontend does the same when calling a void function.

Ah, scratch that, it is some varargs-thing, I had forgot that a C
function with no arguments or lacking a prototype is considered a
varargs. The code

void foo();
void bar(void);

void testfoo()
{
foo();
}

void testbar()
{
bar();
}

void testunprototyped()
{
baz();
}


generates code (elided scaffolding):

testfoo:
    xorl %eax, %eax
    jmp foo
testbar:
    jmp bar
testunprototyped:
    xorl %eax, %eax
    jmp baz


So probably this is due to the Fortran procedures lacking an interface
being considered varargs by the caller.  Starts to smell like some
leftover from PR 87689?
Tobias Burnus Nov. 20, 2019, 10:32 p.m. UTC | #22
On 11/20/19 11:18 PM, Janne Blomqvist wrote:

>> Of course, Fortran language rules specify that the call to bar
>> cannot do anything to n
> Hmm, does it? What about the following modification to your testcase:

This code violates (quote from F2018):

"15.5.2.13  Restrictions on entities associated with dummy arguments"
"While an entity is associated with a dummy argument, the following 
restrictions hold."
"[…] (3)   Action that affects the value of the entity or any subobject 
of it shall be taken only through the dummy argument unless"
"(a) the dummy argument has the POINTER attribute, […]"
(Some more related to TARGET attribute and to coarrays, which also do 
not apply here.)

Not listed there, but the asynchronous attribute (Section 8.5.4) would 
be also a way to disable the optimization we are talking about.

Cheers,

Tobias

> module nmod
>    integer :: n
> end module nmod
>
> subroutine foo(n,m)
>    m = 0
>    do 100 i=1,100
>       call bar
>       m = m + n
> 100  continue
> end subroutine foo
>
> subroutine bar()
>    use nmod
>    n = 0
> end subroutine bar
>
> program main
>    use nmod
>    implicit none
>    integer :: m
>    n = 1
>    m = 0
>    call foo(n, m)
>    print *, m
> end program main
>
>
>> So, a copy in / copy out for variables where we can not be sure that
>> no value is assigned?  Does anybody see a downside for that?)
> In principle sounds good, unless my concerns above are real and affect
> this case too.
>
>>> Is there a risk of performance regressions due to higher register pressure?
>> I don't think so. Either the compiler realizes that it can
>> keep the variable in a register (then it makes no difference),
>> or it has to load it fresh from its address (then there is
>> one additional register needed).
> Yes, true. Good point.
>
>
Thomas König Nov. 20, 2019, 10:37 p.m. UTC | #23
Am 20.11.19 um 23:18 schrieb Janne Blomqvist:
> On Wed, Nov 20, 2019 at 11:35 PM Thomas König <tk@tkoenig.net> wrote:
>>
>> Am 20.11.19 um 21:45 schrieb Janne Blomqvist:
>>> BTW, since this is done for the purpose of optimization, have you done
>>> testing on some suitable benchmark suite such as polyhedron, whether
>>> it a) generates any different code b) does it make it go faster?
>>
>> I haven't run any actual benchmarks.
>>
>> However, there is a simple example which shows its advantages.
>> Consider
>>
>>         subroutine foo(n,m)
>>         m = 0
>>         do 100 i=1,100
>>           call bar
>>           m = m + n
>>    100  continue
>>         end
>>
>> (I used old-style DO loops just because :-)
>>
>> Without the optimization, the inner loop is translated to
>>
>> .L2:
>>           xorl    %eax, %eax
>>           call    bar_
>>           movl    (%r12), %eax
>>           addl    %eax, 0(%rbp)
>>           subl    $1, %ebx
>>           jne     .L2
>>
>> and with the optimization to
>>
>> .L2:
>>           xorl    %eax, %eax
>>           call    bar_
>>           addl    %r12d, 0(%rbp)
>>           subl    $1, %ebx
>>           jne     .L2
>>
>> so the load of the address is missing.  (Why do we zero %eax
>> before each call? It should not be a variadic call right?)
> 
> Not sure. Maybe some belt and suspenders thing? I guess someone better
> versed in ABI minutiae knows better. It's not Fortran-specific though,
> the C frontend does the same when calling a void function.

OK, so considering your other e-mail, this is a separate issue that
we can fix another time.

> AFAIK on reasonably current OoO CPU's xor'ing a register with itself
> is handled by the renamer and doesn't consume an execute slot, so it's
> in effect a zero-cycle instruction. Still bloats the code slightly,
> though.
> 
>> Of course, Fortran language rules specify that the call to bar
>> cannot do anything to n
> 
> Hmm, does it? What about the following modification to your testcase:
> 
> module nmod
>    integer :: n
> end module nmod
> 
> subroutine foo(n,m)
>    m = 0
>    do 100 i=1,100
>       call bar
>       m = m + n
> 100  continue
> end subroutine foo
> 
> subroutine bar()
>    use nmod
>    n = 0
> end subroutine bar
> 
> program main
>    use nmod
>    implicit none
>    integer :: m
>    n = 1
>    m = 0
>    call foo(n, m)
>    print *, m
> end program main

That is not allowed:

# 15.5.2.13  Restrictions on entities associated with dummy arguments

[...]

# (3) Action that affects the value of the entity or any subobject of it
# shall be taken only through the dummy argument unless

[none of the restrictions apply].

> 
>> So, a copy in / copy out for variables where we can not be sure that
>> no value is assigned?  Does anybody see a downside for that?)
> 
> In principle sounds good, unless my concerns above are real and affect
> this case too.

So, how to proceed?  Commit the patch with the maximum length for a
mangled symbol, and then maybe try for the copy-out variant in a
follow-up patch?

I agree with Tobias that dealing with this in the middle end is probably
the right thing to do in the long run (especially since we could also
handle arrays and structs this way). Until we get around to doing this
(gcc 11 at earliest), we could still profit somewhat from this
optimization in the meantime.

Regards

	Thomas
Janne Blomqvist Nov. 21, 2019, 9:21 a.m. UTC | #24
On Thu, Nov 21, 2019 at 12:31 AM Janne Blomqvist
<blomqvist.janne@gmail.com> wrote:
>
> On Thu, Nov 21, 2019 at 12:18 AM Janne Blomqvist
> <blomqvist.janne@gmail.com> wrote:
> >
> > On Wed, Nov 20, 2019 at 11:35 PM Thomas König <tk@tkoenig.net> wrote:
> > > (Why do we zero %eax
> > > before each call? It should not be a variadic call right?)
> >
> > Not sure. Maybe some belt and suspenders thing? I guess someone better
> > versed in ABI minutiae knows better. It's not Fortran-specific though,
> > the C frontend does the same when calling a void function.
>
> Ah, scratch that, it is some varargs-thing, I had forgot that a C
> function with no arguments or lacking a prototype is considered a
> varargs. The code
>
> void foo();
> void bar(void);
>
> void testfoo()
> {
> foo();
> }
>
> void testbar()
> {
> bar();
> }
>
> void testunprototyped()
> {
> baz();
> }
>
>
> generates code (elided scaffolding):
>
> testfoo:
>     xorl %eax, %eax
>     jmp foo
> testbar:
>     jmp bar
> testunprototyped:
>     xorl %eax, %eax
>     jmp baz
>
>
> So probably this is due to the Fortran procedures lacking an interface
> being considered varargs by the caller.  Starts to smell like some
> leftover from PR 87689?

Thinking some more about it, I'm thinking we should leave it as is,
even though it's a (small) code bloat. So Fortran calling a procedure
with an implicit interface most closely resembles the C unprototyped
function call. The problem is that we don't know much about the
callee. What if a Fortran procedure calls a C varargs procedure? In
the Fortran caller, we don't know whether the callee is varargs or
not, but if we don't zero %eax all hell could break loose if it
happened to be a varargs function. Yes, we could hide behind Fortran
not supporting varargs, and it's all the users fault, but is it really
worth it? I'd say no.

(in some cases zeroing a register is used to tell the OoO hw to kill a
dependency, so it can be beneficial for performance even if not
strictly required for correctness)
Tobias Burnus Nov. 21, 2019, 9:35 a.m. UTC | #25
On 11/20/19 10:35 PM, Thomas König wrote:
>> Is there a risk of performance regressions due to higher register 
>> pressure?

richi points out (on IRC) that ideally LTO IPA opts would promote the 
call-by reference to call-by value – but is not sure that it indeed 
happens. [In any case, Linux distros have started to compile packages 
with LTO.]

One could try and see whether that indeed happens. – Still, I think the 
real solution is to teach the middle end about the Fortran semantics.

Cheers,

Tobia
Richard Biener Nov. 21, 2019, 12:16 p.m. UTC | #26
On Thu, Nov 21, 2019 at 10:35 AM Tobias Burnus <tobias@codesourcery.com> wrote:
>
> On 11/20/19 10:35 PM, Thomas König wrote:
> >> Is there a risk of performance regressions due to higher register
> >> pressure?
>
> richi points out (on IRC) that ideally LTO IPA opts would promote the
> call-by reference to call-by value – but is not sure that it indeed
> happens. [In any case, Linux distros have started to compile packages
> with LTO.]
>
> One could try and see whether that indeed happens. – Still, I think the
> real solution is to teach the middle end about the Fortran semantics.

OK, so I found it, it's handled via SSA_NAME_POINTS_TO_READONLY_MEMORY
which is initialized during the rewrite into SSA form from the information
given by the "fn spec" attribute:

      for (tree arg = DECL_ARGUMENTS (cfun->decl);
           arg; arg = DECL_CHAIN (arg), ++i)
        {
          if (i >= (unsigned) TREE_STRING_LENGTH (fnspec))
            break;
          if (TREE_STRING_POINTER (fnspec)[i]  == 'R'
              || TREE_STRING_POINTER (fnspec)[i] == 'r')

so when the frontend sets "fn spec" from the intent it should already work.
But the examples I saw above didn't use INTENT(IN) for the scalar parameters.

Richard.

> Cheers,
>
> Tobia
>
>
Tobias Burnus Nov. 21, 2019, 1:16 p.m. UTC | #27
Hi Richard,

On 11/21/19 1:16 PM, Richard Biener wrote:
> OK, so I found it, it's handled via SSA_NAME_POINTS_TO_READONLY_MEMORY 
> which is initialized during the rewrite into SSA form from the 
> information given by the "fn spec" attribute […] so when the frontend 
> sets "fn spec" from the intent it should already work.

Which I can confirm for the following made-up example:

real function foo(n)
   implicit none (type)
   integer, intent(in) :: n
   integer :: i
   foo = 0.5
   if (n /= 0) return
   call bar()
   do i = 1, n
     foo = foo + sin(foo)
   end do
end

The optimized dump shows the following, i.e. GCC nicely optimizes both the loop and the 'sin' call away:

foo (integer(kind=4) & restrict n)
{
   integer(kind=4) _1;
   <bb 2> [local count: 241635843]:
   _1 = *n_9(D);
   if (_1 != 0)
     goto <bb 4>; [51.12%]
   else
     goto <bb 3>; [48.88%]

   <bb 3> [local count: 118111600]:
   bar ();

   <bb 4> [local count: 241635844]:
   return 5.0e-1;
}

I think this optimization permits some crucial optimizations.
I have not fully followed the later versions of the patch whether
they exploit some additional language semantics to permit optimizations
even without intent(in), but the initial discussion started with intent(in).

> But the examples I saw above didn't use INTENT(IN) for the scalar 
> parameters.

I concur that a well-written program should make use of intent(in) where 
sensible.

There are cases, which could be optimized likewise – but based on the 
case that the pointer address cannot escape instead of just assuming 
that the argument cannot change. – The question is how to convey this to 
the middle end.

I wonder whether there is a 'fn attr' which can tell that the first 
argument of 'print_i' does not cause the address of 'i' escape. If so, 
one could mark all procedure arguments such – unless they have the 
pointer, target or asynchronous attribute or are coarrays. [Probably 
needs some fine tuning.]

In this example, variable values do change, but only in a controlled way 
('print_i' could change it, 'bar' cannot). The semantic is also mainly a 
property of the (local) variable (or dummy argument) declaration and not 
of the functions which are called – although, if 'i' has no target 
attribute, print_i's argument cannot have neither – hence, one could 
generated a function interface

real function foo(i, y)
   implicit none (type)
   integer :: i
   real :: y
   foo = 0.0
   call print_i(i)
   i = 5
   call bar()  ! < this prevents optimizing the sin(y) away
   if (i == 5) return
   foo = sin(y)
end

Fortran semantics implies that 'i' can only change after the 'i = 5' if: 
'i' has the TARGET (or POINTER) attribute. Or it is possible if 'i' has 
the ASYNCHRONOUS or VOLATILE attribute – or it is a coarray (where a 
remote image can modify the local value).

For asynchronous, it would be something like "call read_i(i); call 
wait()" which is structurally the same as above.

TARGET: "An object without the TARGET attribute shall not have a pointer 
associated with it."
VOLATILE: "may be referenced, defined, or become undefined, by 
means20not specified by the program"
ASYNCHRONOUS: "An entity with the ASYNCHRONOUS attribute is a variable, 
and may be subject to asynchronous input/output or asynchronous 
communication."

Tobias
Tobias Burnus Nov. 21, 2019, 1:28 p.m. UTC | #28
Hi Richard, hi all,

On 11/21/19 2:16 PM, Tobias Burnus wrote:
> I wonder whether there is a 'fn attr' which can tell that the first 
> argument of 'print_i' does not cause the address of 'i' escape.

I think one needs two attributes, one which tells that the address of 
the object itself does not escape. I think that covers all scalars and, 
hence, things which can then easily optimized.

And another one which tells (for a struct) that this address and all 
pointers in that struct do not escape – or something like that. Here, I 
am many thinking of arrays with array descriptors. There one has a 
struct consisting of the meta data (array bounds) and of one element 
called 'data' which points to the actual array data. There might be 
fewer optimization possibilities in general (as keeping track of a whole 
array is more involved), but there should be still some important ones.

Cheers,

Tobias
Richard Biener Nov. 21, 2019, 2:09 p.m. UTC | #29
On Thu, Nov 21, 2019 at 2:16 PM Tobias Burnus <tobias@codesourcery.com> wrote:
>
> Hi Richard,
>
> On 11/21/19 1:16 PM, Richard Biener wrote:
> > OK, so I found it, it's handled via SSA_NAME_POINTS_TO_READONLY_MEMORY
> > which is initialized during the rewrite into SSA form from the
> > information given by the "fn spec" attribute […] so when the frontend
> > sets "fn spec" from the intent it should already work.
>
> Which I can confirm for the following made-up example:
>
> real function foo(n)
>    implicit none (type)
>    integer, intent(in) :: n
>    integer :: i
>    foo = 0.5
>    if (n /= 0) return
>    call bar()
>    do i = 1, n
>      foo = foo + sin(foo)
>    end do
> end
>
> The optimized dump shows the following, i.e. GCC nicely optimizes both the loop and the 'sin' call away:
>
> foo (integer(kind=4) & restrict n)
> {
>    integer(kind=4) _1;
>    <bb 2> [local count: 241635843]:
>    _1 = *n_9(D);
>    if (_1 != 0)
>      goto <bb 4>; [51.12%]
>    else
>      goto <bb 3>; [48.88%]
>
>    <bb 3> [local count: 118111600]:
>    bar ();
>
>    <bb 4> [local count: 241635844]:
>    return 5.0e-1;
> }
>
> I think this optimization permits some crucial optimizations.
> I have not fully followed the later versions of the patch whether
> they exploit some additional language semantics to permit optimizations
> even without intent(in), but the initial discussion started with intent(in).
>
> > But the examples I saw above didn't use INTENT(IN) for the scalar
> > parameters.
>
> I concur that a well-written program should make use of intent(in) where
> sensible.
>
> There are cases, which could be optimized likewise – but based on the
> case that the pointer address cannot escape instead of just assuming
> that the argument cannot change. – The question is how to convey this to
> the middle end.
>
> I wonder whether there is a 'fn attr' which can tell that the first
> argument of 'print_i' does not cause the address of 'i' escape. If so,
> one could mark all procedure arguments such – unless they have the
> pointer, target or asynchronous attribute or are coarrays. [Probably
> needs some fine tuning.]
>
> In this example, variable values do change, but only in a controlled way
> ('print_i' could change it, 'bar' cannot). The semantic is also mainly a
> property of the (local) variable (or dummy argument) declaration and not
> of the functions which are called – although, if 'i' has no target
> attribute, print_i's argument cannot have neither – hence, one could
> generated a function interface
>
> real function foo(i, y)
>    implicit none (type)
>    integer :: i
>    real :: y
>    foo = 0.0
>    call print_i(i)
>    i = 5
>    call bar()  ! < this prevents optimizing the sin(y) away
>    if (i == 5) return
>    foo = sin(y)
> end
>
> Fortran semantics implies that 'i' can only change after the 'i = 5' if:
> 'i' has the TARGET (or POINTER) attribute. Or it is possible if 'i' has
> the ASYNCHRONOUS or VOLATILE attribute – or it is a coarray (where a
> remote image can modify the local value).

So I think what you'd need to do is make 'i' marked as TYPE_RESTRICT
then.  I think we don't yet handle it but it means that bar() may not
modify 'i' but via 'i' (but it doesn't get 'i' as a parameter).

> For asynchronous, it would be something like "call read_i(i); call
> wait()" which is structurally the same as above.
>
> TARGET: "An object without the TARGET attribute shall not have a pointer
> associated with it."
> VOLATILE: "may be referenced, defined, or become undefined, by
> means20not specified by the program"
> ASYNCHRONOUS: "An entity with the ASYNCHRONOUS attribute is a variable,
> and may be subject to asynchronous input/output or asynchronous
> communication."

I think for GIMPLE everything not obviously on the stack is ASYNCHRONOUS.

Richard.

> Tobias
>
Tobias Burnus Nov. 21, 2019, 2:36 p.m. UTC | #30
Hi Richard,

On 11/21/19 3:09 PM, Richard Biener wrote:
> So I think what you'd need to do is make 'i' marked as TYPE_RESTRICT 
> then. I think we don't yet handle it but it means that bar() may not 
> modify 'i' but via 'i' (but it doesn't get 'i' as a parameter). 

Okay, that would be then the attached patch. – I can confirm that it 
does not work.

Tobias
Tobias Burnus Nov. 22, 2019, 10:17 a.m. UTC | #31
I have now filled https://gcc.gnu.org/PR92628 as ME bug to track this 
(i.e. honor TYPE_RESTRICT for pointer-escape analysis).

Cheers,

Tobias

On 11/21/19 3:36 PM, Tobias Burnus wrote:
> On 11/21/19 3:09 PM, Richard Biener wrote:
>> So I think what you'd need to do is make 'i' marked as TYPE_RESTRICT 
>> then. I think we don't yet handle it but it means that bar() may not 
>> modify 'i' but via 'i' (but it doesn't get 'i' as a parameter). 
> Okay, that would be then the attached patch. – I can confirm that it 
> does not work.

Patch
diff mbox series

Index: dump-parse-tree.c
===================================================================
--- dump-parse-tree.c	(Revision 278025)
+++ dump-parse-tree.c	(Arbeitskopie)
@@ -57,6 +57,15 @@  static void show_attr (symbol_attribute *, const c
 /* Allow dumping of an expression in the debugger.  */
 void gfc_debug_expr (gfc_expr *);
 
+void debug (gfc_namespace *ns)
+{
+  FILE *tmp = dumpfile;
+  dumpfile = stderr;
+  show_namespace (ns);
+  fputc ('\n', dumpfile);
+  dumpfile = tmp;
+}
+
 void debug (symbol_attribute *attr)
 {
   FILE *tmp = dumpfile;
@@ -1889,6 +1898,9 @@  show_code_node (int level, gfc_code *c)
       break;
 
     case EXEC_INIT_ASSIGN:
+      fputs ("INIT_", dumpfile);
+      /* Fallthrough */
+
     case EXEC_ASSIGN:
       fputs ("ASSIGN ", dumpfile);
       show_expr (c->expr1);
Index: frontend-passes.c
===================================================================
--- frontend-passes.c	(Revision 278025)
+++ frontend-passes.c	(Arbeitskopie)
@@ -57,6 +57,7 @@  static int call_external_blas (gfc_code **, int *,
 static int matmul_temp_args (gfc_code **, int *,void *data);
 static int index_interchange (gfc_code **, int*, void *);
 static bool is_fe_temp (gfc_expr *e);
+static void replace_intent_in (gfc_namespace *);
 
 #ifdef CHECKING_P
 static void check_locus (gfc_namespace *);
@@ -1467,6 +1468,7 @@  optimize_namespace (gfc_namespace *ns)
 
   if (flag_frontend_optimize)
     {
+      replace_intent_in (ns);
       gfc_code_walker (&ns->code, simplify_io_impl_do, dummy_expr_callback, NULL);
       gfc_code_walker (&ns->code, convert_do_while, dummy_expr_callback, NULL);
       gfc_code_walker (&ns->code, convert_elseif, dummy_expr_callback, NULL);
@@ -5503,3 +5505,132 @@  gfc_check_externals (gfc_namespace *ns)
 
   gfc_errors_to_warnings (false);
 }
+
+/*  For scalar INTENT(IN) variables or for variables where we know
+    their value is not changed, we can replace them by an auxiliary
+    variable whose value is set on procedure entry.  */
+
+typedef struct sym_replacement
+{
+  gfc_symbol *original;
+  gfc_symtree *replacement_symtree;
+  bool referenced;
+
+} sym_replace;
+
+/* Callback function - replace expression if possible, and set
+   sr->referenced if this was done (so we know we need to generate
+   the assignment statement).  */
+
+static int
+replace_symbol_in_expr (gfc_expr **e, int *walk_subtrees ATTRIBUTE_UNUSED,
+			void *data)
+{
+  gfc_expr *expr = *e;
+  sym_replacement *sr;
+
+  if (expr->expr_type != EXPR_VARIABLE || expr->symtree == NULL)
+    return 0;
+
+  sr = (sym_replacement *) data;
+
+  if (expr->symtree->n.sym == sr->original)
+    {
+      expr->symtree = sr->replacement_symtree;
+      sr->referenced = true;
+    }
+
+  return 0;
+}
+
+/* Replace INTENT(IN) scalar variables by assigning their values to
+   temporary variables.  We really only want to use this for the
+   simplest cases, all the fancy stuff is excluded.  */
+
+static void
+replace_intent_in (gfc_namespace *ns)
+{
+  gfc_formal_arglist *f;
+  gfc_namespace *ns_c;
+
+  if (ns == NULL || ns->proc_name == NULL || gfc_elemental (ns->proc_name)
+      || ns->proc_name->attr.entry_master)
+    return;
+
+  for (f = ns->proc_name->formal; f; f = f->next)
+    {
+      if (f->sym == NULL || f->sym->attr.dimension || f->sym->attr.allocatable
+	  || f->sym->attr.optional || f->sym->attr.pointer
+	  || f->sym->attr.codimension || f->sym->attr.value
+	  || f->sym->attr.proc_pointer || f->sym->attr.target
+	  || f->sym->attr.asynchronous
+	  || f->sym->ts.type == BT_CHARACTER || f->sym->ts.type == BT_DERIVED
+	  || f->sym->ts.type == BT_CLASS)
+	continue;
+
+      /* TODO: It could also be possible to check if the variable can
+	 actually not be changed by appearing in a variable
+	 definition context or by being passed as an argument to a
+	 procedure where it could be changed.  */
+
+      if (f->sym->attr.intent == INTENT_IN)
+	{
+	  gfc_symtree *symtree;
+	  gfc_symbol *replacement;
+	  sym_replace sr;
+
+	  char name[GFC_MAX_SYMBOL_LEN + 1];
+	  snprintf (name, GFC_MAX_SYMBOL_LEN, "__dummy_%d_%s", var_num++,
+		    f->sym->name);
+
+	  if (gfc_get_sym_tree (name, ns, &symtree, false) != 0)
+	    gcc_unreachable ();
+
+	  replacement = symtree->n.sym;
+	  replacement->ts = f->sym->ts;
+	  replacement->attr.flavor = FL_VARIABLE;
+	  replacement->attr.fe_temp = 1;
+	  replacement->attr.referenced = 1;
+	  replacement->declared_at = f->sym->declared_at;
+	  gfc_commit_symbol (replacement);
+
+	  sr.original = f->sym;
+	  sr.replacement_symtree = symtree;
+	  sr.referenced = false;
+
+	  gfc_code_walker (&ns->code, gfc_dummy_code_callback,
+			   replace_symbol_in_expr, (void *) &sr);
+
+	  for (ns_c = ns->contained; ns_c != NULL; ns_c = ns_c->sibling)
+	    gfc_code_walker (&ns_c->code, gfc_dummy_code_callback,
+			     replace_symbol_in_expr, (void *) &sr);
+
+	  if (sr.referenced)
+	    {
+	      gfc_code *n;
+	      gfc_symtree *formal_symtree;
+	      gfc_code **c;
+
+	      /* Generate statement __tmp_42_foo = foo .   */
+	      n = XCNEW (gfc_code);
+	      n->op = EXEC_ASSIGN;
+	      n->expr1 = gfc_lval_expr_from_sym (replacement);
+	      n->expr1->where = f->sym->declared_at;
+	      formal_symtree = gfc_find_symtree (ns->sym_root, f->sym->name);
+	      n->expr2 = gfc_get_variable_expr (formal_symtree);
+	      n->expr2->where = f->sym->declared_at;
+	      n->loc = f->sym->declared_at;
+
+	      /* Put this statement after the initialization
+		 assignment statements.  */
+	      
+	      for (c = &ns->code; *c != NULL && (*c)->op == EXEC_INIT_ASSIGN;
+		   c = &(*c)->next)
+		;
+
+	      n->next = (*c);
+	      (*c) = n;
+	    }
+	}
+    }
+}