diff mbox series

[v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

Message ID 8e8dad73-43a6-4764-a496-b600e6a220e1@linux.ibm.com
State New
Headers show
Series [v2] rs6000: Stackoverflow in optimized code on PPC [PR100799] | expand

Commit Message

Ajit Agarwal March 22, 2024, 10:15 a.m. UTC
Hello All:

This is version-2 of the patch with review comments addressed.

When using FlexiBLAS with OpenBLAS we noticed corruption of
the parameters passed to OpenBLAS functions. FlexiBLAS
basically provides a BLAS interface where each function
is a stub that forwards the arguments to a real BLAS lib,
like OpenBLAS.

Fixes the corruption of caller frame checking number of
arguments is less than equal to GP_ARG_NUM_REG (8)
excluding hidden unused DECLS.

Bootstrapped and regtested for powerpc64-linux.gnu.

Thanks & Regards
Ajit


rs6000: Stackoverflow in optimized code on PPC [PR100799]

When using FlexiBLAS with OpenBLAS we noticed corruption of
the parameters passed to OpenBLAS functions. FlexiBLAS
basically provides a BLAS interface where each function
is a stub that forwards the arguments to a real BLAS lib,
like OpenBLAS.

Fixes the corruption of caller frame checking number of
arguments is less than equal to GP_ARG_NUM_REG (8)
excluding hidden unused DECLS.

2024-03-22  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>

gcc/ChangeLog:

	PR rtk-optimization/100799
	* config/rs6000/rs6000-calls.cc (rs6000_function_arg): Don't
	generate parameter save area if number of arguments passed
	less than equal to GP_ARG_NUM_REG (8) excluding hidden
	parameter.
	(init_cumulative_args): Check for hidden parameter in fortran
	routine and set the flag hidden_string_length and actual
	parameter passed excluding hidden unused DECLS.
	* config/rs6000/rs6000.h (rs6000_args): Add new field
	hidden_string_length and actual_parm_length.
---
 gcc/config/rs6000/rs6000-call.cc | 36 ++++++++++++++++++++++++++++++--
 gcc/config/rs6000/rs6000.h       |  7 +++++++
 2 files changed, 41 insertions(+), 2 deletions(-)

Comments

Peter Bergner March 23, 2024, 4:37 a.m. UTC | #1
On 3/22/24 5:15 AM, Ajit Agarwal wrote:
> When using FlexiBLAS with OpenBLAS we noticed corruption of
> the parameters passed to OpenBLAS functions. FlexiBLAS
> basically provides a BLAS interface where each function
> is a stub that forwards the arguments to a real BLAS lib,
> like OpenBLAS.
> 
> Fixes the corruption of caller frame checking number of
> arguments is less than equal to GP_ARG_NUM_REG (8)
> excluding hidden unused DECLS.

I think the git log entry commentary could be a little more descriptive
of what the problem is. How about something like the following?

  When using FlexiBLAS with OpenBLAS, we noticed corruption of the caller
  stack frame when calling OpenBLAS functions.  This was caused by the
  FlexiBLAS C/C++ caller and OpenBLAS Fortran callee disagreeing on the
  number of function parameters in the callee due to hidden Fortran
  parameters. This can cause problems when the callee believes the caller
  has allocated a parameter save area when the caller has not done so.
  That means any writes by the callee into the non-existent parameter save
  area will corrupt the caller stack frame.

  The workaround implemented here, is for the callee to determine whether
  the caller has allocated a parameter save area or not, by ignoring any
  unused hidden parameters when counting the number of parameters.



> 	PR rtk-optimization/100799

s/rtk/rtl/



> 	* config/rs6000/rs6000-calls.cc (rs6000_function_arg): Don't
> 	generate parameter save area if number of arguments passed
> 	less than equal to GP_ARG_NUM_REG (8) excluding hidden
> 	parameter.

The callee doesn't generate or allocate the parameter save area, the
caller does.  The code here is for the callee trying to determine
whether the caller has done so.  How about saying the following instead?

  Don't assume a parameter save area has been allocated if the number of
  formal parameters, excluding unused hidden parameters, is less than or
  equal to GP_ARG_NUM_REG (8).




> 	(init_cumulative_args): Check for hidden parameter in fortran
> 	routine and set the flag hidden_string_length and actual
> 	parameter passed excluding hidden unused DECLS.

Check for unused hidden Fortran parameters and set hidden_string_length
and actual_parm_length.


> +  /* When the buggy C/C++ wrappers call the function with fewer arguments
> +     than it actually has and doesn't expect the parameter save area on the
> +     caller side because of that while the callee expects it and the callee
> +     actually stores something in the parameter save area, it corrupts
> +     whatever is in the caller stack frame at that location.  */

The wrapper/caller is the one that allocates the parameter save area, so
saying "...doesn't expect the parameter save area on the caller side..."
doesn't make sense, since it knows whether it allocated it or not.
How about saying something like the following instead?

  Check whether this function contains any unused hidden parameters and
  record how many there are for use in rs6000_function_arg() to determine
  whether its callers have allocated a parameter save area or not.
  See PR100799 for details.



> +  unsigned int num_args = 0;
> +  unsigned int hidden_length = 0;
> +
> +  for (tree arg = DECL_ARGUMENTS (current_function_decl);
> +       arg; arg = DECL_CHAIN (arg))
> +    {
> +      num_args++;
> +      if (DECL_HIDDEN_STRING_LENGTH (arg))
> +	{
> +	  tree parmdef = ssa_default_def (cfun, arg);
> +	  if (parmdef == NULL || has_zero_uses (parmdef))
> +	    {
> +	      cum->hidden_string_length = 1;
> +	      hidden_length++;
> +	    }
> +	}
> +   }
> +
> +  cum->actual_parm_length = num_args - hidden_length;

This code looks fine, but do we really need two new fields in rs6000_args?
Can't we just get along with only cum->actual_parm_length by modifying
the rs6000_function_arg() change from:

> +      else if (align_words < GP_ARG_NUM_REG
> +	       || (cum->hidden_string_length
> +	       && cum->actual_parm_length <= GP_ARG_NUM_REG))

to:

+      else if (align_words < GP_ARG_NUM_REG
+	       || cum->actual_parm_length <= GP_ARG_NUM_REG)

???

That said, I have a further comment below on what happens here when 
align_words >= GP_ARG_NUM_REG and cum->actual_parm_length <= GP_ARG_NUM_REG.




> +     /* When the buggy C/C++ wrappers call the function with fewer arguments
> +	than it actually has and doesn't expect the parameter save area on the
> +	caller side because of that while the callee expects it and the callee
> +	actually stores something in the parameter save area, it corrupts
> +	whatever is in the caller stack frame at that location.  */

Same comment as before, so same problem with the comment, but the following
change...

> -      else if (align_words < GP_ARG_NUM_REG)
> +      else if (align_words < GP_ARG_NUM_REG
> +	       || (cum->hidden_string_length
> +	       && cum->actual_parm_length <= GP_ARG_NUM_REG))
        {
          if (TARGET_32BIT && TARGET_POWERPC64)
            return rs6000_mixed_function_arg (mode, type, align_words);

          return gen_rtx_REG (mode, GP_ARG_MIN_REG + align_words);
        }
      else
        return NULL_RTX;

The old code for the unused hidden parameter (which was the 9th param) would
fall thru to the "return NULL_RTX;" which would make the callee assume there
was a parameter save area allocated.  Now instead, we'll return a reg rtx,
probably of r11 (r3 thru r10 are our param regs) and I'm guessing we'll now
see a copy of r11 into a pseudo like we do for the other param regs.
Is that a problem? Given it's an unused parameter, it'll probably get deleted
as dead code, but could it cause any issues?  What if we have more than one
unused hidden parameter and we return r12 and r13 which have specific uses
in our ABIs (eg, r13 is our TCB pointer), so it may not actually look dead.
Have you verified what the callee RTL looks like after expand for these
unused hidden parameters?  Is there a rtx we can return that isn't a NULL_RTX
which triggers the assumption of a parameter save area, but isn't a reg rtx
which might lead to some rtl being generated?  Would a (const_int 0) or
something else work?



> +  /* Actual parameter length ignoring hidden parameter.
> +     This is done to C++ wrapper calling fortran procedures
> +     which has hidden parameter that are not used.  */

I think a simpler comment will suffice:

  /* Actual parameter count ignoring unused hidden parameters.  */


Peter
Ajit Agarwal March 23, 2024, 9:33 a.m. UTC | #2
Hello Peter:

On 23/03/24 10:07 am, Peter Bergner wrote:
> On 3/22/24 5:15 AM, Ajit Agarwal wrote:
>> When using FlexiBLAS with OpenBLAS we noticed corruption of
>> the parameters passed to OpenBLAS functions. FlexiBLAS
>> basically provides a BLAS interface where each function
>> is a stub that forwards the arguments to a real BLAS lib,
>> like OpenBLAS.
>>
>> Fixes the corruption of caller frame checking number of
>> arguments is less than equal to GP_ARG_NUM_REG (8)
>> excluding hidden unused DECLS.
> 
> I think the git log entry commentary could be a little more descriptive
> of what the problem is. How about something like the following?
> 
>   When using FlexiBLAS with OpenBLAS, we noticed corruption of the caller
>   stack frame when calling OpenBLAS functions.  This was caused by the
>   FlexiBLAS C/C++ caller and OpenBLAS Fortran callee disagreeing on the
>   number of function parameters in the callee due to hidden Fortran
>   parameters. This can cause problems when the callee believes the caller
>   has allocated a parameter save area when the caller has not done so.
>   That means any writes by the callee into the non-existent parameter save
>   area will corrupt the caller stack frame.
> 
>   The workaround implemented here, is for the callee to determine whether
>   the caller has allocated a parameter save area or not, by ignoring any
>   unused hidden parameters when counting the number of parameters.
> 
> 
I will address this change in the new version of the patch.
> 
>> 	PR rtk-optimization/100799
> 
> s/rtk/rtl/
> 
> 
> 
I will address this in new version of the patch.
>> 	* config/rs6000/rs6000-calls.cc (rs6000_function_arg): Don't
>> 	generate parameter save area if number of arguments passed
>> 	less than equal to GP_ARG_NUM_REG (8) excluding hidden
>> 	parameter.
> 
> The callee doesn't generate or allocate the parameter save area, the
> caller does.  The code here is for the callee trying to determine
> whether the caller has done so.  How about saying the following instead?
> 
>   Don't assume a parameter save area has been allocated if the number of
>   formal parameters, excluding unused hidden parameters, is less than or
>   equal to GP_ARG_NUM_REG (8).
> 
> 

I will incorporate this change in new version of the patch.

> 
> 
>> 	(init_cumulative_args): Check for hidden parameter in fortran
>> 	routine and set the flag hidden_string_length and actual
>> 	parameter passed excluding hidden unused DECLS.
> 
> Check for unused hidden Fortran parameters and set hidden_string_length
> and actual_parm_length.
> 
>

I will address this change in new version of the patch.

 
>> +  /* When the buggy C/C++ wrappers call the function with fewer arguments
>> +     than it actually has and doesn't expect the parameter save area on the
>> +     caller side because of that while the callee expects it and the callee
>> +     actually stores something in the parameter save area, it corrupts
>> +     whatever is in the caller stack frame at that location.  */
> 
> The wrapper/caller is the one that allocates the parameter save area, so
> saying "...doesn't expect the parameter save area on the caller side..."
> doesn't make sense, since it knows whether it allocated it or not.
> How about saying something like the following instead?
> 
>   Check whether this function contains any unused hidden parameters and
>   record how many there are for use in rs6000_function_arg() to determine
>   whether its callers have allocated a parameter save area or not.
>   See PR100799 for details.
> 
> 

I will incorporate this change in new version of the patch.

> 
>> +  unsigned int num_args = 0;
>> +  unsigned int hidden_length = 0;
>> +
>> +  for (tree arg = DECL_ARGUMENTS (current_function_decl);
>> +       arg; arg = DECL_CHAIN (arg))
>> +    {
>> +      num_args++;
>> +      if (DECL_HIDDEN_STRING_LENGTH (arg))
>> +	{
>> +	  tree parmdef = ssa_default_def (cfun, arg);
>> +	  if (parmdef == NULL || has_zero_uses (parmdef))
>> +	    {
>> +	      cum->hidden_string_length = 1;
>> +	      hidden_length++;
>> +	    }
>> +	}
>> +   }
>> +
>> +  cum->actual_parm_length = num_args - hidden_length;
> 
> This code looks fine, but do we really need two new fields in rs6000_args?
> Can't we just get along with only cum->actual_parm_length by modifying
> the rs6000_function_arg() change from:
> 
>> +      else if (align_words < GP_ARG_NUM_REG
>> +	       || (cum->hidden_string_length
>> +	       && cum->actual_parm_length <= GP_ARG_NUM_REG))
> 
> to:
> 
> +      else if (align_words < GP_ARG_NUM_REG
> +	       || cum->actual_parm_length <= GP_ARG_NUM_REG)
> 
> ???
>

Yes we can do that. I will address this in new version of the patch.

 
> That said, I have a further comment below on what happens here when 
> align_words >= GP_ARG_NUM_REG and cum->actual_parm_length <= GP_ARG_NUM_REG.
> 
>

If we exceed the align_words >= GP_ARG_NUM_REG then there could be hidden unused DECL paramter in align_words which is greater than 8 then it will return NULL_RTX. Hence in the above condition it should not return NULL_RTX. For the above condition it will not return NULL_RTX instead it will return a rtx reg. 
 
> 
> 
>> +     /* When the buggy C/C++ wrappers call the function with fewer arguments
>> +	than it actually has and doesn't expect the parameter save area on the
>> +	caller side because of that while the callee expects it and the callee
>> +	actually stores something in the parameter save area, it corrupts
>> +	whatever is in the caller stack frame at that location.  */
> 
> Same comment as before, so same problem with the comment, but the following
> change...
> 
>> -      else if (align_words < GP_ARG_NUM_REG)
>> +      else if (align_words < GP_ARG_NUM_REG
>> +	       || (cum->hidden_string_length
>> +	       && cum->actual_parm_length <= GP_ARG_NUM_REG))
>         {
>           if (TARGET_32BIT && TARGET_POWERPC64)
>             return rs6000_mixed_function_arg (mode, type, align_words);
> 
>           return gen_rtx_REG (mode, GP_ARG_MIN_REG + align_words);
>         }
>       else
>         return NULL_RTX;
> 
> The old code for the unused hidden parameter (which was the 9th param) would
> fall thru to the "return NULL_RTX;" which would make the callee assume there
> was a parameter save area allocated.  Now instead, we'll return a reg rtx,
> probably of r11 (r3 thru r10 are our param regs) and I'm guessing we'll now
> see a copy of r11 into a pseudo like we do for the other param regs.
> Is that a problem? Given it's an unused parameter, it'll probably get deleted
> as dead code, but could it cause any issues?  What if we have more than one
> unused hidden parameter and we return r12 and r13 which have specific uses
> in our ABIs (eg, r13 is our TCB pointer), so it may not actually look dead.
> Have you verified what the callee RTL looks like after expand for these
> unused hidden parameters?  Is there a rtx we can return that isn't a NULL_RTX
> which triggers the assumption of a parameter save area, but isn't a reg rtx
> which might lead to some rtl being generated?  Would a (const_int 0) or
> something else work?
> 
> 
For the above use case it will return 

(reg:DI 5 %r5) and below check entry_parm = 
(reg:DI 5 %r5) and the following check will not return TRUE and hence parameter save area will not be allocated.

It will not generate any rtx in the callee rtl code but it just used to check whether to allocate parameter save area or not when number of args > 8.

/* If there is no incoming register, we need a stack.  */
  entry_parm = rs6000_function_arg (args_so_far, arg);
  if (entry_parm == NULL)
    return true;

  /* Likewise if we need to pass both in registers and on the stack.  */
  if (GET_CODE (entry_parm) == PARALLEL
      && XEXP (XVECEXP (entry_parm, 0, 0), 0) == NULL_RTX)
    return true;

Thanks & Regards
Ajit
> 
>> +  /* Actual parameter length ignoring hidden parameter.
>> +     This is done to C++ wrapper calling fortran procedures
>> +     which has hidden parameter that are not used.  */
> 
> I think a simpler comment will suffice:
> 
>   /* Actual parameter count ignoring unused hidden parameters.  */
> 
> 
> Peter
> 
> 
>
Ajit Agarwal March 23, 2024, 2:28 p.m. UTC | #3
Hello Peter:

Sent version-3 of the patch addressing below review comments.

Thanks & Regards
Ajit

On 23/03/24 3:03 pm, Ajit Agarwal wrote:
> Hello Peter:
> 
> On 23/03/24 10:07 am, Peter Bergner wrote:
>> On 3/22/24 5:15 AM, Ajit Agarwal wrote:
>>> When using FlexiBLAS with OpenBLAS we noticed corruption of
>>> the parameters passed to OpenBLAS functions. FlexiBLAS
>>> basically provides a BLAS interface where each function
>>> is a stub that forwards the arguments to a real BLAS lib,
>>> like OpenBLAS.
>>>
>>> Fixes the corruption of caller frame checking number of
>>> arguments is less than equal to GP_ARG_NUM_REG (8)
>>> excluding hidden unused DECLS.
>>
>> I think the git log entry commentary could be a little more descriptive
>> of what the problem is. How about something like the following?
>>
>>   When using FlexiBLAS with OpenBLAS, we noticed corruption of the caller
>>   stack frame when calling OpenBLAS functions.  This was caused by the
>>   FlexiBLAS C/C++ caller and OpenBLAS Fortran callee disagreeing on the
>>   number of function parameters in the callee due to hidden Fortran
>>   parameters. This can cause problems when the callee believes the caller
>>   has allocated a parameter save area when the caller has not done so.
>>   That means any writes by the callee into the non-existent parameter save
>>   area will corrupt the caller stack frame.
>>
>>   The workaround implemented here, is for the callee to determine whether
>>   the caller has allocated a parameter save area or not, by ignoring any
>>   unused hidden parameters when counting the number of parameters.
>>
>>
> I will address this change in the new version of the patch.
>>
>>> 	PR rtk-optimization/100799
>>
>> s/rtk/rtl/
>>
>>
>>
> I will address this in new version of the patch.
>>> 	* config/rs6000/rs6000-calls.cc (rs6000_function_arg): Don't
>>> 	generate parameter save area if number of arguments passed
>>> 	less than equal to GP_ARG_NUM_REG (8) excluding hidden
>>> 	parameter.
>>
>> The callee doesn't generate or allocate the parameter save area, the
>> caller does.  The code here is for the callee trying to determine
>> whether the caller has done so.  How about saying the following instead?
>>
>>   Don't assume a parameter save area has been allocated if the number of
>>   formal parameters, excluding unused hidden parameters, is less than or
>>   equal to GP_ARG_NUM_REG (8).
>>
>>
> 
> I will incorporate this change in new version of the patch.
> 
>>
>>
>>> 	(init_cumulative_args): Check for hidden parameter in fortran
>>> 	routine and set the flag hidden_string_length and actual
>>> 	parameter passed excluding hidden unused DECLS.
>>
>> Check for unused hidden Fortran parameters and set hidden_string_length
>> and actual_parm_length.
>>
>>
> 
> I will address this change in new version of the patch.
> 
>  
>>> +  /* When the buggy C/C++ wrappers call the function with fewer arguments
>>> +     than it actually has and doesn't expect the parameter save area on the
>>> +     caller side because of that while the callee expects it and the callee
>>> +     actually stores something in the parameter save area, it corrupts
>>> +     whatever is in the caller stack frame at that location.  */
>>
>> The wrapper/caller is the one that allocates the parameter save area, so
>> saying "...doesn't expect the parameter save area on the caller side..."
>> doesn't make sense, since it knows whether it allocated it or not.
>> How about saying something like the following instead?
>>
>>   Check whether this function contains any unused hidden parameters and
>>   record how many there are for use in rs6000_function_arg() to determine
>>   whether its callers have allocated a parameter save area or not.
>>   See PR100799 for details.
>>
>>
> 
> I will incorporate this change in new version of the patch.
> 
>>
>>> +  unsigned int num_args = 0;
>>> +  unsigned int hidden_length = 0;
>>> +
>>> +  for (tree arg = DECL_ARGUMENTS (current_function_decl);
>>> +       arg; arg = DECL_CHAIN (arg))
>>> +    {
>>> +      num_args++;
>>> +      if (DECL_HIDDEN_STRING_LENGTH (arg))
>>> +	{
>>> +	  tree parmdef = ssa_default_def (cfun, arg);
>>> +	  if (parmdef == NULL || has_zero_uses (parmdef))
>>> +	    {
>>> +	      cum->hidden_string_length = 1;
>>> +	      hidden_length++;
>>> +	    }
>>> +	}
>>> +   }
>>> +
>>> +  cum->actual_parm_length = num_args - hidden_length;
>>
>> This code looks fine, but do we really need two new fields in rs6000_args?
>> Can't we just get along with only cum->actual_parm_length by modifying
>> the rs6000_function_arg() change from:
>>
>>> +      else if (align_words < GP_ARG_NUM_REG
>>> +	       || (cum->hidden_string_length
>>> +	       && cum->actual_parm_length <= GP_ARG_NUM_REG))
>>
>> to:
>>
>> +      else if (align_words < GP_ARG_NUM_REG
>> +	       || cum->actual_parm_length <= GP_ARG_NUM_REG)
>>
>> ???
>>
> 
> Yes we can do that. I will address this in new version of the patch.
> 
>  
>> That said, I have a further comment below on what happens here when 
>> align_words >= GP_ARG_NUM_REG and cum->actual_parm_length <= GP_ARG_NUM_REG.
>>
>>
> 
> If we exceed the align_words >= GP_ARG_NUM_REG then there could be hidden unused DECL paramter in align_words which is greater than 8 then it will return NULL_RTX. Hence in the above condition it should not return NULL_RTX. For the above condition it will not return NULL_RTX instead it will return a rtx reg. 
>  
>>
>>
>>> +     /* When the buggy C/C++ wrappers call the function with fewer arguments
>>> +	than it actually has and doesn't expect the parameter save area on the
>>> +	caller side because of that while the callee expects it and the callee
>>> +	actually stores something in the parameter save area, it corrupts
>>> +	whatever is in the caller stack frame at that location.  */
>>
>> Same comment as before, so same problem with the comment, but the following
>> change...
>>
>>> -      else if (align_words < GP_ARG_NUM_REG)
>>> +      else if (align_words < GP_ARG_NUM_REG
>>> +	       || (cum->hidden_string_length
>>> +	       && cum->actual_parm_length <= GP_ARG_NUM_REG))
>>         {
>>           if (TARGET_32BIT && TARGET_POWERPC64)
>>             return rs6000_mixed_function_arg (mode, type, align_words);
>>
>>           return gen_rtx_REG (mode, GP_ARG_MIN_REG + align_words);
>>         }
>>       else
>>         return NULL_RTX;
>>
>> The old code for the unused hidden parameter (which was the 9th param) would
>> fall thru to the "return NULL_RTX;" which would make the callee assume there
>> was a parameter save area allocated.  Now instead, we'll return a reg rtx,
>> probably of r11 (r3 thru r10 are our param regs) and I'm guessing we'll now
>> see a copy of r11 into a pseudo like we do for the other param regs.
>> Is that a problem? Given it's an unused parameter, it'll probably get deleted
>> as dead code, but could it cause any issues?  What if we have more than one
>> unused hidden parameter and we return r12 and r13 which have specific uses
>> in our ABIs (eg, r13 is our TCB pointer), so it may not actually look dead.
>> Have you verified what the callee RTL looks like after expand for these
>> unused hidden parameters?  Is there a rtx we can return that isn't a NULL_RTX
>> which triggers the assumption of a parameter save area, but isn't a reg rtx
>> which might lead to some rtl being generated?  Would a (const_int 0) or
>> something else work?
>>
>>
> For the above use case it will return 
> 
> (reg:DI 5 %r5) and below check entry_parm = 
> (reg:DI 5 %r5) and the following check will not return TRUE and hence parameter save area will not be allocated.
> 
> It will not generate any rtx in the callee rtl code but it just used to check whether to allocate parameter save area or not when number of args > 8.
> 
> /* If there is no incoming register, we need a stack.  */
>   entry_parm = rs6000_function_arg (args_so_far, arg);
>   if (entry_parm == NULL)
>     return true;
> 
>   /* Likewise if we need to pass both in registers and on the stack.  */
>   if (GET_CODE (entry_parm) == PARALLEL
>       && XEXP (XVECEXP (entry_parm, 0, 0), 0) == NULL_RTX)
>     return true;
> 
> Thanks & Regards
> Ajit
>>
>>> +  /* Actual parameter length ignoring hidden parameter.
>>> +     This is done to C++ wrapper calling fortran procedures
>>> +     which has hidden parameter that are not used.  */
>>
>> I think a simpler comment will suffice:
>>
>>   /* Actual parameter count ignoring unused hidden parameters.  */
>>
>>
>> Peter
>>
>>
>>
Peter Bergner March 23, 2024, 4:03 p.m. UTC | #4
On 3/23/24 4:33 AM, Ajit Agarwal wrote:
>>> -      else if (align_words < GP_ARG_NUM_REG)
>>> +      else if (align_words < GP_ARG_NUM_REG
>>> +	       || (cum->hidden_string_length
>>> +	       && cum->actual_parm_length <= GP_ARG_NUM_REG))
>>         {
>>           if (TARGET_32BIT && TARGET_POWERPC64)
>>             return rs6000_mixed_function_arg (mode, type, align_words);
>>
>>           return gen_rtx_REG (mode, GP_ARG_MIN_REG + align_words);
>>         }
>>       else
>>         return NULL_RTX;
>>
>> The old code for the unused hidden parameter (which was the 9th param) would
>> fall thru to the "return NULL_RTX;" which would make the callee assume there
>> was a parameter save area allocated.  Now instead, we'll return a reg rtx,
>> probably of r11 (r3 thru r10 are our param regs) and I'm guessing we'll now
>> see a copy of r11 into a pseudo like we do for the other param regs.
>> Is that a problem? Given it's an unused parameter, it'll probably get deleted
>> as dead code, but could it cause any issues?  What if we have more than one
>> unused hidden parameter and we return r12 and r13 which have specific uses
>> in our ABIs (eg, r13 is our TCB pointer), so it may not actually look dead.
>> Have you verified what the callee RTL looks like after expand for these
>> unused hidden parameters?  Is there a rtx we can return that isn't a NULL_RTX
>> which triggers the assumption of a parameter save area, but isn't a reg rtx
>> which might lead to some rtl being generated?  Would a (const_int 0) or
>> something else work?
>>
>>
> For the above use case it will return 
> 
> (reg:DI 5 %r5) and below check entry_parm = 
> (reg:DI 5 %r5) and the following check will not return TRUE and hence
>                parameter save area will not be allocated.

Why r5?!?!   The 8th (integer) param would return r10, so I'd assume if
the next param was a hidden param, then it'd get the next gpr, so r11.
How does it jump back to r5 which may have been used by the 3rd param?





> It will not generate any rtx in the callee rtl code but it just used to
> check whether to allocate parameter save area or not when number of args > 8.
> 
> /* If there is no incoming register, we need a stack.  */
>   entry_parm = rs6000_function_arg (args_so_far, arg);
>   if (entry_parm == NULL)
>     return true;
> 
>   /* Likewise if we need to pass both in registers and on the stack.  */
>   if (GET_CODE (entry_parm) == PARALLEL
>       && XEXP (XVECEXP (entry_parm, 0, 0), 0) == NULL_RTX)
>     return true;

Yes, this code in rs6000_parm_needs_stack() uses the rs6000_function_arg()
return value as a boolean to tell us whether a parameter save area is required
so what we return is unimportant other than to know it's not NULL_RTX.

I'm more concerned about the use of the target hook targetm.calls.function_arg
used in the generic parts of the compiler.  What will that code do differently
now that we return a reg rtx rather than NULL_RTX?  Might that code use
the reg rtx to emit something?  I'd feel better if you could verify what
happens in that code when we return a reg rtx for that 9th hidden param which
isn't really being passed in a register.


Peter
Ajit Agarwal March 23, 2024, 6:37 p.m. UTC | #5
On 23/03/24 9:33 pm, Peter Bergner wrote:
> On 3/23/24 4:33 AM, Ajit Agarwal wrote:
>>>> -      else if (align_words < GP_ARG_NUM_REG)
>>>> +      else if (align_words < GP_ARG_NUM_REG
>>>> +	       || (cum->hidden_string_length
>>>> +	       && cum->actual_parm_length <= GP_ARG_NUM_REG))
>>>         {
>>>           if (TARGET_32BIT && TARGET_POWERPC64)
>>>             return rs6000_mixed_function_arg (mode, type, align_words);
>>>
>>>           return gen_rtx_REG (mode, GP_ARG_MIN_REG + align_words);
>>>         }
>>>       else
>>>         return NULL_RTX;
>>>
>>> The old code for the unused hidden parameter (which was the 9th param) would
>>> fall thru to the "return NULL_RTX;" which would make the callee assume there
>>> was a parameter save area allocated.  Now instead, we'll return a reg rtx,
>>> probably of r11 (r3 thru r10 are our param regs) and I'm guessing we'll now
>>> see a copy of r11 into a pseudo like we do for the other param regs.
>>> Is that a problem? Given it's an unused parameter, it'll probably get deleted
>>> as dead code, but could it cause any issues?  What if we have more than one
>>> unused hidden parameter and we return r12 and r13 which have specific uses
>>> in our ABIs (eg, r13 is our TCB pointer), so it may not actually look dead.
>>> Have you verified what the callee RTL looks like after expand for these
>>> unused hidden parameters?  Is there a rtx we can return that isn't a NULL_RTX
>>> which triggers the assumption of a parameter save area, but isn't a reg rtx
>>> which might lead to some rtl being generated?  Would a (const_int 0) or
>>> something else work?
>>>
>>>
>> For the above use case it will return 
>>
>> (reg:DI 5 %r5) and below check entry_parm = 
>> (reg:DI 5 %r5) and the following check will not return TRUE and hence
>>                parameter save area will not be allocated.
> 
> Why r5?!?!   The 8th (integer) param would return r10, so I'd assume if
> the next param was a hidden param, then it'd get the next gpr, so r11.
> How does it jump back to r5 which may have been used by the 3rd param?
> 
> 
My mistake its r11 only for hidden param.
> 
> 
> 
>> It will not generate any rtx in the callee rtl code but it just used to
>> check whether to allocate parameter save area or not when number of args > 8.
>>
>> /* If there is no incoming register, we need a stack.  */
>>   entry_parm = rs6000_function_arg (args_so_far, arg);
>>   if (entry_parm == NULL)
>>     return true;
>>
>>   /* Likewise if we need to pass both in registers and on the stack.  */
>>   if (GET_CODE (entry_parm) == PARALLEL
>>       && XEXP (XVECEXP (entry_parm, 0, 0), 0) == NULL_RTX)
>>     return true;
> 
> Yes, this code in rs6000_parm_needs_stack() uses the rs6000_function_arg()
> return value as a boolean to tell us whether a parameter save area is required
> so what we return is unimportant other than to know it's not NULL_RTX.
> 
> I'm more concerned about the use of the target hook targetm.calls.function_arg
> used in the generic parts of the compiler.  What will that code do differently
> now that we return a reg rtx rather than NULL_RTX?  Might that code use
> the reg rtx to emit something?  I'd feel better if you could verify what
> happens in that code when we return a reg rtx for that 9th hidden param which
> isn't really being passed in a register.
> 

As per my understanding and debugging openBLAS code testcase I see that reg_rtx returned inside the below IF condition is used for check whether paramter save area is needed or not. 

In the generic code where targetm.calls.function_arg is called 
in calls.cc returned rtx is used for PARALLEL case so that we can
check if we need to pass both in registers and stack then they emit
store with respect to return rtx. If we identify that we need only
registers for argument then it emits nothing.

Thanks & Regards
Ajit
> 
> Peter
> 
>
Kewen.Lin April 2, 2024, 6:12 a.m. UTC | #6
Hi!

on 2024/3/24 02:37, Ajit Agarwal wrote:
> 
> 
> On 23/03/24 9:33 pm, Peter Bergner wrote:
>> On 3/23/24 4:33 AM, Ajit Agarwal wrote:
>>>>> -      else if (align_words < GP_ARG_NUM_REG)
>>>>> +      else if (align_words < GP_ARG_NUM_REG
>>>>> +	       || (cum->hidden_string_length
>>>>> +	       && cum->actual_parm_length <= GP_ARG_NUM_REG))
>>>>         {
>>>>           if (TARGET_32BIT && TARGET_POWERPC64)
>>>>             return rs6000_mixed_function_arg (mode, type, align_words);
>>>>
>>>>           return gen_rtx_REG (mode, GP_ARG_MIN_REG + align_words);
>>>>         }
>>>>       else
>>>>         return NULL_RTX;
>>>>
>>>> The old code for the unused hidden parameter (which was the 9th param) would
>>>> fall thru to the "return NULL_RTX;" which would make the callee assume there
>>>> was a parameter save area allocated.  Now instead, we'll return a reg rtx,
>>>> probably of r11 (r3 thru r10 are our param regs) and I'm guessing we'll now
>>>> see a copy of r11 into a pseudo like we do for the other param regs.
>>>> Is that a problem? Given it's an unused parameter, it'll probably get deleted
>>>> as dead code, but could it cause any issues?  What if we have more than one

I think Peter raised one good point, not sure it would really cause some issues,
but the assigned reg goes beyond GP_ARG_MAX_REG, at least it is confusing to people
especially without DCE like at -O0.  Can we aggressively remove these candidates
from DECL_ARGUMENTS chain?  Does it cause any assertion to fail?

BR,
Kewen


>>>> unused hidden parameter and we return r12 and r13 which have specific uses
>>>> in our ABIs (eg, r13 is our TCB pointer), so it may not actually look dead.
>>>> Have you verified what the callee RTL looks like after expand for these
>>>> unused hidden parameters?  Is there a rtx we can return that isn't a NULL_RTX
>>>> which triggers the assumption of a parameter save area, but isn't a reg rtx
>>>> which might lead to some rtl being generated?  Would a (const_int 0) or
>>>> something else work?
>>>>
>>>>
>>> For the above use case it will return 
>>>
>>> (reg:DI 5 %r5) and below check entry_parm = 
>>> (reg:DI 5 %r5) and the following check will not return TRUE and hence
>>>                parameter save area will not be allocated.
>>
>> Why r5?!?!   The 8th (integer) param would return r10, so I'd assume if
>> the next param was a hidden param, then it'd get the next gpr, so r11.
>> How does it jump back to r5 which may have been used by the 3rd param?
>>
>>
> My mistake its r11 only for hidden param.
>>
>>
>>
>>> It will not generate any rtx in the callee rtl code but it just used to
>>> check whether to allocate parameter save area or not when number of args > 8.
>>>
>>> /* If there is no incoming register, we need a stack.  */
>>>   entry_parm = rs6000_function_arg (args_so_far, arg);
>>>   if (entry_parm == NULL)
>>>     return true;
>>>
>>>   /* Likewise if we need to pass both in registers and on the stack.  */
>>>   if (GET_CODE (entry_parm) == PARALLEL
>>>       && XEXP (XVECEXP (entry_parm, 0, 0), 0) == NULL_RTX)
>>>     return true;
>>
>> Yes, this code in rs6000_parm_needs_stack() uses the rs6000_function_arg()
>> return value as a boolean to tell us whether a parameter save area is required
>> so what we return is unimportant other than to know it's not NULL_RTX.
>>
>> I'm more concerned about the use of the target hook targetm.calls.function_arg
>> used in the generic parts of the compiler.  What will that code do differently
>> now that we return a reg rtx rather than NULL_RTX?  Might that code use
>> the reg rtx to emit something?  I'd feel better if you could verify what
>> happens in that code when we return a reg rtx for that 9th hidden param which
>> isn't really being passed in a register.
>>
> 
> As per my understanding and debugging openBLAS code testcase I see that reg_rtx returned inside the below IF condition is used for check whether paramter save area is needed or not. 
> 
> In the generic code where targetm.calls.function_arg is called 
> in calls.cc returned rtx is used for PARALLEL case so that we can
> check if we need to pass both in registers and stack then they emit
> store with respect to return rtx. If we identify that we need only
> registers for argument then it emits nothing.
> 
> Thanks & Regards
> Ajit
>>
>> Peter
>>
>>
Jakub Jelinek April 2, 2024, 8:03 a.m. UTC | #7
On Tue, Apr 02, 2024 at 02:12:04PM +0800, Kewen.Lin wrote:
> >>>> The old code for the unused hidden parameter (which was the 9th param) would
> >>>> fall thru to the "return NULL_RTX;" which would make the callee assume there
> >>>> was a parameter save area allocated.  Now instead, we'll return a reg rtx,
> >>>> probably of r11 (r3 thru r10 are our param regs) and I'm guessing we'll now
> >>>> see a copy of r11 into a pseudo like we do for the other param regs.
> >>>> Is that a problem? Given it's an unused parameter, it'll probably get deleted
> >>>> as dead code, but could it cause any issues?  What if we have more than one
> 
> I think Peter raised one good point, not sure it would really cause some issues,
> but the assigned reg goes beyond GP_ARG_MAX_REG, at least it is confusing to people
> especially without DCE like at -O0.  Can we aggressively remove these candidates
> from DECL_ARGUMENTS chain?  Does it cause any assertion to fail?

I'd prefer not to remove DECL_ARGUMENTS chains, they are valid arguments that just some
invalid code doesn't pass.  By removing them you basically always create an
invalid case, this time in the other direction, valid caller passes more
arguments than the callee (invalidly) expects.

	Jakub
Kewen.Lin April 3, 2024, 5:18 a.m. UTC | #8
Hi Jakub,

on 2024/4/2 16:03, Jakub Jelinek wrote:
> On Tue, Apr 02, 2024 at 02:12:04PM +0800, Kewen.Lin wrote:
>>>>>> The old code for the unused hidden parameter (which was the 9th param) would
>>>>>> fall thru to the "return NULL_RTX;" which would make the callee assume there
>>>>>> was a parameter save area allocated.  Now instead, we'll return a reg rtx,
>>>>>> probably of r11 (r3 thru r10 are our param regs) and I'm guessing we'll now
>>>>>> see a copy of r11 into a pseudo like we do for the other param regs.
>>>>>> Is that a problem? Given it's an unused parameter, it'll probably get deleted
>>>>>> as dead code, but could it cause any issues?  What if we have more than one
>>
>> I think Peter raised one good point, not sure it would really cause some issues,
>> but the assigned reg goes beyond GP_ARG_MAX_REG, at least it is confusing to people
>> especially without DCE like at -O0.  Can we aggressively remove these candidates
>> from DECL_ARGUMENTS chain?  Does it cause any assertion to fail?
> 
> I'd prefer not to remove DECL_ARGUMENTS chains, they are valid arguments that just some
> invalid code doesn't pass.  By removing them you basically always create an
> invalid case, this time in the other direction, valid caller passes more
> arguments than the callee (invalidly) expects.

Thanks for the comments, do you mean it can affect the arguments validation when there
is explicit function declaration with interface?  Then can we strip them when we are
going to expand them (like checking currently_expanding_function_start)?  since from the
perspective of resulted assembly, with this workaround, the callee can:
  1) pass the hidden args in unexpected GPR like r11, ... at -O0;
  2) get rid of such hidden args as they are unused at -O2;
This proposal aims to make the assembly at -O0 not to pass with r11... (same as -O2),
comparing to the assembly at O2, the mismatch isn't actually changed.

BR,
Kewen
Jakub Jelinek April 3, 2024, 8:35 a.m. UTC | #9
On Wed, Apr 03, 2024 at 01:18:54PM +0800, Kewen.Lin wrote:
> > I'd prefer not to remove DECL_ARGUMENTS chains, they are valid arguments that just some
> > invalid code doesn't pass.  By removing them you basically always create an
> > invalid case, this time in the other direction, valid caller passes more
> > arguments than the callee (invalidly) expects.
> 
> Thanks for the comments, do you mean it can affect the arguments validation when there
> is explicit function declaration with interface?  Then can we strip them when we are
> going to expand them (like checking currently_expanding_function_start)?

I'd prefer not stripping them at all; they are clearly marked as perhaps not
passed in buggy programs (the DECL_HIDDEN_STRING_LENGTH argument) and
removing them implies the decl is a throw away, that after expansion
nothing will actually look at it anymore.  I believe that is the case of
function bodies, we expand them into RTL and throw away the GIMPLE, and
after final clear the bodies, but it is not the case of the FUNCTION_DECL
or its DECL_ARGUMENTs etc.  E.g. GIMPLE optimizations or expansion of
callers could be looking at those as well.

> since from the
> perspective of resulted assembly, with this workaround, the callee can:
>   1) pass the hidden args in unexpected GPR like r11, ... at -O0;
>   2) get rid of such hidden args as they are unused at -O2;
> This proposal aims to make the assembly at -O0 not to pass with r11... (same as -O2),
> comparing to the assembly at O2, the mismatch isn't actually changed.

The aim for the workaround was just avoid assuming there is a argument save
area in the caller stack when it is sometimes missing.
If you are looking for optimizations where nothing actually passes the
unneeded arguments and nothing expects them to be passed, then it is a task
for IPA optimizations and should be done solely if IPA determines that all
callers can be adjusted together with the callee; I think IPA already does
that in that case for years, regardless if it is DECL_HIDDEN_STRING_LENGTH
PARM_DECL or not.

	Jakub
Kewen.Lin April 3, 2024, 9:02 a.m. UTC | #10
Hi Jakub,

on 2024/4/3 16:35, Jakub Jelinek wrote:
> On Wed, Apr 03, 2024 at 01:18:54PM +0800, Kewen.Lin wrote:
>>> I'd prefer not to remove DECL_ARGUMENTS chains, they are valid arguments that just some
>>> invalid code doesn't pass.  By removing them you basically always create an
>>> invalid case, this time in the other direction, valid caller passes more
>>> arguments than the callee (invalidly) expects.
>>
>> Thanks for the comments, do you mean it can affect the arguments validation when there
>> is explicit function declaration with interface?  Then can we strip them when we are
>> going to expand them (like checking currently_expanding_function_start)?
> 
> I'd prefer not stripping them at all; they are clearly marked as perhaps not
> passed in buggy programs (the DECL_HIDDEN_STRING_LENGTH argument) and
> removing them implies the decl is a throw away, that after expansion

Yes, IMHO it's safe as they are unused.

> nothing will actually look at it anymore.  I believe that is the case of
> function bodies, we expand them into RTL and throw away the GIMPLE, and
> after final clear the bodies, but it is not the case of the FUNCTION_DECL
> or its DECL_ARGUMENTs etc.  E.g. GIMPLE optimizations or expansion of
> callers could be looking at those as well.

At expand time GIMPLE optimizations should already finish, so it should be
safe to strip them at that time?  It would surprise me if expansions of
callers will look at callee's information, it's more like what should be
done in IPA analysis instead?

> 
>> since from the
>> perspective of resulted assembly, with this workaround, the callee can:
>>   1) pass the hidden args in unexpected GPR like r11, ... at -O0;
>>   2) get rid of such hidden args as they are unused at -O2;
>> This proposal aims to make the assembly at -O0 not to pass with r11... (same as -O2),
>> comparing to the assembly at O2, the mismatch isn't actually changed.
> 
> The aim for the workaround was just avoid assuming there is a argument save
> area in the caller stack when it is sometimes missing.

Yeah, understood.

> If you are looking for optimizations where nothing actually passes the
> unneeded arguments and nothing expects them to be passed, then it is a task
> for IPA optimizations and should be done solely if IPA determines that all
> callers can be adjusted together with the callee; I think IPA already does
> that in that case for years, regardless if it is DECL_HIDDEN_STRING_LENGTH
> PARM_DECL or not.

No, it's not what I was looking for.  Peter's comments made me feel it's not
good to have assembly at O0 like:

        std %r3,112(%r31)
        std %r4,120(%r31)
        std %r5,128(%r31)
        std %r6,136(%r31)
        std %r7,144(%r31)
        std %r8,152(%r31)
        std %r9,160(%r31)
        std %r10,168(%r31)
        std %r11,176(%r31) // this mislead people that we pass 9th arg via r11,
                           // it would be nice not to have it.

so I was thinking if there is some way to get rid of it.

BR,
Kewen
Jakub Jelinek April 3, 2024, 9:23 a.m. UTC | #11
On Wed, Apr 03, 2024 at 05:02:40PM +0800, Kewen.Lin wrote:
> on 2024/4/3 16:35, Jakub Jelinek wrote:
> > On Wed, Apr 03, 2024 at 01:18:54PM +0800, Kewen.Lin wrote:
> >>> I'd prefer not to remove DECL_ARGUMENTS chains, they are valid arguments that just some
> >>> invalid code doesn't pass.  By removing them you basically always create an
> >>> invalid case, this time in the other direction, valid caller passes more
> >>> arguments than the callee (invalidly) expects.
> >>
> >> Thanks for the comments, do you mean it can affect the arguments validation when there
> >> is explicit function declaration with interface?  Then can we strip them when we are
> >> going to expand them (like checking currently_expanding_function_start)?
> > 
> > I'd prefer not stripping them at all; they are clearly marked as perhaps not
> > passed in buggy programs (the DECL_HIDDEN_STRING_LENGTH argument) and
> > removing them implies the decl is a throw away, that after expansion
> 
> Yes, IMHO it's safe as they are unused.

But they are still passed in the usual case.

> > nothing will actually look at it anymore.  I believe that is the case of
> > function bodies, we expand them into RTL and throw away the GIMPLE, and
> > after final clear the bodies, but it is not the case of the FUNCTION_DECL
> > or its DECL_ARGUMENTs etc.  E.g. GIMPLE optimizations or expansion of
> > callers could be looking at those as well.
> 
> At expand time GIMPLE optimizations should already finish, so it should be
> safe to strip them at that time?

No.
The IPA/post IPA behavior is that IPA optimizations are performed and then
cgraph finalizes one function at a time, going there from modifications
needed from IPA passes, post IPA GIMPLE optimizations, expansion to RTL,
RTL optimizations, emitting assembly, throwing away the body, then picking
another function and repeating that etc.
So, when one function makes it to expansion, if you modify its
DECL_ARGUMENTS etc., all the post IPA GIMPLE optimization passes of other
functions might still see such changes.

>  It would surprise me if expansions of
> callers will look at callee's information, it's more like what should be
> done in IPA analysis instead?

Depends on what exactly it is.  E.g. C non-prototyped functions have
just DECL_ARGUMENTS to check how many arguments the call should have vs.
what is actually passed.

> No, it's not what I was looking for.  Peter's comments made me feel it's not
> good to have assembly at O0 like:
> 
>         std %r3,112(%r31)
>         std %r4,120(%r31)
>         std %r5,128(%r31)
>         std %r6,136(%r31)
>         std %r7,144(%r31)
>         std %r8,152(%r31)
>         std %r9,160(%r31)
>         std %r10,168(%r31)
>         std %r11,176(%r31) // this mislead people that we pass 9th arg via r11,
>                            // it would be nice not to have it.
> 
> so I was thinking if there is some way to get rid of it.

You want to optimize at -O0?  Don't.
That will screw up debugging.  The function does have that argument, it
should show up in debug info; it should show up also at -O2 in debug info
etc.  If you remove chains from DECL_ARGUMENTS, because we have early dwarf
these days, DW_TAG_formal_parameter nodes should have been already created,
but it would mean that DW_AT_location for those arguments likely isn't
filled.  Now, for -O2 it might be the case that the argument has useful
location only at the start of the function, could have
DW_OP_entry_value(%r11) afterwards, but at -O0 it better should have some
stack slot into which the argument is saved and DW_AT_location should be
that stack slot.  All that should change with the workaround is that if the
stack slot would be normally in the argument save area in the caller's
frame, if such argument save area can't be counted on, then it needs to be
saved in some other stack slot, like arguments are saved to when there are
only <= 8 arguments.

Now, sure, if IPA optimizes a call because it can prove it sees a callee and
all its callers and can modify them all, it can optimize away passing of
that argument but 1) it isn't done at -O0/-Og 2) it ensures debug info
reflects the case that the argument isn't passed and arranges for the
callers to provide DW_OP_GNU_parameter_ref/call site info value 3) it
doesn't modify the FUNCTION_DECL itself or its DECL_ARGUMENTS, but makes
a clone of the FUNCTION_DECL and modifies the clones DECL_ARGUMENTS

	Jakub
Kewen.Lin April 3, 2024, 11:01 a.m. UTC | #12
Hi!

on 2024/4/3 17:23, Jakub Jelinek wrote:
> On Wed, Apr 03, 2024 at 05:02:40PM +0800, Kewen.Lin wrote:
>> on 2024/4/3 16:35, Jakub Jelinek wrote:
>>> On Wed, Apr 03, 2024 at 01:18:54PM +0800, Kewen.Lin wrote:
>>>>> I'd prefer not to remove DECL_ARGUMENTS chains, they are valid arguments that just some
>>>>> invalid code doesn't pass.  By removing them you basically always create an
>>>>> invalid case, this time in the other direction, valid caller passes more
>>>>> arguments than the callee (invalidly) expects.
>>>>
>>>> Thanks for the comments, do you mean it can affect the arguments validation when there
>>>> is explicit function declaration with interface?  Then can we strip them when we are
>>>> going to expand them (like checking currently_expanding_function_start)?
>>>
>>> I'd prefer not stripping them at all; they are clearly marked as perhaps not
>>> passed in buggy programs (the DECL_HIDDEN_STRING_LENGTH argument) and
>>> removing them implies the decl is a throw away, that after expansion
>>
>> Yes, IMHO it's safe as they are unused.
> 
> But they are still passed in the usual case.
> 
>>> nothing will actually look at it anymore.  I believe that is the case of
>>> function bodies, we expand them into RTL and throw away the GIMPLE, and
>>> after final clear the bodies, but it is not the case of the FUNCTION_DECL
>>> or its DECL_ARGUMENTs etc.  E.g. GIMPLE optimizations or expansion of
>>> callers could be looking at those as well.
>>
>> At expand time GIMPLE optimizations should already finish, so it should be
>> safe to strip them at that time?
> 
> No.
> The IPA/post IPA behavior is that IPA optimizations are performed and then
> cgraph finalizes one function at a time, going there from modifications
> needed from IPA passes, post IPA GIMPLE optimizations, expansion to RTL,
> RTL optimizations, emitting assembly, throwing away the body, then picking
> another function and repeating that etc.
> So, when one function makes it to expansion, if you modify its
> DECL_ARGUMENTS etc., all the post IPA GIMPLE optimization passes of other
> functions might still see such changes.

Thanks for explaining, I agree it's risky from this perspective.

> 
>>  It would surprise me if expansions of
>> callers will look at callee's information, it's more like what should be
>> done in IPA analysis instead?
> 
> Depends on what exactly it is.  E.g. C non-prototyped functions have
> just DECL_ARGUMENTS to check how many arguments the call should have vs.
> what is actually passed.

OK.

> 
>> No, it's not what I was looking for.  Peter's comments made me feel it's not
>> good to have assembly at O0 like:
>>
>>         std %r3,112(%r31)
>>         std %r4,120(%r31)
>>         std %r5,128(%r31)
>>         std %r6,136(%r31)
>>         std %r7,144(%r31)
>>         std %r8,152(%r31)
>>         std %r9,160(%r31)
>>         std %r10,168(%r31)
>>         std %r11,176(%r31) // this mislead people that we pass 9th arg via r11,
>>                            // it would be nice not to have it.
>>
>> so I was thinking if there is some way to get rid of it.
> 
> You want to optimize at -O0?  Don't.

I don't really want optimization but try to get rid of the unreasonable
assembly code.  :)

> That will screw up debugging.  The function does have that argument, it
> should show up in debug info; it should show up also at -O2 in debug info
> etc.  If you remove chains from DECL_ARGUMENTS, because we have early dwarf
> these days, DW_TAG_formal_parameter nodes should have been already created,
> but it would mean that DW_AT_location for those arguments likely isn't
> filled.  Now, for -O2 it might be the case that the argument has useful
> location only at the start of the function, could have
> DW_OP_entry_value(%r11) afterwards, but at -O0 it better should have some
> stack slot into which the argument is saved and DW_AT_location should be
> that stack slot.  All that should change with the workaround is that if the
> stack slot would be normally in the argument save area in the caller's
> frame, if such argument save area can't be counted on, then it needs to be
> saved in some other stack slot, like arguments are saved to when there are
> only <= 8 arguments.

Thanks for the details on debugging support, but IIUC with this workaround
being adopted, the debuggability on hidden args are already broken, aren't?
Since with a usual caller, the actual argument is passed in argument save
area, but the callee debug information says the location is %r11 or some
other stack slot.

I think the difficulty is that: with this workaround, for some arguments we
are lying they are not passed in argument save area, then we have to pretend
they are passed in r11,r12..., but in fact these registers are not valid to
pass arguments, so it's unreasonable and confusing.  With your explanation,
I agree that stripping DECL_ARGUMENTS chains isn't a good way to eliminate
this confusion, maybe always using GP_ARG_MIN_REG/GP_ARG_MAX_REG for things
exceeding GP_ARG_MAX_REG can reduce the unreasonableness (but still confusing
IMHO).

> 
> Now, sure, if IPA optimizes a call because it can prove it sees a callee and
> all its callers and can modify them all, it can optimize away passing of
> that argument but 1) it isn't done at -O0/-Og 2) it ensures debug info
> reflects the case that the argument isn't passed and arranges for the
> callers to provide DW_OP_GNU_parameter_ref/call site info value 3) it
> doesn't modify the FUNCTION_DECL itself or its DECL_ARGUMENTS, but makes
> a clone of the FUNCTION_DECL and modifies the clones DECL_ARGUMENTS

Yes, but it doesn't help our case.

BR,
Kewen
Jakub Jelinek April 3, 2024, 11:18 a.m. UTC | #13
On Wed, Apr 03, 2024 at 07:01:50PM +0800, Kewen.Lin wrote:
> Thanks for the details on debugging support, but IIUC with this workaround
> being adopted, the debuggability on hidden args are already broken, aren't?

No.
In the correct program case, which should be the usual case, the caller will
pass the arguments and one should be able to see the values in the debugger
even if the function doesn't actually use those arguments.
If the caller is buggy and doesn't pass those arguments, one should be able
to see garbage values for those arguments and perhaps that way figure out
that the program is buggy and fix it.

> Since with a usual caller, the actual argument is passed in argument save
> area, but the callee debug information says the location is %r11 or some
> other stack slot.
> 
> I think the difficulty is that: with this workaround, for some arguments we
> are lying they are not passed in argument save area, then we have to pretend
> they are passed in r11,r12..., but in fact these registers are not valid to
> pass arguments, so it's unreasonable and confusing.  With your explanation,
> I agree that stripping DECL_ARGUMENTS chains isn't a good way to eliminate
> this confusion, maybe always using GP_ARG_MIN_REG/GP_ARG_MAX_REG for things
> exceeding GP_ARG_MAX_REG can reduce the unreasonableness (but still confusing
> IMHO).

If those arguments aren't passed in r11/r12, but in memory, the workaround
shouldn't pretend they are passed somewhere where they aren't actually
passed.
Instead, it should load them from the memory where they are actually
normally passed.
What needs to be ensured though is that those arguments are for -O0 loaded
from those stack slots and saved to different stack slots (inside of the
callee frame, rather than in caller's frame), for -O1+ just not loaded at
all and pretended to just live in the caller's frame, and most importantly
ensure that the callee doesn't try to think there is a parameter save area
in the caller's frame which it can use for random saving related or
unrelated data.  So, e.g. REG_EQUAL/REG_EQUIV shouldn't be used, nor tell
that the 1st-8th arguments could be saved to the parameter save area.
So, for the 1st-8th arguments it really should act as if there is no
parameter save area and for the DECL_HIDDEN_STRING_LENGTH ones after it
as it those are passed in memory, but as if that memory is owned by the
caller, not callee, so it is not correct to modify that memory.

	Jakub
Kewen.Lin April 3, 2024, 12:18 p.m. UTC | #14
on 2024/4/3 19:18, Jakub Jelinek wrote:
> On Wed, Apr 03, 2024 at 07:01:50PM +0800, Kewen.Lin wrote:
>> Thanks for the details on debugging support, but IIUC with this workaround
>> being adopted, the debuggability on hidden args are already broken, aren't?
> 
> No.
> In the correct program case, which should be the usual case, the caller will
> pass the arguments and one should be able to see the values in the debugger
> even if the function doesn't actually use those arguments.
> If the caller is buggy and doesn't pass those arguments, one should be able
> to see garbage values for those arguments and perhaps that way figure out
> that the program is buggy and fix it.

But it's not true with Ajit's current implementation that is lying args are
passed in r11..., so whatever the caller is usual (in argument save area) or
not (not such value), the values are all broken.

> 
>> Since with a usual caller, the actual argument is passed in argument save
>> area, but the callee debug information says the location is %r11 or some
>> other stack slot.
>>
>> I think the difficulty is that: with this workaround, for some arguments we
>> are lying they are not passed in argument save area, then we have to pretend
>> they are passed in r11,r12..., but in fact these registers are not valid to
>> pass arguments, so it's unreasonable and confusing.  With your explanation,
>> I agree that stripping DECL_ARGUMENTS chains isn't a good way to eliminate
>> this confusion, maybe always using GP_ARG_MIN_REG/GP_ARG_MAX_REG for things
>> exceeding GP_ARG_MAX_REG can reduce the unreasonableness (but still confusing
>> IMHO).
> 
> If those arguments aren't passed in r11/r12, but in memory, the workaround
> shouldn't pretend they are passed somewhere where they aren't actually
> passed.

Unfortunately the current implementation doesn't conform this, I misunderstood
you knew that.

> Instead, it should load them from the memory where they are actually
> normally passed.
> What needs to be ensured though is that those arguments are for -O0 loaded
> from those stack slots and saved to different stack slots (inside of the
> callee frame, rather than in caller's frame), for -O1+ just not loaded at
> all and pretended to just live in the caller's frame, and most importantly
> ensure that the callee doesn't try to think there is a parameter save area
> in the caller's frame which it can use for random saving related or
> unrelated data.  So, e.g. REG_EQUAL/REG_EQUIV shouldn't be used, nor tell
> that the 1st-8th arguments could be saved to the parameter save area.
> So, for the 1st-8th arguments it really should act as if there is no
> parameter save area and for the DECL_HIDDEN_STRING_LENGTH ones after it
> as it those are passed in memory, but as if that memory is owned by the
> caller, not callee, so it is not correct to modify that memory.

Now I got your points.  I like this proposal and also believe it makes more
sense on both the resulted assembly and the debuggability support, though
it sounds the implementation has to be more complicated than what's done.

Thanks for all the inputs!!

BR,
Kewen
diff mbox series

Patch

diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc
index 1f8f93a2ee7..fd823c66ea2 100644
--- a/gcc/config/rs6000/rs6000-call.cc
+++ b/gcc/config/rs6000/rs6000-call.cc
@@ -64,7 +64,7 @@ 
 #include "ppc-auxv.h"
 #include "targhooks.h"
 #include "opts.h"
-
+#include "tree-dfa.h"
 #include "rs6000-internal.h"
 
 #ifndef TARGET_PROFILE_KERNEL
@@ -584,6 +584,31 @@  init_cumulative_args (CUMULATIVE_ARGS *cum, tree fntype,
   if (incoming || cum->prototype)
     cum->nargs_prototype = n_named_args;
 
+  /* When the buggy C/C++ wrappers call the function with fewer arguments
+     than it actually has and doesn't expect the parameter save area on the
+     caller side because of that while the callee expects it and the callee
+     actually stores something in the parameter save area, it corrupts
+     whatever is in the caller stack frame at that location.  */
+  unsigned int num_args = 0;
+  unsigned int hidden_length = 0;
+
+  for (tree arg = DECL_ARGUMENTS (current_function_decl);
+       arg; arg = DECL_CHAIN (arg))
+    {
+      num_args++;
+      if (DECL_HIDDEN_STRING_LENGTH (arg))
+	{
+	  tree parmdef = ssa_default_def (cfun, arg);
+	  if (parmdef == NULL || has_zero_uses (parmdef))
+	    {
+	      cum->hidden_string_length = 1;
+	      hidden_length++;
+	    }
+	}
+   }
+
+  cum->actual_parm_length = num_args - hidden_length;
+
   /* Check for a longcall attribute.  */
   if ((!fntype && rs6000_default_long_calls)
       || (fntype
@@ -1857,7 +1882,14 @@  rs6000_function_arg (cumulative_args_t cum_v, const function_arg_info &arg)
 
 	  return rs6000_finish_function_arg (mode, rvec, k);
 	}
-      else if (align_words < GP_ARG_NUM_REG)
+     /* When the buggy C/C++ wrappers call the function with fewer arguments
+	than it actually has and doesn't expect the parameter save area on the
+	caller side because of that while the callee expects it and the callee
+	actually stores something in the parameter save area, it corrupts
+	whatever is in the caller stack frame at that location.  */
+      else if (align_words < GP_ARG_NUM_REG
+	       || (cum->hidden_string_length
+	       && cum->actual_parm_length <= GP_ARG_NUM_REG))
 	{
 	  if (TARGET_32BIT && TARGET_POWERPC64)
 	    return rs6000_mixed_function_arg (mode, type, align_words);
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 68bc45d65ba..60f23f33879 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1490,6 +1490,13 @@  typedef struct rs6000_args
   int named;			/* false for varargs params */
   int escapes;			/* if function visible outside tu */
   int libcall;			/* If this is a compiler generated call.  */
+  /* Actual parameter length ignoring hidden parameter.
+     This is done to C++ wrapper calling fortran procedures
+     which has hidden parameter that are not used.  */
+  unsigned int actual_parm_length;
+  /* Set if there is hidden parameters while calling C++ wrapper to
+     fortran procedure.  */
+  unsigned int hidden_string_length : 1;
 } CUMULATIVE_ARGS;
 
 /* Initialize a variable CUM of type CUMULATIVE_ARGS