diff mbox series

Fix wrong code with small structure return on PowerPC

Message ID 14904386.2vhstX5eeo@polaris
State New
Headers show
Series Fix wrong code with small structure return on PowerPC | expand

Commit Message

Eric Botcazou Sept. 11, 2017, 10:26 a.m. UTC
Hi,

this is a bug originally reported in Ada on 32-bit PowerPC with the SVR4 ABI 
(hence not Linux) and reproducible in C with the attached testcase at -O1.

The problem lies in function 'foo':

  struct S ret;
  char r, s, c1, c2;
  char *p = &r;

  s = bar (&p);
  if (s)
    c2 = *p;
  c1 = 0;

  ret.c1 = c1;
  ret.c2 = c2;
  return ret;

Since the call to bar returns 0, c2 is uninitialized at run time (but this 
doesn't matter for correctness since its value is never read).  Now both c1 
and c2 are represented at the RTL level by unsigned promoted subregs, i.e. 
SUBREGs verifying SUBREG_PROMOTED_VAR_P and SUBREG_PROMOTED_UNSIGNED_P

As a consequence, when store_fixed_bit_field_1 stores the 8-bit values into 
the 'ret' object, it considers that the underlying 32-bit objects have only 8 
bits set and the rest cleared so it doesn't mask the other 24 bits.  That's 
OK for c1 but not for c2, since c2 is uninitialized so the bits are random.

This appears to be an inherent weakness of the promoted subregs mechanism, but 
I don't think that we want to go for an upheaval at this point.  So the patch 
addresses the problem in an ad-hoc manner directly in store_fixed_bit_field_1 
and yields the expected 8-bit insertion:

@@ -26,7 +26,7 @@
        lwz 9,12(1)
        lbz 31,0(9)
 .L3:
-       slwi 3,31,16
+       rlwinm 3,31,16,8,15
        lwz 0,36(1)
        mtlr 0
        lwz 31,28(1)

Tested on x86-64/Linux and PowerPC64/Linux, OK for the mainline?


2017-09-11  Eric Botcazou  <ebotcazou@adacore.com>

	* expmed.c (store_fixed_bit_field_1): Force the masking if the value
 	is an unsigned promoted SUBREG of a sufficiently large object.


2017-09-11  Eric Botcazou  <ebotcazou@adacore.com>

	* gcc.c-torture/execute/20170911-1.c: New test.

Comments

Eric Botcazou Sept. 11, 2017, 10:37 a.m. UTC | #1
> this is a bug originally reported in Ada on 32-bit PowerPC with the SVR4 ABI
> (hence not Linux) and reproducible in C with the attached testcase at -O1.

With the right C testcase this time...
Richard Biener Sept. 11, 2017, 12:05 p.m. UTC | #2
On September 11, 2017 12:26:09 PM GMT+02:00, Eric Botcazou <ebotcazou@adacore.com> wrote:
>Hi,
>
>this is a bug originally reported in Ada on 32-bit PowerPC with the
>SVR4 ABI 
>(hence not Linux) and reproducible in C with the attached testcase at
>-O1.
>
>The problem lies in function 'foo':
>
>  struct S ret;
>  char r, s, c1, c2;
>  char *p = &r;
>
>  s = bar (&p);
>  if (s)
>    c2 = *p;
>  c1 = 0;
>
>  ret.c1 = c1;
>  ret.c2 = c2;
>  return ret;
>
>Since the call to bar returns 0, c2 is uninitialized at run time (but
>this 
>doesn't matter for correctness since its value is never read).  Now
>both c1 
>and c2 are represented at the RTL level by unsigned promoted subregs,
>i.e. 
>SUBREGs verifying SUBREG_PROMOTED_VAR_P and SUBREG_PROMOTED_UNSIGNED_P
>
>As a consequence, when store_fixed_bit_field_1 stores the 8-bit values
>into 
>the 'ret' object, it considers that the underlying 32-bit objects have
>only 8 
>bits set and the rest cleared so it doesn't mask the other 24 bits. 
>That's 
>OK for c1 but not for c2, since c2 is uninitialized so the bits are
>random.
>
>This appears to be an inherent weakness of the promoted subregs
>mechanism, but 
>I don't think that we want to go for an upheaval at this point.  So the
>patch 
>addresses the problem in an ad-hoc manner directly in
>store_fixed_bit_field_1 
>and yields the expected 8-bit insertion:
>
>@@ -26,7 +26,7 @@
>        lwz 9,12(1)
>        lbz 31,0(9)
> .L3:
>-       slwi 3,31,16
>+       rlwinm 3,31,16,8,15
>        lwz 0,36(1)
>        mtlr 0
>        lwz 31,28(1)
>
>Tested on x86-64/Linux and PowerPC64/Linux, OK for the mainline?

I think the issue is that we set SUBREG_PROMOTED_* on something that is possibly not so (aka uninitialized in this case). We may only set it if either the ABI or a previous operation forced it to. Maybe this is also the reason why we need this zero init pass in some cases (though that isn't a full solution either). 

Richard. 

>
>2017-09-11  Eric Botcazou  <ebotcazou@adacore.com>
>
>	* expmed.c (store_fixed_bit_field_1): Force the masking if the value
> 	is an unsigned promoted SUBREG of a sufficiently large object.
>
>
>2017-09-11  Eric Botcazou  <ebotcazou@adacore.com>
>
>	* gcc.c-torture/execute/20170911-1.c: New test.
Eric Botcazou Sept. 11, 2017, 7:59 p.m. UTC | #3
> I think the issue is that we set SUBREG_PROMOTED_* on something that is
> possibly not so (aka uninitialized in this case).

Yes, that's what I called inherent weakness of the promoted subregs mechanism.

> We may only set it if either the ABI or a previous operation forced it to.
> Maybe this is also the reason why we need this zero init pass in some cases
> (though that isn't a full solution either).

Do you think that we should zero-init all the unsigned promoted subregs (and 
sign-extend-init all the signed promoted subregs)?  That sounds like a big 
hammer to me, but I can give it a try.
Richard Biener Sept. 13, 2017, 9:17 a.m. UTC | #4
On Mon, Sep 11, 2017 at 9:59 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> I think the issue is that we set SUBREG_PROMOTED_* on something that is
>> possibly not so (aka uninitialized in this case).
>
> Yes, that's what I called inherent weakness of the promoted subregs mechanism.
>
>> We may only set it if either the ABI or a previous operation forced it to.
>> Maybe this is also the reason why we need this zero init pass in some cases
>> (though that isn't a full solution either).
>
> Do you think that we should zero-init all the unsigned promoted subregs (and
> sign-extend-init all the signed promoted subregs)?  That sounds like a big
> hammer to me, but I can give it a try.

My suggestion would be to not set SUBREG_PROMOTED_* on "everything"
(which we seem to do).  zero-initing looks like the easier way to deal with
the situation though.

ISTR SUBREG_PROMOTED_* gets set by RTL expansion, the problematic
one should be the one in expr.c setting it on all SSA_NAMEs, maybe it is
enough to avoid setting it for SSA_NAME_IS_DEFAULT_DEF ones?

Richard.

> --
> Eric Botcazou
Eric Botcazou Sept. 25, 2017, 9:35 a.m. UTC | #5
> ISTR SUBREG_PROMOTED_* gets set by RTL expansion, the problematic
> one should be the one in expr.c setting it on all SSA_NAMEs, maybe it is
> enough to avoid setting it for SSA_NAME_IS_DEFAULT_DEF ones?

No, it's not enough in this case, you need to work a little harder and look at 
the arguments of a PHI node because the SSA_NAME_IS_DEFAULT_DEF name is never 
materialized in the RTL; so the attached patch works in this case.


	* expr.c (expand_expr_real_1) <expand_decl_rtl>: For a SSA_NAME, do
	not set SUBREG_PROMOTED_VAR_P on the sub-register if it may contain
	uninitialized bits.
Richard Biener Sept. 25, 2017, 10:04 a.m. UTC | #6
On Mon, Sep 25, 2017 at 11:35 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> ISTR SUBREG_PROMOTED_* gets set by RTL expansion, the problematic
>> one should be the one in expr.c setting it on all SSA_NAMEs, maybe it is
>> enough to avoid setting it for SSA_NAME_IS_DEFAULT_DEF ones?
>
> No, it's not enough in this case, you need to work a little harder and look at
> the arguments of a PHI node because the SSA_NAME_IS_DEFAULT_DEF name is never
> materialized in the RTL; so the attached patch works in this case.
>
>
>         * expr.c (expand_expr_real_1) <expand_decl_rtl>: For a SSA_NAME, do
>         not set SUBREG_PROMOTED_VAR_P on the sub-register if it may contain
>         uninitialized bits.

Reading the patch I think that it gets conservativeness wrong -- shouldn't it be

  if (is_definitely_initialized)
   {
      SUBREG_PROMOTED_VAR_P (temp) = 1;
      SUBREG_PROMOTED_SET (temp, unsignedp);
   }

?  Of course it's not easy to compute is_definitely_initialized conservatively
in an ad-hoc way at this place.  It should be relatively straight-forward to
do a conservative computation somewhere in cfgexpand.c by propagating
across SSA edges and recording a flag on SSA names though.  I assume
we can take load destinations as fully initialized (should extend properly)
as well as call results (the ABI should extend, eventually we can query
the target if it does), likewise for function arguments.

On your patch:

+             /* Try to detect if the register contains uninitialized bits.  */
+             if (SSA_NAME_IS_DEFAULT_DEF (ssa_name))
+               maybe_uninitialized = true;

if you use ssa_undefined_value_p (ssa_name[, true]) you'd get function
paramters not undefined (which is probably desired?).  Likewise
partial initialized complex would get uninitialized (if passed , true).

Same inside the loop over PHI args though I wonder how pessimizing
it would be to simply do

  if (ssa_undefined_value_p (ssa_name, true)
      || gimple_code (SSA_NAME_DEF_STMT (ssa_name)) == GIMPLE_PHI)
    maybe_uninitialized = true;

thus make all PHIs possibly uninitialized (your code wouldn't catch a
chained PHI with undef arg).

As said, a better solution would be to do a definitely-initialized
mini-propagation
at RTL expansion time.

I don't stand in the way of "improving" things with your patch (besides the
ssa_undefined_value_p use).

Thanks,
Richard.

> --
> Eric Botcazou
Eric Botcazou Oct. 3, 2017, 6:39 p.m. UTC | #7
> Reading the patch I think that it gets conservativeness wrong -- shouldn't
> it be
> 
>   if (is_definitely_initialized)
>    {
>       SUBREG_PROMOTED_VAR_P (temp) = 1;
>       SUBREG_PROMOTED_SET (temp, unsignedp);
>    }
> 
> ?  Of course it's not easy to compute is_definitely_initialized
> conservatively in an ad-hoc way at this place.  It should be relatively
> straight-forward to do a conservative computation somewhere in cfgexpand.c
> by propagating across SSA edges and recording a flag on SSA names though. 
> I assume we can take load destinations as fully initialized (should extend
> properly) as well as call results (the ABI should extend, eventually we can
> query the target if it does), likewise for function arguments.

Yes, that's why the comment read "Try to detect if " and not "Detect if ".

> On your patch:
> 
> +             /* Try to detect if the register contains uninitialized bits. 
> */ +             if (SSA_NAME_IS_DEFAULT_DEF (ssa_name))
> +               maybe_uninitialized = true;
> 
> if you use ssa_undefined_value_p (ssa_name[, true]) you'd get function
> paramters not undefined (which is probably desired?).  Likewise
> partial initialized complex would get uninitialized (if passed , true).

Ah, yes, I overlooked that.

> Same inside the loop over PHI args though I wonder how pessimizing
> it would be to simply do
> 
>   if (ssa_undefined_value_p (ssa_name, true)
> 
>       || gimple_code (SSA_NAME_DEF_STMT (ssa_name)) == GIMPLE_PHI)
> 
>     maybe_uninitialized = true;
> 
> thus make all PHIs possibly uninitialized (your code wouldn't catch a
> chained PHI with undef arg).

Too big a hammer for such a very rare bug I think.

> As said, a better solution would be to do a definitely-initialized
> mini-propagation at RTL expansion time.

I'm not sure if we really need to propagate.  What about the attached patch?  
It computes at expansion time whether the partition the SSA name is a member 
of contains an undefined value and, if so, doesn't set the promoted bit for 
the SUBREG.  My gut feeling is that it's sufficient in practice.


	* tree-outof-ssa.h (ssaexpand): Add partitions_for_undefined_values.
	(always_initialized_rtx_for_ssa_name_p): New predicate.
	* tree-outof-ssa.c (remove_ssa_form): Initialize new field of SA.
	(finish_out_of_ssa): Free new field of SA.
	* tree-ssa-coalesce.h (get_undefined_value_partitions): Declare.
	* tree-ssa-coalesce.c: Include tree-ssa.h.
	(get_parm_default_def_partitions): Remove extern keyword.
	(get_undefined_value_partitions): New function.
	* expr.c (expand_expr_real_1) <expand_decl_rtl>: For a SSA_NAME, do
	not set SUBREG_PROMOTED_VAR_P on the sub-register if it may contain
	uninitialized bits.
Richard Biener Oct. 5, 2017, 8:57 a.m. UTC | #8
On Tue, Oct 3, 2017 at 8:39 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> Reading the patch I think that it gets conservativeness wrong -- shouldn't
>> it be
>>
>>   if (is_definitely_initialized)
>>    {
>>       SUBREG_PROMOTED_VAR_P (temp) = 1;
>>       SUBREG_PROMOTED_SET (temp, unsignedp);
>>    }
>>
>> ?  Of course it's not easy to compute is_definitely_initialized
>> conservatively in an ad-hoc way at this place.  It should be relatively
>> straight-forward to do a conservative computation somewhere in cfgexpand.c
>> by propagating across SSA edges and recording a flag on SSA names though.
>> I assume we can take load destinations as fully initialized (should extend
>> properly) as well as call results (the ABI should extend, eventually we can
>> query the target if it does), likewise for function arguments.
>
> Yes, that's why the comment read "Try to detect if " and not "Detect if ".
>
>> On your patch:
>>
>> +             /* Try to detect if the register contains uninitialized bits.
>> */ +             if (SSA_NAME_IS_DEFAULT_DEF (ssa_name))
>> +               maybe_uninitialized = true;
>>
>> if you use ssa_undefined_value_p (ssa_name[, true]) you'd get function
>> paramters not undefined (which is probably desired?).  Likewise
>> partial initialized complex would get uninitialized (if passed , true).
>
> Ah, yes, I overlooked that.
>
>> Same inside the loop over PHI args though I wonder how pessimizing
>> it would be to simply do
>>
>>   if (ssa_undefined_value_p (ssa_name, true)
>>
>>       || gimple_code (SSA_NAME_DEF_STMT (ssa_name)) == GIMPLE_PHI)
>>
>>     maybe_uninitialized = true;
>>
>> thus make all PHIs possibly uninitialized (your code wouldn't catch a
>> chained PHI with undef arg).
>
> Too big a hammer for such a very rare bug I think.
>
>> As said, a better solution would be to do a definitely-initialized
>> mini-propagation at RTL expansion time.
>
> I'm not sure if we really need to propagate.  What about the attached patch?
> It computes at expansion time whether the partition the SSA name is a member
> of contains an undefined value and, if so, doesn't set the promoted bit for
> the SUBREG.  My gut feeling is that it's sufficient in practice.

I'll take your word for it ;)

The patch is ok.

Thanks,
Richard.

>
>         * tree-outof-ssa.h (ssaexpand): Add partitions_for_undefined_values.
>         (always_initialized_rtx_for_ssa_name_p): New predicate.
>         * tree-outof-ssa.c (remove_ssa_form): Initialize new field of SA.
>         (finish_out_of_ssa): Free new field of SA.
>         * tree-ssa-coalesce.h (get_undefined_value_partitions): Declare.
>         * tree-ssa-coalesce.c: Include tree-ssa.h.
>         (get_parm_default_def_partitions): Remove extern keyword.
>         (get_undefined_value_partitions): New function.
>         * expr.c (expand_expr_real_1) <expand_decl_rtl>: For a SSA_NAME, do
>         not set SUBREG_PROMOTED_VAR_P on the sub-register if it may contain
>         uninitialized bits.
>
> --
> Eric Botcazou
Eric Botcazou Oct. 8, 2017, 9:15 p.m. UTC | #9
> I'll take your word for it ;)

Thanks.  Testing on PowerPC64 revealed a couple of nits:

 1. SSA names with zero uses need to be excluded from the computation, because 
the first SSA name in a function returning a GIMPLE type is associated with 
the RESULT_DECL and is undefined, so it would propagate the undefinedness to 
every SSA name collapsed with the RESULT_DECL, i.e. unduly pessimization.

 2. Removing the SUBREG_PROMOTED_VAR_P flag disables the trick present in 
expand_gimple_stmt_1 for the LHS of assignment statements, so you end up with 
(more) SUBREGs on the LHS of moves in RTL, hence the fixlet for loop-iv.c.

I bootstrapped/regtested it on x86-64, SPARC64, Aarch64 and PowerPC64/Linux, 
and compared the code generated for the tests in gcc.c-torture/compile at -O2: 
very few changes for the first 3, but around 30 tests changed for PowerPC64, 
all with uninitialized variables AFAICS.

Applied on the mainline.


2017-10-08  Eric Botcazou  <ebotcazou@adacore.com>

	* tree-outof-ssa.h (ssaexpand): Add partitions_for_undefined_values.
	(always_initialized_rtx_for_ssa_name_p): New predicate.
	* tree-outof-ssa.c (remove_ssa_form): Initialize new field of SA.
	(finish_out_of_ssa): Free new field of SA.
	* tree-ssa-coalesce.h (get_undefined_value_partitions): Declare.
	* tree-ssa-coalesce.c: Include tree-ssa.h.
	(get_parm_default_def_partitions): Remove extern keyword.
	(get_undefined_value_partitions): New function.
	* expr.c (expand_expr_real_1) <expand_decl_rtl>: For a SSA_NAME, do
	not set SUBREG_PROMOTED_VAR_P on the sub-register if it may contain
	uninitialized bits.
	* loop-iv.c (iv_get_reaching_def): Disqualify all subregs.


2017-10-08  Eric Botcazou  <ebotcazou@adacore.com>

	* gcc.c-torture/execute/20171008-1.c: New test.
diff mbox series

Patch

Index: expmed.c
===================================================================
--- expmed.c	(revision 251906)
+++ expmed.c	(working copy)
@@ -1213,13 +1213,25 @@  store_fixed_bit_field_1 (rtx op0, scalar
     }
   else
     {
-      int must_and = (GET_MODE_BITSIZE (value_mode) != bitsize
-		      && bitnum + bitsize != GET_MODE_BITSIZE (mode));
+      /* Compute whether we must mask the value in order to make sure that the
+	 bits outside the bitfield are all cleared.  Note that, even though the
+	 value has the right size, if it has a mode different from MODE, then
+	 converting it to MODE may unmask uninitialized bits in the case where
+	 it is an unsigned promoted SUBREG of a sufficiently large object.  */
+      const bool must_mask
+	= (GET_MODE_BITSIZE (value_mode) != bitsize
+	   || (value_mode != mode
+	       && GET_CODE (value) == SUBREG
+	       && SUBREG_PROMOTED_VAR_P (value)
+	       && SUBREG_PROMOTED_UNSIGNED_P (value)
+	       && GET_MODE_SIZE (GET_MODE (SUBREG_REG (value)))
+		  >= GET_MODE_SIZE (mode)))
+	  && bitnum + bitsize != GET_MODE_BITSIZE (mode);
 
       if (value_mode != mode)
 	value = convert_to_mode (mode, value, 1);
 
-      if (must_and)
+      if (must_mask)
 	value = expand_binop (mode, and_optab, value,
 			      mask_rtx (mode, 0, bitsize, 0),
 			      NULL_RTX, 1, OPTAB_LIB_WIDEN);