diff mbox

[0/7] Type promotion pass and elimination of zext/sext

Message ID CAFiYyc3aRPokVTdvWU8FkcorSekOKYeuJ2VAfU7Lg6Crp+UwLA@mail.gmail.com
State New
Headers show

Commit Message

Richard Biener Nov. 3, 2015, 2:40 p.m. UTC
On Mon, Nov 2, 2015 at 10:17 AM, Kugan
<kugan.vivekanandarajah@linaro.org> wrote:
>
>
> On 29/10/15 02:45, Richard Biener wrote:
>> On Tue, Oct 27, 2015 at 1:50 AM, kugan
>> <kugan.vivekanandarajah@linaro.org> wrote:
>>>
>>>
>>> On 23/10/15 01:23, Richard Biener wrote:
>>>>
>>>> On Thu, Oct 22, 2015 at 12:50 PM, Kugan
>>>> <kugan.vivekanandarajah@linaro.org> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 21/10/15 23:45, Richard Biener wrote:
>>>>>>
>>>>>> On Tue, Oct 20, 2015 at 10:03 PM, Kugan
>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 07/09/15 12:53, Kugan wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> This a new version of the patch posted in
>>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done
>>>>>>>> more testing and spitted the patch to make it more easier to review.
>>>>>>>> There are still couple of issues to be addressed and I am working on
>>>>>>>> them.
>>>>>>>>
>>>>>>>> 1. AARCH64 bootstrap now fails with the commit
>>>>>>>> 94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is
>>>>>>>> mis-compiled
>>>>>>>> in stage2 and fwprop.c is failing. It looks to me that there is a
>>>>>>>> latent
>>>>>>>> issue which gets exposed my patch. I can also reproduce this in x86_64
>>>>>>>> if I use the same PROMOTE_MODE which is used in aarch64 port. For the
>>>>>>>> time being, I am using  patch
>>>>>>>> 0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a
>>>>>>>> workaround. This meeds to be fixed before the patches are ready to be
>>>>>>>> committed.
>>>>>>>>
>>>>>>>> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with
>>>>>>>> -O3 -g Error: unaligned opcodes detected in executable segment. It
>>>>>>>> works
>>>>>>>> fine if I remove the -g. I am looking into it and needs to be fixed as
>>>>>>>> well.
>>>>>>>
>>>>>>>
>>>>>>> Hi Richard,
>>>>>>>
>>>>>>> Now that stage 1 is going to close, I would like to get these patches
>>>>>>> accepted for stage1. I will try my best to address your review comments
>>>>>>> ASAP.
>>>>>>
>>>>>>
>>>>>> Ok, can you make the whole patch series available so I can poke at the
>>>>>> implementation a bit?  Please state the revision it was rebased on
>>>>>> (or point me to a git/svn branch the work resides on).
>>>>>>
>>>>>
>>>>> Thanks. Please find the patched rebated against trunk@229156. I have
>>>>> skipped the test-case readjustment patches.
>>>>
>>>>
>>>> Some quick observations.  On x86_64 when building
>>>
>>>
>>> Hi Richard,
>>>
>>> Thanks for the review.
>>>
>>>>
>>>> short bar (short y);
>>>> int foo (short x)
>>>> {
>>>>    short y = bar (x) + 15;
>>>>    return y;
>>>> }
>>>>
>>>> with -m32 -O2 -mtune=pentiumpro (which ends up promoting HImode regs)
>>>> I get
>>>>
>>>>    <bb 2>:
>>>>    _1 = (int) x_10(D);
>>>>    _2 = (_1) sext (16);
>>>>    _11 = bar (_2);
>>>>    _5 = (int) _11;
>>>>    _12 = (unsigned int) _5;
>>>>    _6 = _12 & 65535;
>>>>    _7 = _6 + 15;
>>>>    _13 = (int) _7;
>>>>    _8 = (_13) sext (16);
>>>>    _9 = (_8) sext (16);
>>>>    return _9;
>>>>
>>>> which looks fine but the VRP optimization doesn't trigger for the
>>>> redundant sext
>>>> (ranges are computed correctly but the 2nd extension is not removed).
>
> Thanks for the comments. Please fond the attached patches with which I
> am now getting
> cat .192t.optimized
>
> ;; Function foo (foo, funcdef_no=0, decl_uid=1406, cgraph_uid=0,
> symbol_order=0)
>
> foo (short int x)
> {
>   signed int _1;
>   int _2;
>   signed int _5;
>   unsigned int _6;
>   unsigned int _7;
>   signed int _8;
>   int _9;
>   short int _11;
>   unsigned int _12;
>   signed int _13;
>
>   <bb 2>:
>   _1 = (signed int) x_10(D);
>   _2 = _1;
>   _11 = bar (_2);
>   _5 = (signed int) _11;
>   _12 = (unsigned int) _11;
>   _6 = _12 & 65535;
>   _7 = _6 + 15;
>   _13 = (signed int) _7;
>   _8 = (_13) sext (16);
>   _9 = _8;
>   return _9;
>
> }
>
>
> There are still some redundancies. The asm difference after RTL
> optimizations is
>
> -       addl    $15, %eax
> +       addw    $15, %ax
>
>
>>>>
>>>> This also makes me notice trivial match.pd patterns are missing, like
>>>> for example
>>>>
>>>> (simplify
>>>>   (sext (sext@2 @0 @1) @3)
>>>>   (if (tree_int_cst_compare (@1, @3) <= 0)
>>>>    @2
>>>>    (sext @0 @3)))
>>>>
>>>> as VRP doesn't run at -O1 we must rely on those to remove rendudant
>>>> extensions,
>>>> otherwise generated code might get worse compared to without the pass(?)
>>>
>>>
>>> Do you think that we should enable this pass only when vrp is enabled.
>>> Otherwise, even when we do the simple optimizations you mentioned below, we
>>> might not be able to remove all the redundancies.
>>>
>>>>
>>>> I also notice that the 'short' argument does not get it's sign-extension
>>>> removed
>>>> as redundand either even though we have
>>>>
>>>> _1 = (int) x_8(D);
>>>> Found new range for _1: [-32768, 32767]
>>>>
>>>
>>> I am looking into it.
>>>
>>>> In the end I suspect that keeping track of the "simple" cases in the
>>>> promotion
>>>> pass itself (by keeping a lattice) might be a good idea (after we fix VRP
>>>> to do
>>>> its work).  In some way whether the ABI guarantees promoted argument
>>>> registers might need some other target hook queries.
>
> I tried adding it in the attached patch with record_visit_stmt to track
> whether an ssa would have value overflow or properly zero/sign extended
> in promoted mode. We can use this to eliminate some of the zero/sign
> extension at gimple level. As it is, it doesn't do much. If this is what
> you had in mind, I will extend it based on your feedback.
>
>
>>>>
>>>> Now onto the 0002 patch.
>>>>
>>>> +static bool
>>>> +type_precision_ok (tree type)
>>>> +{
>>>> +  return (TYPE_PRECISION (type)  == 8
>>>> +         || TYPE_PRECISION (type) == 16
>>>> +         || TYPE_PRECISION (type) == 32);
>>>> +}
>>>>
>>>> that's a weird function to me.  You probably want
>>>> TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))
>>>> here?  And guard that thing with POINTER_TYPE_P || INTEGRAL_TYPE_P?
>>>>
>>>
>>> I will change this. (I have a patch which I am testing with other changes
>>> you have asked for)
>>>
>>>
>>>> +/* Return the promoted type for TYPE.  */
>>>> +static tree
>>>> +get_promoted_type (tree type)
>>>> +{
>>>> +  tree promoted_type;
>>>> +  enum machine_mode mode;
>>>> +  int uns;
>>>> +  if (POINTER_TYPE_P (type)
>>>> +      || !INTEGRAL_TYPE_P (type)
>>>> +      || !type_precision_ok (type))
>>>> +    return type;
>>>> +
>>>> +  mode = TYPE_MODE (type);
>>>> +#ifdef PROMOTE_MODE
>>>> +  uns = TYPE_SIGN (type);
>>>> +  PROMOTE_MODE (mode, uns, type);
>>>> +#endif
>>>> +  uns = TYPE_SIGN (type);
>>>> +  promoted_type = lang_hooks.types.type_for_mode (mode, uns);
>>>> +  if (promoted_type
>>>> +      && (TYPE_PRECISION (promoted_type) > TYPE_PRECISION (type)))
>>>> +    type = promoted_type;
>>>>
>>>> I think what you want to verify is that TYPE_PRECISION (promoted_type)
>>>> == GET_MODE_PRECISION (mode).
>>>> And to not even bother with this simply use
>>>>
>>>> promoted_type = build_nonstandard_integer_type (GET_MODE_PRECISION (mode),
>>>> uns);
>>>>
>>>
>>> I am changing this too.
>>>
>>>> You use a domwalk but also might create new basic-blocks during it
>>>> (insert_on_edge_immediate), that's a
>>>> no-no, commit edge inserts after the domwalk.
>>>
>>>
>>> I am sorry, I dont understand "commit edge inserts after the domwalk" Is
>>> there a way to do this in the current implementation?
>>
>> Yes, simply use gsi_insert_on_edge () and after the domwalk is done do
>> gsi_commit_edge_inserts ().
>>
>>>> ssa_sets_higher_bits_bitmap looks unused and
>>>> we generally don't free dominance info, so please don't do that.
>>>>
>>>> I fired off a bootstrap on ppc64-linux which fails building stage1 libgcc
>>>> with
>>>>
>>>> /abuild/rguenther/obj/./gcc/xgcc -B/abuild/rguenther/obj/./gcc/
>>>> -B/usr/local/powerpc64-unknown-linux-gnu/bin/
>>>> -B/usr/local/powerpc64-unknown-linux-gnu/lib/ -isystem
>>>> /usr/local/powerpc64-unknown-linux-gnu/include -isystem
>>>> /usr/local/powerpc64-unknown-linux-gnu/sys-include    -g -O2 -O2  -g
>>>> -O2 -DIN_GCC    -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual
>>>> -Wno-format -Wstrict-prototypes -Wmissing-prototypes
>>>> -Wold-style-definition  -isystem ./include   -fPIC -mlong-double-128
>>>> -mno-minimal-toc -g -DIN_LIBGCC2 -fbuilding-libgcc
>>>> -fno-stack-protector   -fPIC -mlong-double-128 -mno-minimal-toc -I.
>>>> -I. -I../.././gcc -I../../../trunk/libgcc -I../../../trunk/libgcc/.
>>>> -I../../../trunk/libgcc/../gcc -I../../../trunk/libgcc/../include
>>>> -I../../../trunk/libgcc/../libdecnumber/dpd
>>>> -I../../../trunk/libgcc/../libdecnumber -DHAVE_CC_TLS  -o _divdi3.o
>>>> -MT _divdi3.o -MD -MP -MF _divdi3.dep -DL_divdi3 -c
>>>> ../../../trunk/libgcc/libgcc2.c \
>>>>            -fexceptions -fnon-call-exceptions -fvisibility=hidden
>>>> -DHIDE_EXPORTS
>>>> In file included from ../../../trunk/libgcc/libgcc2.c:56:0:
>>>> ../../../trunk/libgcc/libgcc2.c: In function ‘__divti3’:
>>>> ../../../trunk/libgcc/libgcc2.h:193:20: internal compiler error: in
>>>> expand_debug_locations, at cfgexpand.c:5277
>>>>
>
> With the attached patch, now I am running into Bootstrap comparison
> failure. I am looking into it. Please review this version so that I can
> address them while fixing this issue.

I notice


The basic "structure" thing still remains.  You walk over all uses and
defs in all stmts
in promote_all_stmts which ends up calling promote_ssa_if_not_promoted on all
uses and defs which in turn promotes (the "def") and then fixes up all
uses in all stmts.

Instead of this you should, in promote_all_stmts, walk over all uses doing what
fixup_uses does and then walk over all defs, doing what promote_ssa does.

+    case GIMPLE_NOP:
+       {
+         if (SSA_NAME_VAR (def) == NULL)
+           {
+             /* Promote def by fixing its type for anonymous def.  */
+             TREE_TYPE (def) = promoted_type;
+           }
+         else
+           {
+             /* Create a promoted copy of parameters.  */
+             bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));

I think the uninitialized vars are somewhat tricky and it would be best
to create a new uninit anonymous SSA name for them.  You can
have SSA_NAME_VAR != NULL and def _not_ being a parameter
btw.

+/* Return true if it is safe to promote the defined SSA_NAME in the STMT
+   itself.  */
+static bool
+safe_to_promote_def_p (gimple *stmt)
+{
+  enum tree_code code = gimple_assign_rhs_code (stmt);
+  if (gimple_vuse (stmt) != NULL_TREE
+      || gimple_vdef (stmt) != NULL_TREE
+      || code == ARRAY_REF
+      || code == LROTATE_EXPR
+      || code == RROTATE_EXPR
+      || code == VIEW_CONVERT_EXPR
+      || code == BIT_FIELD_REF
+      || code == REALPART_EXPR
+      || code == IMAGPART_EXPR
+      || code == REDUC_MAX_EXPR
+      || code == REDUC_PLUS_EXPR
+      || code == REDUC_MIN_EXPR)
+    return false;
+  return true;

huh, I think this function has an odd name, maybe
can_promote_operation ()?  Please
use TREE_CODE_CLASS (code) == tcc_reference for all _REF trees.

Note that as followup things like the rotates should be "expanded" like
we'd do on RTL (open-coding the thing).  And we'd need a way to
specify zero-/sign-extended loads.

+/* Return true if it is safe to promote the use in the STMT.  */
+static bool
+safe_to_promote_use_p (gimple *stmt)
+{
+  enum tree_code code = gimple_assign_rhs_code (stmt);
+  tree lhs = gimple_assign_lhs (stmt);
+
+  if (gimple_vuse (stmt) != NULL_TREE
+      || gimple_vdef (stmt) != NULL_TREE

I think the vuse/vdef check is bogus, you can have a use of 'i_3' in say
_2 = a[i_3];

+      || code == VIEW_CONVERT_EXPR
+      || code == LROTATE_EXPR
+      || code == RROTATE_EXPR
+      || code == CONSTRUCTOR
+      || code == BIT_FIELD_REF
+      || code == COMPLEX_EXPR
+      || code == ASM_EXPR
+      || VECTOR_TYPE_P (TREE_TYPE (lhs)))
+    return false;
+  return true;

ASM_EXPR can never appear here.  I think PROMOTE_MODE never
promotes vector types - what cases did you need to add VECTOR_TYPE_P for?

+/* Return true if the SSA_NAME has to be truncated to preserve the
+   semantics.  */
+static bool
+truncate_use_p (gimple *stmt)
+{
+  enum tree_code code = gimple_assign_rhs_code (stmt);

I think the description can be improved.  This is about stray bits set
beyond the original type, correct?

Please use NOP_EXPR wherever you use CONVERT_EXPR right how.

+                 if (TREE_CODE_CLASS (code)
+                     == tcc_comparison)
+                   promote_cst_in_stmt (stmt, promoted_type, true);

don't you always need to promote constant operands?

Richard.
diff mbox

Patch

diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c
index 82fd4a1..80fcf70 100644
--- a/gcc/tree-ssanames.c
+++ b/gcc/tree-ssanames.c
@@ -207,7 +207,8 @@  set_range_info (tree name, enum value_range_type range_type,
   unsigned int precision = TYPE_PRECISION (TREE_TYPE (name));

   /* Allocate if not available.  */
-  if (ri == NULL)
+  if (ri == NULL
+      || (precision != ri->get_min ().get_precision ()))

and I think you need to clear range info on promoted SSA vars in the
promotion pass.