Patchwork [RFC] Bug handling SUBREG (MEM) - MEM having side-effects?

login
register
mail settings
Submitter Tejas Belagod
Date Oct. 16, 2012, 4:29 p.m.
Message ID <507D8B74.4090900@arm.com>
Download mbox | patch
Permalink /patch/191828/
State New
Headers show

Comments

Tejas Belagod - Oct. 16, 2012, 4:29 p.m.
Hi,

I seem to have uncovered what seems to be a bug with handling SUBREG (MEM) with
the MEM having side-effects while testing aarch64-4.7. This bug seems to be
latent on trunk.

Here is a test case reduced from gcc.c-torture/execute/scal-to-vec1.c.

#define vector(elcount, type)  \
__attribute__((vector_size((elcount)*sizeof(type)))) type

#define vidx(type, vec, idx) (*((type *) &(vec) + idx))

#define operl(a, b, op) (a op b)
#define operr(a, b, op) (b op a)

#define check(type, count, vec0, vec1, num, op, lr) \
do {\
        int __i; \
        for (__i = 0; __i < count; __i++) {\
            if (vidx (type, vec1, __i) != oper##lr (num, vidx (type, vec0, __i),
op)) \
                __builtin_abort (); \
        }\
} while (0)

#define veccompare(type, count, v0, v1) \
do {\
        int __i; \
        for (__i = 0; __i < count; __i++) { \
            if (vidx (type, v0, __i) != vidx (type, v1, __i)) \
                __builtin_abort (); \
        } \
} while (0)

volatile int one = 1;

int main (int argc, char *argv[]) {

        vector(8, short) v0 = {one, 1, 2, 3, 4, 5, 6, 7};
        vector(8, short) v1;
        v1 = 2 << v0;   check (short, 8, v0, v1, 2, <<, l);
        return 0;
}

When compiled with -O1, the combiner tries to combine these two instructions:

(insn 87 86 89 4 (set (reg:SI 154 [ MEM[base: D.2294_40, offset: 0B] ])
            (sign_extend:SI (mem:HI (pre_inc:DI (reg:DI 113 [ ivtmp.23 ])) [0
MEM[base: D.2294_40, offset: 0B]+0 S2 A16]))) scal-to-vec1.c:35 54
{*extendhisi2_aarch64}
         (expr_list:REG_INC (reg:DI 113 [ ivtmp.23 ])
            (nil)))

(insn 89 87 90 4 (set (reg:SI 155)
            (ashift:SI (reg:SI 158)
                (subreg:QI (reg:SI 154 [ MEM[base: D.2294_40, offset: 0B] ]) 0)))
scal-to-vec1.c:35 331 {*ashlsi3_insn}
         (expr_list:REG_DEAD (reg:SI 154 [ MEM[base: D.2294_40, offset: 0B] ])
            (nil)))

It tries to forward-propagate reg 154 to insn 89 and the intermediate RTL looks
like this:

(insn 89 87 90 4 (set (reg:SI 155)
            (ashift:SI (reg:SI 158)
                (subreg:QI ((sign_extend:SI (mem:HI (pre_inc:DI (reg:DI 113
						[ivtmp.23 ])) [0 MEM[base:
					D.2294_40, offset: 0B]+0 S2 A16])))

The combiner then recursively tries to simplify this RTL. During simplifying the
subreg part of the RTL i.e.

(subreg:QI ((sign_extend:SI (mem:HI (pre_inc:DI (reg:DI 113 ivtmp.23 ])) [0
						MEM[base: D.2294_40, offset:
							0B]+0 S2 A16])))

combine_simplify_rtx () calls simplify_subreg () and this returns

(subreg:QI (mem:HI (pre_inc:DI (reg:DI 113 ivtmp.23 ])) [0 		
				MEM[base: D.2294_40, offset: 0B]+0 S2 A16])))

The code that does this is in combine.c:combine_simplify_rtx ()

...
	rtx temp;
	temp = simplify_subreg (mode, SUBREG_REG (x), op0_mode,
				SUBREG_BYTE (x));
	if (temp)
	  return temp;
          }
...

So, the above tree is returned as is and insn 87 gets combined into insn 89 to
become

(insn 89 87 90 4 (set (reg:SI 155)
            (ashift:SI (reg:SI 158)
	   (subreg:QI (mem:HI (pre_inc:DI (reg:DI 113 ivtmp.23 ])) [0 		
				MEM[base: D.2294_40, offset: 0B]+0 S2 A16])))

This ends up in reload and in cleanup_subreg_operands () the subreg:QI (mem:HI)
gets narrowed to mem:QI, but the interesting bit is that the pre_inc stays the
same, so becomes a pre_inc for a QI mode address rather than the original HI
mode address and therefore instead of addressing a byte and incrementing the
address by 2 to get the next half word, the address gets incremented by 1!

Also, I feel this is a latent bug on the trunk. This is because while the
combiner takes the same path on the trunk, the address expression is not a
pre_inc there - the trunk wants to keep the addressing to a mem (plus (reg)
(const)) and therefore combine_simplify_rtx () simplifies the subreg expression
right down to the narrowed memref i.e. mem:QI (plus:DI (reg:DI) (const)). This
gets combined to insn 89 to become

(insn 89 87 90 4 (set (reg:SI 155)
            (ashift:SI (reg:SI 158) (mem:QI (plus:DI (reg:DI) (const))))

This then, gets later checked with recog () and since the predicate for operand
2 does not allow memory operands for shifts in aarch64.md, does not get combined
and all is well.

After the discussions Richard Earnshaw had on IRC with Andrew Pinski, it was
felt that it was best to fix the standard predicate
'general_operand' to not allow SUBREG (MEM) with side-effects as it has known
issues associated with it - particularly reload not being able to deal with such
cases. 'general_operand' currently outlaws cases like paradoxical SUBREG (MEM)
on targets with insn scheduling and SUBREG (MEM) with non-zero SUBREG_BYTE. This
is another case it should not allow. Here is a patch that tightens
general_operands to not allow SUBREG (MEM) with MEM having side-effects.

I would like to hear some thoughts on this.

Thanks,
Tejas Belagod
ARM.

Patch

diff --git a/gcc/recog.c b/gcc/recog.c
index 39a9dda..fa323df 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -984,6 +984,11 @@  general_operand (rtx op, enum machine_mode mode)
    	  && MEM_P (sub))
    	return 0;

+      /* Similarly, avoid SUBREG (MEM) with side effects (e.g. volatile
+	 mem, PRE_INC address etc).  */
+      if (!reload_completed && MEM_P (sub) && side_effects_p (sub))
+	return 0;
+
          /* FLOAT_MODE subregs can't be paradoxical.  Combine will occasionally
    	 create such rtl, and we must reject it.  */
          if (SCALAR_FLOAT_MODE_P (GET_MODE (op))