diff mbox

Fix for PRs 36043, 58744 and 65408

Message ID 20150314130238.GD16488@bubble.grove.modra.org
State New
Headers show

Commit Message

Alan Modra March 14, 2015, 1:02 p.m. UTC
This is Richi's prototype patch in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36043#c23 with fixes for
blocks larger than one reg, big-endian, and BLOCK_REG_PADDING.
I also removed the operand_subword_force since we may as well let
narrow_bit_field_mem in extract_bit_field do that for us.  It is
necessary to do the BLOCK_REG_PADDING shift after we've loaded the
block or else repeat the bit-field extraction in that case.

Bootstrapped and regression tested (-m32 and -m64) x86_64-linux and
powerpc64-linux.  OK to apply?

I'll also throw together a testcase or three.  For execute tests I'm
thinking of using sbrk to locate an odd sized struct such that access
past the end segfaults, rather than mmap/munmap as was done in the
pr36043 testcase.  Does that sound reasonable?

	PR target/65408
	PR target/58744
	PR middle-end/36043
	* calls.c (load_register_parameters): Don't load past end of
	mem unless suitably aligned.

Comments

H.J. Lu March 14, 2015, 1:14 p.m. UTC | #1
On Sat, Mar 14, 2015 at 6:02 AM, Alan Modra <amodra@gmail.com> wrote:
> This is Richi's prototype patch in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36043#c23 with fixes for
> blocks larger than one reg, big-endian, and BLOCK_REG_PADDING.
> I also removed the operand_subword_force since we may as well let
> narrow_bit_field_mem in extract_bit_field do that for us.  It is
> necessary to do the BLOCK_REG_PADDING shift after we've loaded the
> block or else repeat the bit-field extraction in that case.
>
> Bootstrapped and regression tested (-m32 and -m64) x86_64-linux and
> powerpc64-linux.  OK to apply?
>
> I'll also throw together a testcase or three.  For execute tests I'm
> thinking of using sbrk to locate an odd sized struct such that access
> past the end segfaults, rather than mmap/munmap as was done in the
> pr36043 testcase.  Does that sound reasonable?
>
>         PR target/65408
>         PR target/58744
>         PR middle-end/36043
>         * calls.c (load_register_parameters): Don't load past end of
>         mem unless suitably aligned.
>

Can you add a testcase in

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36043
Alan Modra March 14, 2015, 1:55 p.m. UTC | #2
On Sat, Mar 14, 2015 at 06:14:40AM -0700, H.J. Lu wrote:
> On Sat, Mar 14, 2015 at 6:02 AM, Alan Modra <amodra@gmail.com> wrote:
> > I'll also throw together a testcase or three.  For execute tests I'm
> > thinking of using sbrk to locate an odd sized struct such that access
> > past the end segfaults, rather than mmap/munmap as was done in the
> > pr36043 testcase.  Does that sound reasonable?
> 
> Can you add a testcase in
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36043

I was thinking that mmap/munmap is less portable than sbrk.  Hmm, a
grep over the testsuite says mmap is already used and
do-do run { target mmap } is available.  OK, I'm happy to jump that
way too.
Bernhard Reutner-Fischer March 14, 2015, 1:58 p.m. UTC | #3
On March 14, 2015 2:02:38 PM GMT+01:00, Alan Modra <amodra@gmail.com> wrote:

>I'll also throw together a testcase or three.  For execute tests I'm
>thinking of using sbrk to locate an odd sized struct such that access
>past the end segfaults, rather than mmap/munmap as was done in the
>pr36043 testcase.  Does that sound reasonable?

Well since sbrk was marked LEGACY in SUSv2 and was removed in SUSv3 (and still is in 1003.1-2008) I'm not sure it is wise to use it in new code.. Using it will bite testing on legacy-free setups, fwiw.

Cheers,
Mike Stump March 14, 2015, 5:51 p.m. UTC | #4
On Mar 14, 2015, at 6:58 AM, Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote:
> On March 14, 2015 2:02:38 PM GMT+01:00, Alan Modra <amodra@gmail.com> wrote:
> 
>> I'll also throw together a testcase or three.  For execute tests I'm
>> thinking of using sbrk to locate an odd sized struct such that access
>> past the end segfaults, rather than mmap/munmap as was done in the
>> pr36043 testcase.  Does that sound reasonable?
> 
> Well since sbrk was marked LEGACY in SUSv2 and was removed in SUSv3 (and still is in 1003.1-2008) I'm not sure it is wise to use it in new code.. Using it will bite testing on legacy-free setups, fwiw.

newlib doesn’t have mmap.  Indeed, some machines will never have mmap.  newlib has sbrk.
Jakub Jelinek March 14, 2015, 5:56 p.m. UTC | #5
On Sat, Mar 14, 2015 at 10:51:28AM -0700, Mike Stump wrote:
> On Mar 14, 2015, at 6:58 AM, Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote:
> > On March 14, 2015 2:02:38 PM GMT+01:00, Alan Modra <amodra@gmail.com> wrote:
> > 
> >> I'll also throw together a testcase or three.  For execute tests I'm
> >> thinking of using sbrk to locate an odd sized struct such that access
> >> past the end segfaults, rather than mmap/munmap as was done in the
> >> pr36043 testcase.  Does that sound reasonable?
> > 
> > Well since sbrk was marked LEGACY in SUSv2 and was removed in SUSv3 (and still is in 1003.1-2008) I'm not sure it is wise to use it in new code.. Using it will bite testing on legacy-free setups, fwiw.
> 
> newlib doesn’t have mmap.  Indeed, some machines will never have mmap.  newlib has sbrk.

Still, I think it is preferrable to test with mmap...

	Jakub
Mike Stump March 14, 2015, 6:27 p.m. UTC | #6
On Mar 14, 2015, at 10:56 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> newlib doesn’t have mmap.  Indeed, some machines will never have mmap.  newlib has sbrk.
> 
> Still, I think it is preferrable to test with mmap…

I don’t see anything wrong with going the target mmap direction…  my post was just to provide information…  not decide which is better.  I’d rather leave the issue to those with an opinion.
Jeff Law March 17, 2015, 7:28 p.m. UTC | #7
On 03/14/2015 07:02 AM, Alan Modra wrote:
> This is Richi's prototype patch in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36043#c23 with fixes for
> blocks larger than one reg, big-endian, and BLOCK_REG_PADDING.
> I also removed the operand_subword_force since we may as well let
> narrow_bit_field_mem in extract_bit_field do that for us.  It is
> necessary to do the BLOCK_REG_PADDING shift after we've loaded the
> block or else repeat the bit-field extraction in that case.
>
> Bootstrapped and regression tested (-m32 and -m64) x86_64-linux and
> powerpc64-linux.  OK to apply?
>
> I'll also throw together a testcase or three.  For execute tests I'm
> thinking of using sbrk to locate an odd sized struct such that access
> past the end segfaults, rather than mmap/munmap as was done in the
> pr36043 testcase.  Does that sound reasonable?
>
> 	PR target/65408
> 	PR target/58744
> 	PR middle-end/36043
> 	* calls.c (load_register_parameters): Don't load past end of
> 	mem unless suitably aligned.
I think this is probably a stage1 item.  Richi, Jakub, Joseph, do any of 
you think we should try to push this into gcc-5?

jeff
Alan Modra March 18, 2015, 4:22 a.m. UTC | #8
On Tue, Mar 17, 2015 at 01:28:41PM -0600, Jeff Law wrote:
> On 03/14/2015 07:02 AM, Alan Modra wrote:
> >	PR target/65408
> >	PR target/58744
> >	PR middle-end/36043
> >	* calls.c (load_register_parameters): Don't load past end of
> >	mem unless suitably aligned.
> I think this is probably a stage1 item.  Richi, Jakub, Joseph, do any of you
> think we should try to push this into gcc-5?

Some (severity) background to PR65408.  The bug came from SAP HANA
(en.wikipedia.org/wiki/SAP_HANA), a crash that happens on powerpc64
and powerpc64le.  aarch64 would also be susceptible to the crash since
it also loads 16 bytes for the 12-byte struct.  x86_64 only loads 12
bytes (i386.c:construct_container generates a parallel with a DImode
and SImode load).  However the underlying bug is there and hits x86_64
too for the pr58744 and pr36043 testcases..
Richard Biener March 18, 2015, 11:12 a.m. UTC | #9
On Wed, Mar 18, 2015 at 5:22 AM, Alan Modra <amodra@gmail.com> wrote:
> On Tue, Mar 17, 2015 at 01:28:41PM -0600, Jeff Law wrote:
>> On 03/14/2015 07:02 AM, Alan Modra wrote:
>> >     PR target/65408
>> >     PR target/58744
>> >     PR middle-end/36043
>> >     * calls.c (load_register_parameters): Don't load past end of
>> >     mem unless suitably aligned.
>> I think this is probably a stage1 item.  Richi, Jakub, Joseph, do any of you
>> think we should try to push this into gcc-5?
>
> Some (severity) background to PR65408.  The bug came from SAP HANA
> (en.wikipedia.org/wiki/SAP_HANA), a crash that happens on powerpc64
> and powerpc64le.  aarch64 would also be susceptible to the crash since
> it also loads 16 bytes for the 12-byte struct.  x86_64 only loads 12
> bytes (i386.c:construct_container generates a parallel with a DImode
> and SImode load).  However the underlying bug is there and hits x86_64
> too for the pr58744 and pr36043 testcases..

It's a very very very old bug though.  I'd be interested in any odd
code-generation difference for compiling, say, the linux kernel
(you _can_ get quite ugly code generated because of this fix).

I'm leaning towards waiting for stage1 and then consider a backport
to 5.1.

I'm sure the HAHA guys can work-around by forcing an extra temporary
on the stack and passing that.

Richard.

> --
> Alan Modra
> Australia Development Lab, IBM
Alan Modra April 13, 2015, 4:48 a.m. UTC | #10
On Wed, Mar 18, 2015 at 12:12:17PM +0100, Richard Biener wrote:
> On Wed, Mar 18, 2015 at 5:22 AM, Alan Modra <amodra@gmail.com> wrote:
> > On Tue, Mar 17, 2015 at 01:28:41PM -0600, Jeff Law wrote:
> >> On 03/14/2015 07:02 AM, Alan Modra wrote:
> >> >     PR target/65408
> >> >     PR target/58744
> >> >     PR middle-end/36043
> >> >     * calls.c (load_register_parameters): Don't load past end of
> >> >     mem unless suitably aligned.
> >> I think this is probably a stage1 item.  Richi, Jakub, Joseph, do any of you
> >> think we should try to push this into gcc-5?
> >
> > Some (severity) background to PR65408.  The bug came from SAP HANA
> > (en.wikipedia.org/wiki/SAP_HANA), a crash that happens on powerpc64
> > and powerpc64le.  aarch64 would also be susceptible to the crash since
> > it also loads 16 bytes for the 12-byte struct.  x86_64 only loads 12
> > bytes (i386.c:construct_container generates a parallel with a DImode
> > and SImode load).  However the underlying bug is there and hits x86_64
> > too for the pr58744 and pr36043 testcases..
> 
> It's a very very very old bug though.  I'd be interested in any odd
> code-generation difference for compiling, say, the linux kernel
> (you _can_ get quite ugly code generated because of this fix).

Yes, all those byte loads..  As far as the kernel goes, x86_64 shows
some differences in calls to rgb_{fore,back}ground in
drivers/tty/vt/vt.c.  Both before and after look ugly to me.  :)

    4d73:       8b 82 24 02 00 00       mov    0x224(%rdx),%eax
    4d79:       48 89 df                mov    %rbx,%rdi
    4d7c:       41 83 c4 04             add    $0x4,%r12d
    4d80:       88 45 b2                mov    %al,-0x4e(%rbp)
    4d83:       8b 82 28 02 00 00       mov    0x228(%rdx),%eax
    4d89:       88 45 b3                mov    %al,-0x4d(%rbp)
    4d8c:       8b 82 2c 02 00 00       mov    0x22c(%rdx),%eax
    4d92:       88 45 b4                mov    %al,-0x4c(%rbp)
    4d95:       48 8b 75 b2             mov    -0x4e(%rbp),%rsi
    4d99:       e8 00 00 00 00          callq  <rgb_background>
 vs.
    4d73:       8b 82 24 02 00 00       mov    0x224(%rdx),%eax
    4d79:       0f b6 b2 2c 02 00 00    movzbl 0x22c(%rdx),%esi
    4d80:       48 89 df                mov    %rbx,%rdi
    4d83:       41 83 c4 04             add    $0x4,%r12d
    4d87:       88 45 b2                mov    %al,-0x4e(%rbp)
    4d8a:       8b 82 28 02 00 00       mov    0x228(%rdx),%eax
    4d90:       48 c1 e6 10             shl    $0x10,%rsi
    4d94:       88 45 b3                mov    %al,-0x4d(%rbp)
    4d97:       0f b7 45 b2             movzwl -0x4e(%rbp),%eax
    4d9b:       48 09 c6                or     %rax,%rsi
    4d9e:       e8 00 00 00 00          callq  <rgb_background>

Is the patch OK for stage1?
Richard Biener April 13, 2015, 7:47 a.m. UTC | #11
On Mon, Apr 13, 2015 at 6:48 AM, Alan Modra <amodra@gmail.com> wrote:
> On Wed, Mar 18, 2015 at 12:12:17PM +0100, Richard Biener wrote:
>> On Wed, Mar 18, 2015 at 5:22 AM, Alan Modra <amodra@gmail.com> wrote:
>> > On Tue, Mar 17, 2015 at 01:28:41PM -0600, Jeff Law wrote:
>> >> On 03/14/2015 07:02 AM, Alan Modra wrote:
>> >> >     PR target/65408
>> >> >     PR target/58744
>> >> >     PR middle-end/36043
>> >> >     * calls.c (load_register_parameters): Don't load past end of
>> >> >     mem unless suitably aligned.
>> >> I think this is probably a stage1 item.  Richi, Jakub, Joseph, do any of you
>> >> think we should try to push this into gcc-5?
>> >
>> > Some (severity) background to PR65408.  The bug came from SAP HANA
>> > (en.wikipedia.org/wiki/SAP_HANA), a crash that happens on powerpc64
>> > and powerpc64le.  aarch64 would also be susceptible to the crash since
>> > it also loads 16 bytes for the 12-byte struct.  x86_64 only loads 12
>> > bytes (i386.c:construct_container generates a parallel with a DImode
>> > and SImode load).  However the underlying bug is there and hits x86_64
>> > too for the pr58744 and pr36043 testcases..
>>
>> It's a very very very old bug though.  I'd be interested in any odd
>> code-generation difference for compiling, say, the linux kernel
>> (you _can_ get quite ugly code generated because of this fix).
>
> Yes, all those byte loads..  As far as the kernel goes, x86_64 shows
> some differences in calls to rgb_{fore,back}ground in
> drivers/tty/vt/vt.c.  Both before and after look ugly to me.  :)
>
>     4d73:       8b 82 24 02 00 00       mov    0x224(%rdx),%eax
>     4d79:       48 89 df                mov    %rbx,%rdi
>     4d7c:       41 83 c4 04             add    $0x4,%r12d
>     4d80:       88 45 b2                mov    %al,-0x4e(%rbp)
>     4d83:       8b 82 28 02 00 00       mov    0x228(%rdx),%eax
>     4d89:       88 45 b3                mov    %al,-0x4d(%rbp)
>     4d8c:       8b 82 2c 02 00 00       mov    0x22c(%rdx),%eax
>     4d92:       88 45 b4                mov    %al,-0x4c(%rbp)
>     4d95:       48 8b 75 b2             mov    -0x4e(%rbp),%rsi
>     4d99:       e8 00 00 00 00          callq  <rgb_background>
>  vs.
>     4d73:       8b 82 24 02 00 00       mov    0x224(%rdx),%eax
>     4d79:       0f b6 b2 2c 02 00 00    movzbl 0x22c(%rdx),%esi
>     4d80:       48 89 df                mov    %rbx,%rdi
>     4d83:       41 83 c4 04             add    $0x4,%r12d
>     4d87:       88 45 b2                mov    %al,-0x4e(%rbp)
>     4d8a:       8b 82 28 02 00 00       mov    0x228(%rdx),%eax
>     4d90:       48 c1 e6 10             shl    $0x10,%rsi
>     4d94:       88 45 b3                mov    %al,-0x4d(%rbp)
>     4d97:       0f b7 45 b2             movzwl -0x4e(%rbp),%eax
>     4d9b:       48 09 c6                or     %rax,%rsi
>     4d9e:       e8 00 00 00 00          callq  <rgb_background>
>
> Is the patch OK for stage1?

Yes.

Thanks,
Richard.

> --
> Alan Modra
> Australia Development Lab, IBM
diff mbox

Patch

Index: gcc/calls.c
===================================================================
--- gcc/calls.c	(revision 221435)
+++ gcc/calls.c	(working copy)
@@ -2090,6 +2090,26 @@  load_register_parameters (struct arg_data *args, i
 					   (XEXP (args[i].value, 0), size)))
 		*sibcall_failure = 1;
 
+	      if (size % UNITS_PER_WORD == 0
+		  || MEM_ALIGN (mem) % BITS_PER_WORD == 0)
+		move_block_to_reg (REGNO (reg), mem, nregs, args[i].mode);
+	      else
+		{
+		  if (nregs > 1)
+		    move_block_to_reg (REGNO (reg), mem, nregs - 1,
+				       args[i].mode);
+		  rtx dest = gen_rtx_REG (word_mode, REGNO (reg) + nregs - 1);
+		  unsigned int bitoff = (nregs - 1) * BITS_PER_WORD;
+		  unsigned int bitsize = size * BITS_PER_UNIT - bitoff;
+		  rtx x = extract_bit_field (mem, bitsize, bitoff, 1,
+					     dest, word_mode, word_mode);
+		  if (BYTES_BIG_ENDIAN)
+		    x = expand_shift (LSHIFT_EXPR, word_mode, x,
+				      BITS_PER_WORD - bitsize, dest, 1);
+		  if (x != dest)
+		    emit_move_insn (dest, x);
+		}
+
 	      /* Handle a BLKmode that needs shifting.  */
 	      if (nregs == 1 && size < UNITS_PER_WORD
 #ifdef BLOCK_REG_PADDING
@@ -2097,22 +2117,18 @@  load_register_parameters (struct arg_data *args, i
 #else
 		  && BYTES_BIG_ENDIAN
 #endif
-		 )
+		  )
 		{
-		  rtx tem = operand_subword_force (mem, 0, args[i].mode);
-		  rtx ri = gen_rtx_REG (word_mode, REGNO (reg));
-		  rtx x = gen_reg_rtx (word_mode);
+		  rtx dest = gen_rtx_REG (word_mode, REGNO (reg));
 		  int shift = (UNITS_PER_WORD - size) * BITS_PER_UNIT;
-		  enum tree_code dir = BYTES_BIG_ENDIAN ? RSHIFT_EXPR
-							: LSHIFT_EXPR;
+		  enum tree_code dir = (BYTES_BIG_ENDIAN
+					? RSHIFT_EXPR : LSHIFT_EXPR);
+		  rtx x;
 
-		  emit_move_insn (x, tem);
-		  x = expand_shift (dir, word_mode, x, shift, ri, 1);
-		  if (x != ri)
-		    emit_move_insn (ri, x);
+		  x = expand_shift (dir, word_mode, dest, shift, dest, 1);
+		  if (x != dest)
+		    emit_move_insn (dest, x);
 		}
-	      else
-		move_block_to_reg (REGNO (reg), mem, nregs, args[i].mode);
 	    }
 
 	  /* When a parameter is a block, and perhaps in other cases, it is