[00/29,arm] Rewrite DImode arithmetic support

Message ID	20191018194900.34795-1-Richard.Earnshaw@arm.com
Headers	show Return-Path: <gcc-patches-return-511309-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:mime-version:content-type; q=dns; s=default; b=vkUdWrK5gQIW7fRMA0TzJTZ6Bqhc7qtF071FmWI6U7aQ/sf3zg n1xUlmLhKNpzwuYZ4a5+NSoQSkcin9agXb+nFVsCcQCsa5SVGmWpMuZNDchqX3HF KOfTgksxMCObO99on0UPZIg5WTtiau5PiBZewPms5fTompXCFSJpiOwIc= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org From: Richard Earnshaw <Richard.Earnshaw@arm.com> To: gcc-patches@gcc.gnu.org Cc: Richard Earnshaw <Richard.Earnshaw@arm.com> Subject: [PATCH 00/29] [arm] Rewrite DImode arithmetic support Date: Fri, 18 Oct 2019 20:48:31 +0100 Message-Id: <20191018194900.34795-1-Richard.Earnshaw@arm.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------------2.17.1"
Series	Rewrite DImode arithmetic support \| expand [00/29,arm] Rewrite DImode arithmetic support [01/29,arm] Rip out DImode addition and subtraction splits. [02/29,arm] Perform early splitting of adddi3. [03/29,arm] Early split zero- and sign-extension [04/29,arm] Rewrite addsi3_carryin_shift_<optab> in canonical form [05/29,arm] fix constraints on addsi3_carryin_alt2 [06/29,arm] Early split subdi3 [07/29,arm] Remove redundant DImode subtract patterns [08/29,arm] Introduce arm_carry_operation [09/29,arm] Correctly cost addition with a carry-in [10/29,arm] Correct cost calculations involving borrow for subtracts. [11/29,arm] Reduce cost of insns that are simple reg-reg moves. [12/29,arm] Implement negscc using SBC when appropriate. [13/29,arm] Add alternative canonicalizations for subtract-with-carry + shift [14/29,arm] Early split simple DImode equality comparisons [15/29,arm] Improve handling of DImode comparisions against constants. [16/29,arm] early split most DImode comparison operations. [17/29,arm] Handle some constant comparisons using rsbs+rscs [18/29,arm] Cleanup dead code - old support for DImode comparisons [19/29,arm] Handle immediate values in uaddvsi4 [20/29,arm] Early expansion of uaddvdi4. [21/29,arm] Improve code generation for addvsi4. [22/29,arm] Allow the summation result of signed add-with-overflow to be discarded. [23/29,arm] Early split addvdi4 [24/29,arm] Improve constant handling for usubvsi4. [25/29,arm] Early expansion of usubvdi4. [26/29,arm] Improve constant handling for subvsi4. [27/29,arm] Early expansion of subvdi4 [28/29,arm] Improvements to negvsi4 and negvdi4. [29/29,arm] Fix testsuite nit when compiling for thumb2

Richard Earnshaw (lists) Oct. 18, 2019, 7:48 p.m. UTC

This series of patches rewrites all the DImode arithmetic patterns for
the Arm backend when compiling for Arm or Thumb2 to split the
operations during expand (the thumb1 code is unchanged and cannot
benefit from early splitting as we are unable to expose the carry
flag).

This has a number of benefits:
 - register allocation has more freedom to use independent
   registers for the upper and lower halves of the register
 - we can make better use of combine for spotting insn merge
   opportunities without needing many additional patterns that are
   only used for DImode
 - we eliminate a number of bugs in the machine description where
   the carry calculations were not correctly propagated by the
   split patterns (we mostly got away with this because the
   splitting previously happened only after most of the important
   optimization passes had been run).

The patch series starts by paring back all the DImode arithmetic
support to a very simple form without any splitting at all and then
progressively re-implementing the patterns with early split
operations.  This proved to be the only sane way of untangling the
existing code due to a number of latent bugs which would have been
exposed if a different approach had been taken.

Each patch should produce a working compiler (it did when it was
originally written), though since the patch set has been re-ordered
slightly there is a possibility that some of the intermediate steps
may have missing test updates that are only cleaned up later.
However, only the end of the series should be considered complete.
I've kept the patch as a series to permit easier regression hunting
should that prove necessary.

R.

Richard Earnshaw (29):
  [arm] Rip out DImode addition and subtraction splits.
  [arm] Perform early splitting of adddi3.
  [arm] Early split zero- and sign-extension
  [arm] Rewrite addsi3_carryin_shift_<optab> in canonical form
  [arm] fix constraints on addsi3_carryin_alt2
  [arm] Early split subdi3
  [arm] Remove redundant DImode subtract patterns
  [arm] Introduce arm_carry_operation
  [arm] Correctly cost addition with a carry-in
  [arm] Correct cost calculations involving borrow for subtracts.
  [arm] Reduce cost of insns that are simple reg-reg moves.
  [arm] Implement negscc using SBC when appropriate.
  [arm] Add alternative canonicalizations for subtract-with-carry +
    shift
  [arm] Early split simple DImode equality comparisons
  [arm] Improve handling of DImode comparisions against constants.
  [arm] early split most DImode comparison operations.
  [arm] Handle some constant comparisons using rsbs+rscs
  [arm] Cleanup dead code - old support for DImode comparisons
  [arm] Handle immediate values in uaddvsi4
  [arm] Early expansion of uaddvdi4.
  [arm] Improve code generation for addvsi4.
  [arm] Allow the summation result of signed add-with-overflow to be
    discarded.
  [arm] Early split addvdi4
  [arm] Improve constant handling for usubvsi4.
  [arm] Early expansion of usubvdi4.
  [arm] Improve constant handling for subvsi4.
  [arm] Early expansion of subvdi4
  [arm] Improvements to negvsi4 and negvdi4.
  [arm] Fix testsuite nit when compiling for thumb2

 gcc/config/arm/arm-modes.def                  |   19 +-
 gcc/config/arm/arm-protos.h                   |    1 +
 gcc/config/arm/arm.c                          |  598 ++++-
 gcc/config/arm/arm.md                         | 2020 ++++++++++-------
 gcc/config/arm/iterators.md                   |   15 +-
 gcc/config/arm/predicates.md                  |   29 +-
 gcc/config/arm/thumb2.md                      |    8 +-
 .../gcc.dg/builtin-arith-overflow-3.c         |   41 +
 gcc/testsuite/gcc.target/arm/negdi-3.c        |    4 +-
 9 files changed, 1757 insertions(+), 978 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-arith-overflow-3.c

Segher Boessenkool Oct. 19, 2019, 4:31 p.m. UTC | #1

Hi Richard,

On Fri, Oct 18, 2019 at 08:48:31PM +0100, Richard Earnshaw wrote:
> 
> This series of patches rewrites all the DImode arithmetic patterns for
> the Arm backend when compiling for Arm or Thumb2 to split the
> operations during expand (the thumb1 code is unchanged and cannot
> benefit from early splitting as we are unable to expose the carry
> flag).

Very nice :-)

I have a bunch of testcases from when I did something similar for PowerPC
that I wanted to test...  But I cannot get your series to apply.  Do you
have a git repo I can pull from?

Here is one test case (it's a bit geared towards what our ISA can do):

===
typedef unsigned int u32;
typedef unsigned long long u64;

u64 add(u64 a, u64 b) { return a + b; }
u64 add1(u64 a) { return a + 1; }
u64 add42(u64 a) { return a + 42; }
u64 addm1(u64 a) { return a - 1; }
u64 addff(u64 a) { return a + 0xffffffffULL; }
u64 addH(u64 a) { return a + 0x123400005678ULL; }
u64 addH0(u64 a) { return a + 0x123400000000ULL; }
u64 addS(u64 a, u32 b) { return a + b; }
u64 addSH(u64 a, u32 b) { return a + ((u64)b << 32); }
u64 addB1(u64 a) { return a + 0x100000000ULL; }
u64 addB8(u64 a) { return a + 0x800000000ULL; }

u64 addSH42(u64 a, u32 b) { return a + ((u64)b << 32) + 42; }
u64 addSHm1(u64 a, u32 b) { return a + ((u64)b << 32) - 1; }
u64 addSHff(u64 a, u32 b) { return a + ((u64)b << 32) + 0xffffffffULL; }
===

rs6000 -m32 currently has non-optimal code for addm1, addSHm1; trunk arm
has non-optimal code for addH0, addSH, addB1, addB8, addSH42, addSHm1, and
addSHff if I understand well enough.  So I'd love to see what it does with
your series applied :-)

Segher

Richard Earnshaw (lists) Oct. 20, 2019, 11:21 a.m. UTC | #2

On 19/10/2019 17:31, Segher Boessenkool wrote:
> Hi Richard,
> 
> On Fri, Oct 18, 2019 at 08:48:31PM +0100, Richard Earnshaw wrote:
>>
>> This series of patches rewrites all the DImode arithmetic patterns for
>> the Arm backend when compiling for Arm or Thumb2 to split the
>> operations during expand (the thumb1 code is unchanged and cannot
>> benefit from early splitting as we are unable to expose the carry
>> flag).
> 
> Very nice :-)
> 
> I have a bunch of testcases from when I did something similar for PowerPC
> that I wanted to test...  But I cannot get your series to apply.  Do you
> have a git repo I can pull from?
> 

Perhaps because it's already committed to trunk?


> Here is one test case (it's a bit geared towards what our ISA can do):
> 
> ===
> typedef unsigned int u32;
> typedef unsigned long long u64;
> 
> u64 add(u64 a, u64 b) { return a + b; }
> u64 add1(u64 a) { return a + 1; }
> u64 add42(u64 a) { return a + 42; }
> u64 addm1(u64 a) { return a - 1; }
> u64 addff(u64 a) { return a + 0xffffffffULL; }
> u64 addH(u64 a) { return a + 0x123400005678ULL; }
> u64 addH0(u64 a) { return a + 0x123400000000ULL; }
> u64 addS(u64 a, u32 b) { return a + b; }
> u64 addSH(u64 a, u32 b) { return a + ((u64)b << 32); }
> u64 addB1(u64 a) { return a + 0x100000000ULL; }
> u64 addB8(u64 a) { return a + 0x800000000ULL; }
> 
> u64 addSH42(u64 a, u32 b) { return a + ((u64)b << 32) + 42; }
> u64 addSHm1(u64 a, u32 b) { return a + ((u64)b << 32) - 1; }
> u64 addSHff(u64 a, u32 b) { return a + ((u64)b << 32) + 0xffffffffULL; }
> ===
> 
> rs6000 -m32 currently has non-optimal code for addm1, addSHm1; trunk arm
> has non-optimal code for addH0, addSH, addB1, addB8, addSH42, addSHm1, and
> addSHff if I understand well enough.  So I'd love to see what it does with
> your series applied :-)
> 
> 
> Segher
> 

We do pretty well on this.  Only addSHm1 needs three insns (except where
the constant isn't valid for arm), and I think that's the minimum for
this case anyway.  Several of the tests only need one insn.

R.
.arch armv8-a
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 1
	.eabi_attribute 30, 2
	.eabi_attribute 34, 1
	.eabi_attribute 18, 4
	.file	"lltest.c"
	.text
	.align	2
	.global	add
	.syntax unified
	.arm
	.fpu softvfp
	.type	add, %function
add:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	adds	r0, r0, r2
	adc	r1, r1, r3
	bx	lr
	.size	add, .-add
	.align	2
	.global	add1
	.syntax unified
	.arm
	.fpu softvfp
	.type	add1, %function
add1:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	adds	r0, r0, #1
	adc	r1, r1, #0
	bx	lr
	.size	add1, .-add1
	.align	2
	.global	add42
	.syntax unified
	.arm
	.fpu softvfp
	.type	add42, %function
add42:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	adds	r0, r0, #42
	adc	r1, r1, #0
	bx	lr
	.size	add42, .-add42
	.align	2
	.global	addm1
	.syntax unified
	.arm
	.fpu softvfp
	.type	addm1, %function
addm1:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	subs	r0, r0, #1
	sbc	r1, r1, #0
	bx	lr
	.size	addm1, .-addm1
	.align	2
	.global	addff
	.syntax unified
	.arm
	.fpu softvfp
	.type	addff, %function
addff:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	subs	r0, r0, #1
	adc	r1, r1, #0
	bx	lr
	.size	addff, .-addff
	.align	2
	.global	addH
	.syntax unified
	.arm
	.fpu softvfp
	.type	addH, %function
addH:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	movw	r3, #22136
	adds	r0, r0, r3
	movw	r3, #4660
	adc	r1, r3, r1
	bx	lr
	.size	addH, .-addH
	.align	2
	.global	addH0
	.syntax unified
	.arm
	.fpu softvfp
	.type	addH0, %function
addH0:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	add	r1, r1, #4608
	add	r1, r1, #52
	bx	lr
	.size	addH0, .-addH0
	.align	2
	.global	addS
	.syntax unified
	.arm
	.fpu softvfp
	.type	addS, %function
addS:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	adds	r0, r2, r0
	adc	r1, r1, #0
	bx	lr
	.size	addS, .-addS
	.align	2
	.global	addSH
	.syntax unified
	.arm
	.fpu softvfp
	.type	addSH, %function
addSH:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	add	r1, r2, r1
	bx	lr
	.size	addSH, .-addSH
	.align	2
	.global	addB1
	.syntax unified
	.arm
	.fpu softvfp
	.type	addB1, %function
addB1:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	add	r1, r1, #1
	bx	lr
	.size	addB1, .-addB1
	.align	2
	.global	addB8
	.syntax unified
	.arm
	.fpu softvfp
	.type	addB8, %function
addB8:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	add	r1, r1, #8
	bx	lr
	.size	addB8, .-addB8
	.align	2
	.global	addSH42
	.syntax unified
	.arm
	.fpu softvfp
	.type	addSH42, %function
addSH42:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	adds	r0, r0, #42
	adc	r1, r1, r2
	bx	lr
	.size	addSH42, .-addSH42
	.align	2
	.global	addSHm1
	.syntax unified
	.arm
	.fpu softvfp
	.type	addSHm1, %function
addSHm1:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	subs	r0, r0, #1
	sbc	r1, r1, #0
	add	r1, r2, r1
	bx	lr
	.size	addSHm1, .-addSHm1
	.align	2
	.global	addSHff
	.syntax unified
	.arm
	.fpu softvfp
	.type	addSHff, %function
addSHff:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	subs	r0, r0, #1
	adc	r1, r1, r2
	bx	lr
	.size	addSHff, .-addSHff
	.ident	"GCC: (trunk) 10.0.0 20191018 (experimental) [master revision 54f1e150a38:1ddbabe127b:e55e9d95a5ab8397197a5e358ba0185f9471f043]"

Ramana Radhakrishnan Oct. 20, 2019, 12:23 p.m. UTC | #3

On Fri, Oct 18, 2019 at 8:49 PM Richard Earnshaw
<Richard.Earnshaw@arm.com> wrote:
>
>
> This series of patches rewrites all the DImode arithmetic patterns for
> the Arm backend when compiling for Arm or Thumb2 to split the
> operations during expand (the thumb1 code is unchanged and cannot
> benefit from early splitting as we are unable to expose the carry
> flag).
>
> This has a number of benefits:
>  - register allocation has more freedom to use independent
>    registers for the upper and lower halves of the register
>  - we can make better use of combine for spotting insn merge
>    opportunities without needing many additional patterns that are
>    only used for DImode
>  - we eliminate a number of bugs in the machine description where
>    the carry calculations were not correctly propagated by the
>    split patterns (we mostly got away with this because the
>    splitting previously happened only after most of the important
>    optimization passes had been run).
>
> The patch series starts by paring back all the DImode arithmetic
> support to a very simple form without any splitting at all and then
> progressively re-implementing the patterns with early split
> operations.  This proved to be the only sane way of untangling the
> existing code due to a number of latent bugs which would have been
> exposed if a different approach had been taken.
>
> Each patch should produce a working compiler (it did when it was
> originally written), though since the patch set has been re-ordered
> slightly there is a possibility that some of the intermediate steps
> may have missing test updates that are only cleaned up later.
> However, only the end of the series should be considered complete.
> I've kept the patch as a series to permit easier regression hunting
> should that prove necessary.

Yay ! it's quite nice to see this go in.

Ramana


>
> R.
>
> Richard Earnshaw (29):
>   [arm] Rip out DImode addition and subtraction splits.
>   [arm] Perform early splitting of adddi3.
>   [arm] Early split zero- and sign-extension
>   [arm] Rewrite addsi3_carryin_shift_<optab> in canonical form
>   [arm] fix constraints on addsi3_carryin_alt2
>   [arm] Early split subdi3
>   [arm] Remove redundant DImode subtract patterns
>   [arm] Introduce arm_carry_operation
>   [arm] Correctly cost addition with a carry-in
>   [arm] Correct cost calculations involving borrow for subtracts.
>   [arm] Reduce cost of insns that are simple reg-reg moves.
>   [arm] Implement negscc using SBC when appropriate.
>   [arm] Add alternative canonicalizations for subtract-with-carry +
>     shift
>   [arm] Early split simple DImode equality comparisons
>   [arm] Improve handling of DImode comparisions against constants.
>   [arm] early split most DImode comparison operations.
>   [arm] Handle some constant comparisons using rsbs+rscs
>   [arm] Cleanup dead code - old support for DImode comparisons
>   [arm] Handle immediate values in uaddvsi4
>   [arm] Early expansion of uaddvdi4.
>   [arm] Improve code generation for addvsi4.
>   [arm] Allow the summation result of signed add-with-overflow to be
>     discarded.
>   [arm] Early split addvdi4
>   [arm] Improve constant handling for usubvsi4.
>   [arm] Early expansion of usubvdi4.
>   [arm] Improve constant handling for subvsi4.
>   [arm] Early expansion of subvdi4
>   [arm] Improvements to negvsi4 and negvdi4.
>   [arm] Fix testsuite nit when compiling for thumb2
>
>  gcc/config/arm/arm-modes.def                  |   19 +-
>  gcc/config/arm/arm-protos.h                   |    1 +
>  gcc/config/arm/arm.c                          |  598 ++++-
>  gcc/config/arm/arm.md                         | 2020 ++++++++++-------
>  gcc/config/arm/iterators.md                   |   15 +-
>  gcc/config/arm/predicates.md                  |   29 +-
>  gcc/config/arm/thumb2.md                      |    8 +-
>  .../gcc.dg/builtin-arith-overflow-3.c         |   41 +
>  gcc/testsuite/gcc.target/arm/negdi-3.c        |    4 +-
>  9 files changed, 1757 insertions(+), 978 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-arith-overflow-3.c
>

Segher Boessenkool Oct. 21, 2019, 10:06 a.m. UTC | #4

On Sun, Oct 20, 2019 at 12:21:21PM +0100, Richard Earnshaw (lists) wrote:
> On 19/10/2019 17:31, Segher Boessenkool wrote:
> > I have a bunch of testcases from when I did something similar for PowerPC
> > that I wanted to test...  But I cannot get your series to apply.  Do you
> > have a git repo I can pull from?
> 
> Perhaps because it's already committed to trunk?

Oh probably.  Duh.  Thanks :-)

> > u64 addH(u64 a) { return a + 0x123400005678ULL; }
> > u64 addH0(u64 a) { return a + 0x123400000000ULL; }

(If you change those to 0x3400000078ULL etc. it'll test as meant on arm
as well: to see if it uses immediates in the insn where it can.  It looks
like it'll work fine fwiw).

> We do pretty well on this.  Only addSHm1 needs three insns (except where
> the constant isn't valid for arm), and I think that's the minimum for
> this case anyway.  Several of the tests only need one insn.

Yeah, very nice :-)


Segher

Christophe Lyon Oct. 21, 2019, 11:51 a.m. UTC | #5

On 18/10/2019 21:48, Richard Earnshaw wrote:
> Each patch should produce a working compiler (it did when it was
> originally written), though since the patch set has been re-ordered
> slightly there is a possibility that some of the intermediate steps
> may have missing test updates that are only cleaned up later.
> However, only the end of the series should be considered complete.
> I've kept the patch as a series to permit easier regression hunting
> should that prove necessary.

Thanks for this information: my validation system was designed in such a way that it will run the GCC testsuite after each of your patches, so I'll keep in mind not to report regressions (I've noticed several already).

I can perform a manual validation taking your 29 patches as a single one and compare the results with those of the revision preceding the one were you committed patch #1. Do you think it would be useful?

Christophe

Richard Earnshaw (lists) Oct. 21, 2019, 12:24 p.m. UTC | #6

On 21/10/2019 12:51, Christophe Lyon wrote:
> On 18/10/2019 21:48, Richard Earnshaw wrote:
>> Each patch should produce a working compiler (it did when it was
>> originally written), though since the patch set has been re-ordered
>> slightly there is a possibility that some of the intermediate steps
>> may have missing test updates that are only cleaned up later.
>> However, only the end of the series should be considered complete.
>> I've kept the patch as a series to permit easier regression hunting
>> should that prove necessary.
> 
> Thanks for this information: my validation system was designed in such a 
> way that it will run the GCC testsuite after each of your patches, so 
> I'll keep in mind not to report regressions (I've noticed several already).
> 
> 
> I can perform a manual validation taking your 29 patches as a single one 
> and compare the results with those of the revision preceding the one 
> were you committed patch #1. Do you think it would be useful?
> 
> 
> Christophe
> 
> 

I think if you can filter out any that are removed by later patches and 
then report against the patch that caused the regression itself then 
that would be the best.  But I realise that would be more work for you, 
so a round-up against the combined set would be OK.

BTW, I'm aware of an issue with the compiler now generating

	<alu> reg, reg, shift <reg>

in Thumb2; no need to report that again.

Thanks,
R.

Christophe Lyon Oct. 23, 2019, 8:28 a.m. UTC | #7

On 21/10/2019 14:24, Richard Earnshaw (lists) wrote:
> On 21/10/2019 12:51, Christophe Lyon wrote:
>> On 18/10/2019 21:48, Richard Earnshaw wrote:
>>> Each patch should produce a working compiler (it did when it was
>>> originally written), though since the patch set has been re-ordered
>>> slightly there is a possibility that some of the intermediate steps
>>> may have missing test updates that are only cleaned up later.
>>> However, only the end of the series should be considered complete.
>>> I've kept the patch as a series to permit easier regression hunting
>>> should that prove necessary.
>>
>> Thanks for this information: my validation system was designed in such a way that it will run the GCC testsuite after each of your patches, so I'll keep in mind not to report regressions (I've noticed several already).
>>
>>
>> I can perform a manual validation taking your 29 patches as a single one and compare the results with those of the revision preceding the one were you committed patch #1. Do you think it would be useful?
>>
>>
>> Christophe
>>
>>
> 
> I think if you can filter out any that are removed by later patches and then report against the patch that caused the regression itself then that would be the best.  But I realise that would be more work for you, so a round-up against the combined set would be OK.
> 
> BTW, I'm aware of an issue with the compiler now generating
> 
>      <alu> reg, reg, shift <reg>
> 
> in Thumb2; no need to report that again.
> 
> Thanks,
> R.
> .
> 


Hi Richard,

The validation of the whole set shows 1 regression, which was also reported by the validation of r277179 (early split most DImode comparison operations)

When GCC is configured as:
--target arm-none-eabi
--with-mode default
--with-cpu default
--with-fpu default
(that is, no --with-mode, --with-cpu, --with-fpu option)
I'm using binutils-2.28 and newlib-3.1.0

I can see:
FAIL: g++.dg/opt/pr36449.C  -std=gnu++14 execution test
(whatever -std=gnu++XX option)

I'm executing the tests using qemu-4.1.0 -cpu arm926
The qemu traces shows that code enters main, then _Znwj (operator new), then _malloc_r
The qemu traces end with:
IN: _malloc_r^M
0x00019224:  e3a00ffe  mov      r0, #0x3f8^M
0x00019228:  e3a0c07f  mov      ip, #0x7f^M
0x0001922c:  e3a0e07e  mov      lr, #0x7e^M
0x00019230:  eafffe41  b        #0x18b3c^M
^M
R00=00049418 R01=00000000 R02=00000554 R03=00040000^M
R04=00000000 R05=08000008 R06=00049418 R07=00000000^M
R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M
R12=00000060 R13=fffeb460 R14=00018b14 R15=00019224^M
PSR=20000010 --C- A usr32^M
----------------^M
IN: _malloc_r^M
0x00018b3c:  e59f76f8  ldr      r7, [pc, #0x6f8]^M
0x00018b40:  e0870000  add      r0, r7, r0^M
0x00018b44:  e5903004  ldr      r3, [r0, #4]^M
0x00018b48:  e2400008  sub      r0, r0, #8^M
0x00018b4c:  e1500003  cmp      r0, r3^M
0x00018b50:  1a000005  bne      #0x18b6c^M
^M
R00=000003f8 R01=00000000 R02=00000554 R03=00040000^M
R04=00000000 R05=08000008 R06=00049418 R07=00000000^M
R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M
R12=0000007f R13=fffeb460 R14=0000007e R15=00018b3c^M
PSR=20000010 --C- A usr32^M
R00=00049c30 R01=00000000 R02=00000554 R03=00049c30^M
R04=00000000 R05=08000008 R06=00049418 R07=00049840^M
R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M
R12=0000007f R13=fffeb460 R14=0000007e R15=00018b54^M
PSR=60000010 -ZC- A usr32^M
----------------^M
IN: _malloc_r^M
0x00019120:  e1a02a0b  lsl      r2, fp, #0x14^M
0x00019124:  e1a02a22  lsr      r2, r2, #0x14^M
0x00019128:  e3520000  cmp      r2, #0^M
0x0001912c:  1afffee7  bne      #0x18cd0^M
^M
R00=0004b000 R01=08002108 R02=00049e40 R03=0004b000^M
R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M
R08=08001000 R09=00000720 R10=00049e0c R11=0004b000^M
R12=0000007f R13=fffeb460 R14=00018ca0 R15=00019120^M
PSR=60000010 -ZC- A usr32^M
----------------^M
IN: _malloc_r^M
0x00019130:  e5974008  ldr      r4, [r7, #8]^M
0x00019134:  e0898008  add      r8, sb, r8^M
0x00019138:  e3888001  orr      r8, r8, #1^M
0x0001913c:  e5848004  str      r8, [r4, #4]^M
0x00019140:  eaffff14  b        #0x18d98^M
^M
R00=0004b000 R01=08002108 R02=00000000 R03=0004b000^M
R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M
R08=08001000 R09=00000720 R10=00049e0c R11=0004b000^M
R12=0000007f R13=fffeb460 R14=00018ca0 R15=00019130^M
PSR=60000010 -ZC- A usr32^M
R00=0004b000 R01=08002108 R02=00000000 R03=0004b000^M
R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M
R08=08001721 R09=00000720 R10=00049e0c R11=0004b000^M
R12=0000007f R13=fffeb460 R14=00018ca0 R15=00018d98^M
PSR=60000010 -ZC- A usr32^M

Christophe

Richard Earnshaw (lists) Oct. 23, 2019, 1:21 p.m. UTC | #8

On 23/10/2019 09:28, Christophe Lyon wrote:
> On 21/10/2019 14:24, Richard Earnshaw (lists) wrote:
>> On 21/10/2019 12:51, Christophe Lyon wrote:
>>> On 18/10/2019 21:48, Richard Earnshaw wrote:
>>>> Each patch should produce a working compiler (it did when it was
>>>> originally written), though since the patch set has been re-ordered
>>>> slightly there is a possibility that some of the intermediate steps
>>>> may have missing test updates that are only cleaned up later.
>>>> However, only the end of the series should be considered complete.
>>>> I've kept the patch as a series to permit easier regression hunting
>>>> should that prove necessary.
>>>
>>> Thanks for this information: my validation system was designed in 
>>> such a way that it will run the GCC testsuite after each of your 
>>> patches, so I'll keep in mind not to report regressions (I've noticed 
>>> several already).
>>>
>>>
>>> I can perform a manual validation taking your 29 patches as a single 
>>> one and compare the results with those of the revision preceding the 
>>> one were you committed patch #1. Do you think it would be useful?
>>>
>>>
>>> Christophe
>>>
>>>
>>
>> I think if you can filter out any that are removed by later patches 
>> and then report against the patch that caused the regression itself 
>> then that would be the best.  But I realise that would be more work 
>> for you, so a round-up against the combined set would be OK.
>>
>> BTW, I'm aware of an issue with the compiler now generating
>>
>>      <alu> reg, reg, shift <reg>
>>
>> in Thumb2; no need to report that again.
>>
>> Thanks,
>> R.
>> .
>>
> 
> 
> Hi Richard,
> 
> The validation of the whole set shows 1 regression, which was also 
> reported by the validation of r277179 (early split most DImode 
> comparison operations)
> 
> When GCC is configured as:
> --target arm-none-eabi
> --with-mode default
> --with-cpu default
> --with-fpu default
> (that is, no --with-mode, --with-cpu, --with-fpu option)
> I'm using binutils-2.28 and newlib-3.1.0
> 
> I can see:
> FAIL: g++.dg/opt/pr36449.C  -std=gnu++14 execution test
> (whatever -std=gnu++XX option)

That's strange.  The assembler code generated for that test is unchanged 
from before the patch series, so I can't see how it can't be a problem 
in the test itself.  What's more, I can't seem to reproduce this myself.

Similarly, in my build the code for _Znwj, malloc, malloc_r and free_r 
are also unchanged, while the malloc_[un]lock functions are empty stubs 
(not surprising as we aren't multi-threaded).

So the only thing that looks to have really changed are the linker 
offsets (some of the library code has changed, but I don't think it's 
really reached in practice, so shouldn't be relevant).

> 
> I'm executing the tests using qemu-4.1.0 -cpu arm926
> The qemu traces shows that code enters main, then _Znwj (operator new), 
> then _malloc_r
> The qemu traces end with:

What do you mean by 'end with'?  What's the failure mode of the test?  A 
crash, or the test exiting with a failure code?

> IN: _malloc_r^M
> 0x00019224:  e3a00ffe  mov      r0, #0x3f8^M
> 0x00019228:  e3a0c07f  mov      ip, #0x7f^M
> 0x0001922c:  e3a0e07e  mov      lr, #0x7e^M
> 0x00019230:  eafffe41  b        #0x18b3c^M
> ^M
> R00=00049418 R01=00000000 R02=00000554 R03=00040000^M
> R04=00000000 R05=08000008 R06=00049418 R07=00000000^M
> R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M
> R12=00000060 R13=fffeb460 R14=00018b14 R15=00019224^M
> PSR=20000010 --C- A usr32^M
> ----------------^M
> IN: _malloc_r^M
> 0x00018b3c:  e59f76f8  ldr      r7, [pc, #0x6f8]^M
> 0x00018b40:  e0870000  add      r0, r7, r0^M
> 0x00018b44:  e5903004  ldr      r3, [r0, #4]^M
> 0x00018b48:  e2400008  sub      r0, r0, #8^M
> 0x00018b4c:  e1500003  cmp      r0, r3^M
> 0x00018b50:  1a000005  bne      #0x18b6c^M

But this block neither jumps to, nor falls through to ....
> ^M
> R00=000003f8 R01=00000000 R02=00000554 R03=00040000^M
> R04=00000000 R05=08000008 R06=00049418 R07=00000000^M
> R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M
> R12=0000007f R13=fffeb460 R14=0000007e R15=00018b3c^M
> PSR=20000010 --C- A usr32^M
> R00=00049c30 R01=00000000 R02=00000554 R03=00049c30^M
> R04=00000000 R05=08000008 R06=00049418 R07=00049840^M
> R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M
> R12=0000007f R13=fffeb460 R14=0000007e R15=00018b54^M
> PSR=60000010 -ZC- A usr32^M
> ----------------^M
> IN: _malloc_r^M

...here.  So there's some trace missing by the looks of it; or some 
other problem.

> 0x00019120:  e1a02a0b  lsl      r2, fp, #0x14^M
> 0x00019124:  e1a02a22  lsr      r2, r2, #0x14^M
> 0x00019128:  e3520000  cmp      r2, #0^M
> 0x0001912c:  1afffee7  bne      #0x18cd0^M

and the same here.

> ^M
> R00=0004b000 R01=08002108 R02=00049e40 R03=0004b000^M
> R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M
> R08=08001000 R09=00000720 R10=00049e0c R11=0004b000^M
> R12=0000007f R13=fffeb460 R14=00018ca0 R15=00019120^M
> PSR=60000010 -ZC- A usr32^M
> ----------------^M
> IN: _malloc_r^M
> 0x00019130:  e5974008  ldr      r4, [r7, #8]^M
> 0x00019134:  e0898008  add      r8, sb, r8^M
> 0x00019138:  e3888001  orr      r8, r8, #1^M
> 0x0001913c:  e5848004  str      r8, [r4, #4]^M
> 0x00019140:  eaffff14  b        #0x18d98^M
> ^M
> R00=0004b000 R01=08002108 R02=00000000 R03=0004b000^M
> R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M
> R08=08001000 R09=00000720 R10=00049e0c R11=0004b000^M
> R12=0000007f R13=fffeb460 R14=00018ca0 R15=00019130^M
> PSR=60000010 -ZC- A usr32^M
> R00=0004b000 R01=08002108 R02=00000000 R03=0004b000^M
> R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M
> R08=08001721 R09=00000720 R10=00049e0c R11=0004b000^M
> R12=0000007f R13=fffeb460 R14=00018ca0 R15=00018d98^M
> PSR=60000010 -ZC- A usr32^M
> 
> Christophe
>

Christophe Lyon Oct. 24, 2019, 10:16 a.m. UTC | #9

On 23/10/2019 15:21, Richard Earnshaw (lists) wrote:
> On 23/10/2019 09:28, Christophe Lyon wrote:
>> On 21/10/2019 14:24, Richard Earnshaw (lists) wrote:
>>> On 21/10/2019 12:51, Christophe Lyon wrote:
>>>> On 18/10/2019 21:48, Richard Earnshaw wrote:
>>>>> Each patch should produce a working compiler (it did when it was
>>>>> originally written), though since the patch set has been re-ordered
>>>>> slightly there is a possibility that some of the intermediate steps
>>>>> may have missing test updates that are only cleaned up later.
>>>>> However, only the end of the series should be considered complete.
>>>>> I've kept the patch as a series to permit easier regression hunting
>>>>> should that prove necessary.
>>>>
>>>> Thanks for this information: my validation system was designed in such a way that it will run the GCC testsuite after each of your patches, so I'll keep in mind not to report regressions (I've noticed several already).
>>>>
>>>>
>>>> I can perform a manual validation taking your 29 patches as a single one and compare the results with those of the revision preceding the one were you committed patch #1. Do you think it would be useful?
>>>>
>>>>
>>>> Christophe
>>>>
>>>>
>>>
>>> I think if you can filter out any that are removed by later patches and then report against the patch that caused the regression itself then that would be the best.  But I realise that would be more work for you, so a round-up against the combined set would be OK.
>>>
>>> BTW, I'm aware of an issue with the compiler now generating
>>>
>>>      <alu> reg, reg, shift <reg>
>>>
>>> in Thumb2; no need to report that again.
>>>
>>> Thanks,
>>> R.
>>> .
>>>
>>
>>
>> Hi Richard,
>>
>> The validation of the whole set shows 1 regression, which was also reported by the validation of r277179 (early split most DImode comparison operations)
>>
>> When GCC is configured as:
>> --target arm-none-eabi
>> --with-mode default
>> --with-cpu default
>> --with-fpu default
>> (that is, no --with-mode, --with-cpu, --with-fpu option)
>> I'm using binutils-2.28 and newlib-3.1.0
>>
>> I can see:
>> FAIL: g++.dg/opt/pr36449.C  -std=gnu++14 execution test
>> (whatever -std=gnu++XX option)
> 
> That's strange.  The assembler code generated for that test is unchanged from before the patch series, so I can't see how it can't be a problem in the test itself.  What's more, I can't seem to reproduce this myself.

As you have noticed, I have created PR92207 to help understand this.

> 
> Similarly, in my build the code for _Znwj, malloc, malloc_r and free_r are also unchanged, while the malloc_[un]lock functions are empty stubs (not surprising as we aren't multi-threaded).
> 
> So the only thing that looks to have really changed are the linker offsets (some of the library code has changed, but I don't think it's really reached in practice, so shouldn't be relevant).
> 
>>
>> I'm executing the tests using qemu-4.1.0 -cpu arm926
>> The qemu traces shows that code enters main, then _Znwj (operator new), then _malloc_r
>> The qemu traces end with:
> 
> What do you mean by 'end with'?  What's the failure mode of the test?  A crash, or the test exiting with a failure code?
> 
qemu complains with:
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault (core dumped)

'end with' because my automated validation builds do not keep the full execution traces (that would need too much disk space)

>> IN: _malloc_r^M
>> 0x00019224:  e3a00ffe  mov      r0, #0x3f8^M
>> 0x00019228:  e3a0c07f  mov      ip, #0x7f^M
>> 0x0001922c:  e3a0e07e  mov      lr, #0x7e^M
>> 0x00019230:  eafffe41  b        #0x18b3c^M
>> ^M
>> R00=00049418 R01=00000000 R02=00000554 R03=00040000^M
>> R04=00000000 R05=08000008 R06=00049418 R07=00000000^M
>> R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M
>> R12=00000060 R13=fffeb460 R14=00018b14 R15=00019224^M
>> PSR=20000010 --C- A usr32^M
>> ----------------^M
>> IN: _malloc_r^M
>> 0x00018b3c:  e59f76f8  ldr      r7, [pc, #0x6f8]^M
>> 0x00018b40:  e0870000  add      r0, r7, r0^M
>> 0x00018b44:  e5903004  ldr      r3, [r0, #4]^M
>> 0x00018b48:  e2400008  sub      r0, r0, #8^M
>> 0x00018b4c:  e1500003  cmp      r0, r3^M
>> 0x00018b50:  1a000005  bne      #0x18b6c^M
> 
> But this block neither jumps to, nor falls through to ....
>> ^M
>> R00=000003f8 R01=00000000 R02=00000554 R03=00040000^M
>> R04=00000000 R05=08000008 R06=00049418 R07=00000000^M
>> R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M
>> R12=0000007f R13=fffeb460 R14=0000007e R15=00018b3c^M
>> PSR=20000010 --C- A usr32^M
>> R00=00049c30 R01=00000000 R02=00000554 R03=00049c30^M
>> R04=00000000 R05=08000008 R06=00049418 R07=00049840^M
>> R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M
>> R12=0000007f R13=fffeb460 R14=0000007e R15=00018b54^M
>> PSR=60000010 -ZC- A usr32^M
>> ----------------^M
>> IN: _malloc_r^M
> 
> ...here.  So there's some trace missing by the looks of it; or some other problem.
> 
>> 0x00019120:  e1a02a0b  lsl      r2, fp, #0x14^M
>> 0x00019124:  e1a02a22  lsr      r2, r2, #0x14^M
>> 0x00019128:  e3520000  cmp      r2, #0^M
>> 0x0001912c:  1afffee7  bne      #0x18cd0^M
> 
> and the same here.
yes, qemu traces are 'incomplete'.

> 
>> ^M
>> R00=0004b000 R01=08002108 R02=00049e40 R03=0004b000^M
>> R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M
>> R08=08001000 R09=00000720 R10=00049e0c R11=0004b000^M
>> R12=0000007f R13=fffeb460 R14=00018ca0 R15=00019120^M
>> PSR=60000010 -ZC- A usr32^M
>> ----------------^M
>> IN: _malloc_r^M
>> 0x00019130:  e5974008  ldr      r4, [r7, #8]^M
>> 0x00019134:  e0898008  add      r8, sb, r8^M
>> 0x00019138:  e3888001  orr      r8, r8, #1^M
>> 0x0001913c:  e5848004  str      r8, [r4, #4]^M
>> 0x00019140:  eaffff14  b        #0x18d98^M
>> ^M
>> R00=0004b000 R01=08002108 R02=00000000 R03=0004b000^M
>> R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M
>> R08=08001000 R09=00000720 R10=00049e0c R11=0004b000^M
>> R12=0000007f R13=fffeb460 R14=00018ca0 R15=00019130^M
>> PSR=60000010 -ZC- A usr32^M
>> R00=0004b000 R01=08002108 R02=00000000 R03=0004b000^M
>> R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M
>> R08=08001721 R09=00000720 R10=00049e0c R11=0004b000^M
>> R12=0000007f R13=fffeb460 R14=00018ca0 R15=00018d98^M
>> PSR=60000010 -ZC- A usr32^M
>>
>> Christophe
>>
> 
> .
>

Richard Earnshaw (lists) Oct. 24, 2019, 4:10 p.m. UTC | #10

On 24/10/2019 11:16, Christophe Lyon wrote:
> On 23/10/2019 15:21, Richard Earnshaw (lists) wrote:
>> On 23/10/2019 09:28, Christophe Lyon wrote:
>>> On 21/10/2019 14:24, Richard Earnshaw (lists) wrote:
>>>> On 21/10/2019 12:51, Christophe Lyon wrote:
>>>>> On 18/10/2019 21:48, Richard Earnshaw wrote:
>>>>>> Each patch should produce a working compiler (it did when it was
>>>>>> originally written), though since the patch set has been re-ordered
>>>>>> slightly there is a possibility that some of the intermediate steps
>>>>>> may have missing test updates that are only cleaned up later.
>>>>>> However, only the end of the series should be considered complete.
>>>>>> I've kept the patch as a series to permit easier regression hunting
>>>>>> should that prove necessary.
>>>>>
>>>>> Thanks for this information: my validation system was designed in 
>>>>> such a way that it will run the GCC testsuite after each of your 
>>>>> patches, so I'll keep in mind not to report regressions (I've 
>>>>> noticed several already).
>>>>>
>>>>>
>>>>> I can perform a manual validation taking your 29 patches as a 
>>>>> single one and compare the results with those of the revision 
>>>>> preceding the one were you committed patch #1. Do you think it 
>>>>> would be useful?
>>>>>
>>>>>
>>>>> Christophe
>>>>>
>>>>>
>>>>
>>>> I think if you can filter out any that are removed by later patches 
>>>> and then report against the patch that caused the regression itself 
>>>> then that would be the best.  But I realise that would be more work 
>>>> for you, so a round-up against the combined set would be OK.
>>>>
>>>> BTW, I'm aware of an issue with the compiler now generating
>>>>
>>>>      <alu> reg, reg, shift <reg>
>>>>
>>>> in Thumb2; no need to report that again.
>>>>
>>>> Thanks,
>>>> R.
>>>> .
>>>>
>>>
>>>
>>> Hi Richard,
>>>
>>> The validation of the whole set shows 1 regression, which was also 
>>> reported by the validation of r277179 (early split most DImode 
>>> comparison operations)
>>>
>>> When GCC is configured as:
>>> --target arm-none-eabi
>>> --with-mode default
>>> --with-cpu default
>>> --with-fpu default
>>> (that is, no --with-mode, --with-cpu, --with-fpu option)
>>> I'm using binutils-2.28 and newlib-3.1.0
>>>
>>> I can see:
>>> FAIL: g++.dg/opt/pr36449.C  -std=gnu++14 execution test
>>> (whatever -std=gnu++XX option)
>>
>> That's strange.  The assembler code generated for that test is 
>> unchanged from before the patch series, so I can't see how it can't be 
>> a problem in the test itself.  What's more, I can't seem to reproduce 
>> this myself.
> 
> As you have noticed, I have created PR92207 to help understand this.
> 
>>
>> Similarly, in my build the code for _Znwj, malloc, malloc_r and free_r 
>> are also unchanged, while the malloc_[un]lock functions are empty 
>> stubs (not surprising as we aren't multi-threaded).
>>
>> So the only thing that looks to have really changed are the linker 
>> offsets (some of the library code has changed, but I don't think it's 
>> really reached in practice, so shouldn't be relevant).
>>
>>>
>>> I'm executing the tests using qemu-4.1.0 -cpu arm926
>>> The qemu traces shows that code enters main, then _Znwj (operator 
>>> new), then _malloc_r
>>> The qemu traces end with:
>>
>> What do you mean by 'end with'?  What's the failure mode of the test?  
>> A crash, or the test exiting with a failure code?
>>
> qemu complains with:
> qemu: uncaught target signal 11 (Segmentation fault) - core dumped
> Segmentation fault (core dumped)
> 
> 'end with' because my automated validation builds do not keep the full 
> execution traces (that would need too much disk space)
> 

As I've said in the PR, this looks like a bug in the qemu+newlib code. 
We call sbrk() which says, OK, but then the page isn't mapped by qemu 
into the process and it then faults.

So I think these changes are off the hook, it's just bad luck that they 
expose the issue at this point in time.

R.

Richard Earnshaw (lists) Oct. 25, 2019, 10:08 a.m. UTC | #11

On 24/10/2019 17:10, Richard Earnshaw (lists) wrote:
> On 24/10/2019 11:16, Christophe Lyon wrote:
>> On 23/10/2019 15:21, Richard Earnshaw (lists) wrote:
>>> On 23/10/2019 09:28, Christophe Lyon wrote:
>>>> On 21/10/2019 14:24, Richard Earnshaw (lists) wrote:
>>>>> On 21/10/2019 12:51, Christophe Lyon wrote:
>>>>>> On 18/10/2019 21:48, Richard Earnshaw wrote:
>>>>>>> Each patch should produce a working compiler (it did when it was
>>>>>>> originally written), though since the patch set has been re-ordered
>>>>>>> slightly there is a possibility that some of the intermediate steps
>>>>>>> may have missing test updates that are only cleaned up later.
>>>>>>> However, only the end of the series should be considered complete.
>>>>>>> I've kept the patch as a series to permit easier regression hunting
>>>>>>> should that prove necessary.
>>>>>>
>>>>>> Thanks for this information: my validation system was designed in 
>>>>>> such a way that it will run the GCC testsuite after each of your 
>>>>>> patches, so I'll keep in mind not to report regressions (I've 
>>>>>> noticed several already).
>>>>>>
>>>>>>
>>>>>> I can perform a manual validation taking your 29 patches as a 
>>>>>> single one and compare the results with those of the revision 
>>>>>> preceding the one were you committed patch #1. Do you think it 
>>>>>> would be useful?
>>>>>>
>>>>>>
>>>>>> Christophe
>>>>>>
>>>>>>
>>>>>
>>>>> I think if you can filter out any that are removed by later patches 
>>>>> and then report against the patch that caused the regression itself 
>>>>> then that would be the best.  But I realise that would be more work 
>>>>> for you, so a round-up against the combined set would be OK.
>>>>>
>>>>> BTW, I'm aware of an issue with the compiler now generating
>>>>>
>>>>>      <alu> reg, reg, shift <reg>
>>>>>
>>>>> in Thumb2; no need to report that again.
>>>>>
>>>>> Thanks,
>>>>> R.
>>>>> .
>>>>>
>>>>
>>>>
>>>> Hi Richard,
>>>>
>>>> The validation of the whole set shows 1 regression, which was also 
>>>> reported by the validation of r277179 (early split most DImode 
>>>> comparison operations)
>>>>
>>>> When GCC is configured as:
>>>> --target arm-none-eabi
>>>> --with-mode default
>>>> --with-cpu default
>>>> --with-fpu default
>>>> (that is, no --with-mode, --with-cpu, --with-fpu option)
>>>> I'm using binutils-2.28 and newlib-3.1.0
>>>>
>>>> I can see:
>>>> FAIL: g++.dg/opt/pr36449.C  -std=gnu++14 execution test
>>>> (whatever -std=gnu++XX option)
>>>
>>> That's strange.  The assembler code generated for that test is 
>>> unchanged from before the patch series, so I can't see how it can't 
>>> be a problem in the test itself.  What's more, I can't seem to 
>>> reproduce this myself.
>>
>> As you have noticed, I have created PR92207 to help understand this.
>>
>>>
>>> Similarly, in my build the code for _Znwj, malloc, malloc_r and 
>>> free_r are also unchanged, while the malloc_[un]lock functions are 
>>> empty stubs (not surprising as we aren't multi-threaded).
>>>
>>> So the only thing that looks to have really changed are the linker 
>>> offsets (some of the library code has changed, but I don't think it's 
>>> really reached in practice, so shouldn't be relevant).
>>>
>>>>
>>>> I'm executing the tests using qemu-4.1.0 -cpu arm926
>>>> The qemu traces shows that code enters main, then _Znwj (operator 
>>>> new), then _malloc_r
>>>> The qemu traces end with:
>>>
>>> What do you mean by 'end with'?  What's the failure mode of the test? 
>>> A crash, or the test exiting with a failure code?
>>>
>> qemu complains with:
>> qemu: uncaught target signal 11 (Segmentation fault) - core dumped
>> Segmentation fault (core dumped)
>>
>> 'end with' because my automated validation builds do not keep the full 
>> execution traces (that would need too much disk space)
>>
> 
> As I've said in the PR, this looks like a bug in the qemu+newlib code. 
> We call sbrk() which says, OK, but then the page isn't mapped by qemu 
> into the process and it then faults.
> 
> So I think these changes are off the hook, it's just bad luck that they 
> expose the issue at this point in time.
> 
> R.
> 

I've closed the PR as invalid, because this is a newlib bug that is 
fixed on trunk.  https://sourceware.org/ml/newlib/2019/msg00413.html

R.

Christophe Lyon Oct. 25, 2019, 12:59 p.m. UTC | #12

On Fri, 25 Oct 2019 at 12:08, Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
>
> On 24/10/2019 17:10, Richard Earnshaw (lists) wrote:
> > On 24/10/2019 11:16, Christophe Lyon wrote:
> >> On 23/10/2019 15:21, Richard Earnshaw (lists) wrote:
> >>> On 23/10/2019 09:28, Christophe Lyon wrote:
> >>>> On 21/10/2019 14:24, Richard Earnshaw (lists) wrote:
> >>>>> On 21/10/2019 12:51, Christophe Lyon wrote:
> >>>>>> On 18/10/2019 21:48, Richard Earnshaw wrote:
> >>>>>>> Each patch should produce a working compiler (it did when it was
> >>>>>>> originally written), though since the patch set has been re-ordered
> >>>>>>> slightly there is a possibility that some of the intermediate steps
> >>>>>>> may have missing test updates that are only cleaned up later.
> >>>>>>> However, only the end of the series should be considered complete.
> >>>>>>> I've kept the patch as a series to permit easier regression hunting
> >>>>>>> should that prove necessary.
> >>>>>>
> >>>>>> Thanks for this information: my validation system was designed in
> >>>>>> such a way that it will run the GCC testsuite after each of your
> >>>>>> patches, so I'll keep in mind not to report regressions (I've
> >>>>>> noticed several already).
> >>>>>>
> >>>>>>
> >>>>>> I can perform a manual validation taking your 29 patches as a
> >>>>>> single one and compare the results with those of the revision
> >>>>>> preceding the one were you committed patch #1. Do you think it
> >>>>>> would be useful?
> >>>>>>
> >>>>>>
> >>>>>> Christophe
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> I think if you can filter out any that are removed by later patches
> >>>>> and then report against the patch that caused the regression itself
> >>>>> then that would be the best.  But I realise that would be more work
> >>>>> for you, so a round-up against the combined set would be OK.
> >>>>>
> >>>>> BTW, I'm aware of an issue with the compiler now generating
> >>>>>
> >>>>>      <alu> reg, reg, shift <reg>
> >>>>>
> >>>>> in Thumb2; no need to report that again.
> >>>>>
> >>>>> Thanks,
> >>>>> R.
> >>>>> .
> >>>>>
> >>>>
> >>>>
> >>>> Hi Richard,
> >>>>
> >>>> The validation of the whole set shows 1 regression, which was also
> >>>> reported by the validation of r277179 (early split most DImode
> >>>> comparison operations)
> >>>>
> >>>> When GCC is configured as:
> >>>> --target arm-none-eabi
> >>>> --with-mode default
> >>>> --with-cpu default
> >>>> --with-fpu default
> >>>> (that is, no --with-mode, --with-cpu, --with-fpu option)
> >>>> I'm using binutils-2.28 and newlib-3.1.0
> >>>>
> >>>> I can see:
> >>>> FAIL: g++.dg/opt/pr36449.C  -std=gnu++14 execution test
> >>>> (whatever -std=gnu++XX option)
> >>>
> >>> That's strange.  The assembler code generated for that test is
> >>> unchanged from before the patch series, so I can't see how it can't
> >>> be a problem in the test itself.  What's more, I can't seem to
> >>> reproduce this myself.
> >>
> >> As you have noticed, I have created PR92207 to help understand this.
> >>
> >>>
> >>> Similarly, in my build the code for _Znwj, malloc, malloc_r and
> >>> free_r are also unchanged, while the malloc_[un]lock functions are
> >>> empty stubs (not surprising as we aren't multi-threaded).
> >>>
> >>> So the only thing that looks to have really changed are the linker
> >>> offsets (some of the library code has changed, but I don't think it's
> >>> really reached in practice, so shouldn't be relevant).
> >>>
> >>>>
> >>>> I'm executing the tests using qemu-4.1.0 -cpu arm926
> >>>> The qemu traces shows that code enters main, then _Znwj (operator
> >>>> new), then _malloc_r
> >>>> The qemu traces end with:
> >>>
> >>> What do you mean by 'end with'?  What's the failure mode of the test?
> >>> A crash, or the test exiting with a failure code?
> >>>
> >> qemu complains with:
> >> qemu: uncaught target signal 11 (Segmentation fault) - core dumped
> >> Segmentation fault (core dumped)
> >>
> >> 'end with' because my automated validation builds do not keep the full
> >> execution traces (that would need too much disk space)
> >>
> >
> > As I've said in the PR, this looks like a bug in the qemu+newlib code.
> > We call sbrk() which says, OK, but then the page isn't mapped by qemu
> > into the process and it then faults.
> >
> > So I think these changes are off the hook, it's just bad luck that they
> > expose the issue at this point in time.
> >
> > R.
> >
>
> I've closed the PR as invalid, because this is a newlib bug that is
> fixed on trunk.  https://sourceware.org/ml/newlib/2019/msg00413.html
>
Thanks for the analysis.
It looks like I have to upgrade the newlib version I'm using for validations.

Christophe

> R.

[00/29,arm] Rewrite DImode arithmetic support

Message

Comments