Message ID: AM5PR0802MB2610FED9736F1EBB2A34EAF783F90@AM5PR0802MB2610.eurprd08.prod.outlook.com
State: New
On 06/09/16 14:14, Wilco Dijkstra wrote:
> Improve aarch64_legitimize_address - avoid splitting the offset if it is
> supported.  When we do split, take the mode size into account.  BLKmode
> falls into the unaligned case but should be treated like LDP/STP.
> This improves codesize slightly due to fewer base address calculations:
>
> int f(int *p) { return p[5000] + p[7000]; }
>
> Now generates:
>
> f:
> 	add	x0, x0, 16384
> 	ldr	w1, [x0, 3616]
> 	ldr	w0, [x0, 11616]
> 	add	w0, w1, w0
> 	ret
>
> instead of:
>
> f:
> 	add	x1, x0, 16384
> 	add	x0, x0, 24576
> 	ldr	w1, [x1, 3616]
> 	ldr	w0, [x0, 3424]
> 	add	w0, w1, w0
> 	ret
>
> OK for trunk?
>
> ChangeLog:
> 2016-09-06  Wilco Dijkstra  <wdijkstr@arm.com>
>
> gcc/
> 	* config/aarch64/aarch64.c (aarch64_legitimize_address):
> 	Avoid use of base_offset if offset already in range.

OK.

R.

> --
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 27bbdbad8cddc576f9ed4fd0670116bd6d318412..119ff0aecb0c9f88899fa141b2c7f9158281f9c3 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -5058,9 +5058,19 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>    /* For offsets aren't a multiple of the access size, the limit is
>       -256...255.  */
>    else if (offset & (GET_MODE_SIZE (mode) - 1))
> -    base_offset = (offset + 0x100) & ~0x1ff;
> +    {
> +      base_offset = (offset + 0x100) & ~0x1ff;
> +
> +      /* BLKmode typically uses LDP of X-registers.  */
> +      if (mode == BLKmode)
> +	base_offset = (offset + 512) & ~0x3ff;
> +    }
> +  /* Small negative offsets are supported.  */
> +  else if (IN_RANGE (offset, -256, 0))
> +    base_offset = 0;
> +  /* Use 12-bit offset by access size.  */
>    else
> -    base_offset = offset & ~0xfff;
> +    base_offset = offset & (~0xfff * GET_MODE_SIZE (mode));
>
>    if (base_offset != 0)
>      {
>
Hi Wilco,

On 7 September 2016 at 14:43, Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
> On 06/09/16 14:14, Wilco Dijkstra wrote:
>> Improve aarch64_legitimize_address - avoid splitting the offset if it is
>> supported.  When we do split, take the mode size into account.  BLKmode
>> falls into the unaligned case but should be treated like LDP/STP.
>> This improves codesize slightly due to fewer base address calculations:
>>
>> int f(int *p) { return p[5000] + p[7000]; }
>>
>> Now generates:
>>
>> f:
>> 	add	x0, x0, 16384
>> 	ldr	w1, [x0, 3616]
>> 	ldr	w0, [x0, 11616]
>> 	add	w0, w1, w0
>> 	ret
>>
>> instead of:
>>
>> f:
>> 	add	x1, x0, 16384
>> 	add	x0, x0, 24576
>> 	ldr	w1, [x1, 3616]
>> 	ldr	w0, [x0, 3424]
>> 	add	w0, w1, w0
>> 	ret
>>
>> OK for trunk?
>>
>> ChangeLog:
>> 2016-09-06  Wilco Dijkstra  <wdijkstr@arm.com>
>>
>> gcc/
>> 	* config/aarch64/aarch64.c (aarch64_legitimize_address):
>> 	Avoid use of base_offset if offset already in range.
>
> OK.
>
> R.

After this patch, I've noticed a regression:
FAIL: gcc.target/aarch64/ldp_vec_64_1.c scan-assembler ldp\td[0-9]+, d[0-9]

You probably need to adjust the scan pattern.

Thanks,

Christophe

>
>> --
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 27bbdbad8cddc576f9ed4fd0670116bd6d318412..119ff0aecb0c9f88899fa141b2c7f9158281f9c3 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -5058,9 +5058,19 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>>    /* For offsets aren't a multiple of the access size, the limit is
>>       -256...255.  */
>>    else if (offset & (GET_MODE_SIZE (mode) - 1))
>> -    base_offset = (offset + 0x100) & ~0x1ff;
>> +    {
>> +      base_offset = (offset + 0x100) & ~0x1ff;
>> +
>> +      /* BLKmode typically uses LDP of X-registers.  */
>> +      if (mode == BLKmode)
>> +	base_offset = (offset + 512) & ~0x3ff;
>> +    }
>> +  /* Small negative offsets are supported.  */
>> +  else if (IN_RANGE (offset, -256, 0))
>> +    base_offset = 0;
>> +  /* Use 12-bit offset by access size.  */
>>    else
>> -    base_offset = offset & ~0xfff;
>> +    base_offset = offset & (~0xfff * GET_MODE_SIZE (mode));
>>
>>    if (base_offset != 0)
>>      {
>>
>
Christophe Lyon wrote:
> After this patch, I've noticed a regression:
> FAIL: gcc.target/aarch64/ldp_vec_64_1.c scan-assembler ldp\td[0-9]+, d[0-9]
> You probably need to adjust the scan pattern.

The original code was better and what we should generate.  IVOpt chooses
to use indexing even though it is more expensive, and that is a latent bug.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65068 which shows the same
cost calculation issue.

Wilco
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 27bbdbad8cddc576f9ed4fd0670116bd6d318412..119ff0aecb0c9f88899fa141b2c7f9158281f9c3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5058,9 +5058,19 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
   /* For offsets aren't a multiple of the access size, the limit is
      -256...255.  */
   else if (offset & (GET_MODE_SIZE (mode) - 1))
-    base_offset = (offset + 0x100) & ~0x1ff;
+    {
+      base_offset = (offset + 0x100) & ~0x1ff;
+
+      /* BLKmode typically uses LDP of X-registers.  */
+      if (mode == BLKmode)
+	base_offset = (offset + 512) & ~0x3ff;
+    }
+  /* Small negative offsets are supported.  */
+  else if (IN_RANGE (offset, -256, 0))
+    base_offset = 0;
+  /* Use 12-bit offset by access size.  */
   else
-    base_offset = offset & ~0xfff;
+    base_offset = offset & (~0xfff * GET_MODE_SIZE (mode));
 
   if (base_offset != 0)
     {