Message ID | 20111119013804.GC20336@kam.mff.cuni.cz |
---|---|
State | New |
Headers | show |
Hi, Continuing investigation of fails on bootstrap I found next problem (besides the problem with unknown alignment described above): there is a mess with size_needed and epilogue_size_needed when we generate epilogue loop which also use SSE-moves, but no unrolled - that's probably the reason of the fails we saw. Please check the attached patch - though the full testing isn't over yet. bootstraps seem to be ok as well as arrayarg.f90-test (with sse_loop enabled). On 19 November 2011 05:38, Jan Hubicka <hubicka@ucw.cz> wrote: >> Given that x86 memset/memcpy is still broken, I think we should revert >> it for now. > > Well, looking into the code, the SSE alignment issues needs work - the > alignment test merely tests whether some alignmnet is known not whether 16 byte > alignment is known that is the cause of failures in 32bit bootstrap. I originally > convinced myself that this is safe since we soot for unaligned load/stores anyway. > > > I've commited the following patch that disabled SSE codegen and unbreaks atom > bootstrap. This seems more sensible to me given that the patch cumulated some > good improvements on the non-SSE path as well and we could return into the SSE > alignment issues incremntally. There is still falure in the fortran testcase > that I am convinced is previously latent issue. > > I will be offline tomorrow. If there are futher serious problems, just fell > free to revert the changes and we could look into them for next stage1. > > Honza > > * i386.c (atom_cost): Disable SSE loop until alignment issues are fixed. > Index: i386.c > =================================================================== > --- i386.c (revision 181479) > +++ i386.c (working copy) > @@ -1783,18 +1783,18 @@ struct processor_costs atom_cost = { > /* stringop_algs for memcpy. > SSE loops works best on Atom, but fall back into non-SSE unrolled loop variant > if that fails. */ > - {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment. */ > - {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}}, > - {{libcall, {{2048, sse_loop}, {2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment. */ > - {libcall, {{2048, sse_loop}, {2048, unrolled_loop}, > + {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment. */ > + {libcall, {{4096, unrolled_loop}, {-1, libcall}}}}, > + {{libcall, {{2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment. */ > + {libcall, {{2048, unrolled_loop}, > {-1, libcall}}}}}, > > /* stringop_algs for memset. */ > - {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment. */ > - {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}}, > - {{libcall, {{1024, sse_loop}, {1024, unrolled_loop}, /* Unknown alignment. */ > + {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment. */ > + {libcall, {{4096, unrolled_loop}, {-1, libcall}}}}, > + {{libcall, {{1024, unrolled_loop}, /* Unknown alignment. */ > {-1, libcall}}}, > - {libcall, {{2048, sse_loop}, {2048, unrolled_loop}, > + {libcall, {{2048, unrolled_loop}, > {-1, libcall}}}}}, > 1, /* scalar_stmt_cost. */ > 1, /* scalar load_cost. */
I found and fixed another problem in the latest memcpy/memest changes - with this fix all the failing tests mentioned in #51134 started passing. Bootstraps are also ok. Though I still see fails in 32-bit make check, so probably, it'd be better to revert the changes till these fails are fixed. On 21 November 2011 20:36, Michael Zolotukhin <michael.v.zolotukhin@gmail.com> wrote: > Hi, > > Continuing investigation of fails on bootstrap I found next problem > (besides the problem with unknown alignment described above): there is > a mess with size_needed and epilogue_size_needed when we generate > epilogue loop which also use SSE-moves, but no unrolled - that's > probably the reason of the fails we saw. > > Please check the attached patch - though the full testing isn't over > yet. bootstraps seem to be ok as well as arrayarg.f90-test (with > sse_loop enabled). > > On 19 November 2011 05:38, Jan Hubicka <hubicka@ucw.cz> wrote: >>> Given that x86 memset/memcpy is still broken, I think we should revert >>> it for now. >> >> Well, looking into the code, the SSE alignment issues needs work - the >> alignment test merely tests whether some alignmnet is known not whether 16 byte >> alignment is known that is the cause of failures in 32bit bootstrap. I originally >> convinced myself that this is safe since we soot for unaligned load/stores anyway. >> >> >> I've commited the following patch that disabled SSE codegen and unbreaks atom >> bootstrap. This seems more sensible to me given that the patch cumulated some >> good improvements on the non-SSE path as well and we could return into the SSE >> alignment issues incremntally. There is still falure in the fortran testcase >> that I am convinced is previously latent issue. >> >> I will be offline tomorrow. If there are futher serious problems, just fell >> free to revert the changes and we could look into them for next stage1. >> >> Honza >> >> * i386.c (atom_cost): Disable SSE loop until alignment issues are fixed. >> Index: i386.c >> =================================================================== >> --- i386.c (revision 181479) >> +++ i386.c (working copy) >> @@ -1783,18 +1783,18 @@ struct processor_costs atom_cost = { >> /* stringop_algs for memcpy. >> SSE loops works best on Atom, but fall back into non-SSE unrolled loop variant >> if that fails. */ >> - {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment. */ >> - {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}}, >> - {{libcall, {{2048, sse_loop}, {2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment. */ >> - {libcall, {{2048, sse_loop}, {2048, unrolled_loop}, >> + {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment. */ >> + {libcall, {{4096, unrolled_loop}, {-1, libcall}}}}, >> + {{libcall, {{2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment. */ >> + {libcall, {{2048, unrolled_loop}, >> {-1, libcall}}}}}, >> >> /* stringop_algs for memset. */ >> - {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment. */ >> - {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}}, >> - {{libcall, {{1024, sse_loop}, {1024, unrolled_loop}, /* Unknown alignment. */ >> + {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment. */ >> + {libcall, {{4096, unrolled_loop}, {-1, libcall}}}}, >> + {{libcall, {{1024, unrolled_loop}, /* Unknown alignment. */ >> {-1, libcall}}}, >> - {libcall, {{2048, sse_loop}, {2048, unrolled_loop}, >> + {libcall, {{2048, unrolled_loop}, >> {-1, libcall}}}}}, >> 1, /* scalar_stmt_cost. */ >> 1, /* scalar load_cost. */ > > > > -- > --- > Best regards, > Michael V. Zolotukhin, > Software Engineer > Intel Corporation.
On Wed, Nov 23, 2011 at 3:32 PM, Michael Zolotukhin <michael.v.zolotukhin@gmail.com> wrote: > I found and fixed another problem in the latest memcpy/memest changes > - with this fix all the failing tests mentioned in #51134 started > passing. Bootstraps are also ok. > Though I still see fails in 32-bit make check, so probably, it'd be > better to revert the changes till these fails are fixed. > I will revert it for now.
> On Wed, Nov 23, 2011 at 3:32 PM, Michael Zolotukhin > <michael.v.zolotukhin@gmail.com> wrote: > > I found and fixed another problem in the latest memcpy/memest changes > > - with this fix all the failing tests mentioned in #51134 started > > passing. Bootstraps are also ok. > > Though I still see fails in 32-bit make check, so probably, it'd be > > better to revert the changes till these fails are fixed. > > > > I will revert it for now. OK. I guess I can break out the simple fixes and commit them for 4.7 and we could revisit this for next stage1. Probably not by adding all the features together, but extending prologues/epilogues first and adding SSE loops with the new alignment logic next. Honza > > -- > H.J.
Hi Jan, I debugged the changes, and I think I've hunted down all the bugs. I slightly refactored the code - now all new SSE-related code is more localized. Also, I fixed some alignment issues. Please find the new patch in the attachment (it's made against rev 181709) - is it ok for trunk? Bootstrap and 'make check' passed on Atom and Corei7 (32,64 bits). I also checked specs2000, eembc1_1 and eembc2_0 on Atom. On 26 November 2011 09:18, Jan Hubicka <hubicka@ucw.cz> wrote: >> On Wed, Nov 23, 2011 at 3:32 PM, Michael Zolotukhin >> <michael.v.zolotukhin@gmail.com> wrote: >> > I found and fixed another problem in the latest memcpy/memest changes >> > - with this fix all the failing tests mentioned in #51134 started >> > passing. Bootstraps are also ok. >> > Though I still see fails in 32-bit make check, so probably, it'd be >> > better to revert the changes till these fails are fixed. >> > >> >> I will revert it for now. > > OK. I guess I can break out the simple fixes and commit them for 4.7 and we > could revisit this for next stage1. Probably not by adding all the features > together, but extending prologues/epilogues first and adding SSE loops with > the new alignment logic next. > > Honza >> >> -- >> H.J.
Any update? On 5 December 2011 15:14, Michael Zolotukhin <michael.v.zolotukhin@gmail.com> wrote: > Hi Jan, > I debugged the changes, and I think I've hunted down all the bugs. > I slightly refactored the code - now all new SSE-related code is more > localized. Also, I fixed some alignment issues. > Please find the new patch in the attachment (it's made against rev > 181709) - is it ok for trunk? > > Bootstrap and 'make check' passed on Atom and Corei7 (32,64 bits). I > also checked specs2000, eembc1_1 and eembc2_0 on Atom. > > On 26 November 2011 09:18, Jan Hubicka <hubicka@ucw.cz> wrote: >>> On Wed, Nov 23, 2011 at 3:32 PM, Michael Zolotukhin >>> <michael.v.zolotukhin@gmail.com> wrote: >>> > I found and fixed another problem in the latest memcpy/memest changes >>> > - with this fix all the failing tests mentioned in #51134 started >>> > passing. Bootstraps are also ok. >>> > Though I still see fails in 32-bit make check, so probably, it'd be >>> > better to revert the changes till these fails are fixed. >>> > >>> >>> I will revert it for now. >> >> OK. I guess I can break out the simple fixes and commit them for 4.7 and we >> could revisit this for next stage1. Probably not by adding all the features >> together, but extending prologues/epilogues first and adding SSE loops with >> the new alignment logic next. >> >> Honza >>> >>> -- >>> H.J. > > > > -- > --- > Best regards, > Michael V. Zolotukhin, > Software Engineer > Intel Corporation.
> Any update?
I will look into it today, but anyway I think it is stage1 material, so we have some time to progress on it.
Honza
On Mon, Dec 12, 2011 at 6:02 AM, Jan Hubicka <hubicka@ucw.cz> wrote: >> Any update? > > I will look into it today, but anyway I think it is stage1 material, so we have some time to progress on it. > > Honza Hi Honza, The old patch was reverted and the new patch was posted at http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00336.html Have you got a chance to review it? Thanks.
> On Mon, Dec 12, 2011 at 6:02 AM, Jan Hubicka <hubicka@ucw.cz> wrote: > >> Any update? > > > > I will look into it today, but anyway I think it is stage1 material, so we have some time to progress on it. > > > > Honza > > Hi Honza, > > The old patch was reverted and the new patch was posted at > > http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00336.html > > Have you got a chance to review it? I am in China till 5th, I will try to return to it shortly after returning. Ping me again if not - there seems to be a lot of work left on this patch... Honza
On Fri, Aug 31, 2012 at 1:54 AM, Jan Hubicka <hubicka@ucw.cz> wrote: >> On Mon, Dec 12, 2011 at 6:02 AM, Jan Hubicka <hubicka@ucw.cz> wrote: >> >> Any update? >> > >> > I will look into it today, but anyway I think it is stage1 material, so we have some time to progress on it. >> > >> > Honza >> >> Hi Honza, >> >> The old patch was reverted and the new patch was posted at >> >> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00336.html >> >> Have you got a chance to review it? > > I am in China till 5th, I will try to return to it shortly after returning. > Ping me again if not - there seems to be a lot of work left on this patch... > Hi Honza, Michael, Any changes to get it into GCC 4.8? Thanks.
Hi HJ, The last-year patch is currently almost useless, as efforts needed for its rebase seem to be almost the same as efforts needed for writing it from scratch. I hoped to make a patch covering at least subset of cases, but unfortunately haven't had time even for it yet. What time do we have for it now, when does stage1 finish? Thanks, Michael On 26 September 2012 19:00, H.J. Lu <hjl.tools@gmail.com> wrote: > On Fri, Aug 31, 2012 at 1:54 AM, Jan Hubicka <hubicka@ucw.cz> wrote: >>> On Mon, Dec 12, 2011 at 6:02 AM, Jan Hubicka <hubicka@ucw.cz> wrote: >>> >> Any update? >>> > >>> > I will look into it today, but anyway I think it is stage1 material, so we have some time to progress on it. >>> > >>> > Honza >>> >>> Hi Honza, >>> >>> The old patch was reverted and the new patch was posted at >>> >>> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00336.html >>> >>> Have you got a chance to review it? >> >> I am in China till 5th, I will try to return to it shortly after returning. >> Ping me again if not - there seems to be a lot of work left on this patch... >> > > Hi Honza, Michael, > > Any changes to get it into GCC 4.8? > > Thanks. > > > -- > H.J.
Index: i386.c =================================================================== --- i386.c (revision 181479) +++ i386.c (working copy) @@ -1783,18 +1783,18 @@ struct processor_costs atom_cost = { /* stringop_algs for memcpy. SSE loops works best on Atom, but fall back into non-SSE unrolled loop variant if that fails. */ - {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment. */ - {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}}, - {{libcall, {{2048, sse_loop}, {2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment. */ - {libcall, {{2048, sse_loop}, {2048, unrolled_loop}, + {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment. */ + {libcall, {{4096, unrolled_loop}, {-1, libcall}}}}, + {{libcall, {{2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment. */ + {libcall, {{2048, unrolled_loop}, {-1, libcall}}}}}, /* stringop_algs for memset. */ - {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment. */ - {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}}, - {{libcall, {{1024, sse_loop}, {1024, unrolled_loop}, /* Unknown alignment. */ + {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment. */ + {libcall, {{4096, unrolled_loop}, {-1, libcall}}}}, + {{libcall, {{1024, unrolled_loop}, /* Unknown alignment. */ {-1, libcall}}}, - {libcall, {{2048, sse_loop}, {2048, unrolled_loop}, + {libcall, {{2048, unrolled_loop}, {-1, libcall}}}}}, 1, /* scalar_stmt_cost. */ 1, /* scalar load_cost. */