diff mbox

Memset/memcpy patch

Message ID 20111119013804.GC20336@kam.mff.cuni.cz
State New
Headers show

Commit Message

Jan Hubicka Nov. 19, 2011, 1:38 a.m. UTC
> Given that x86 memset/memcpy is still broken, I think we should revert
> it for now.

Well, looking into the code, the SSE alignment issues needs work - the
alignment test merely tests whether some alignmnet is known not whether 16 byte
alignment is known that is the cause of failures in 32bit bootstrap.  I originally
convinced myself that this is safe since we soot for unaligned load/stores anyway.


I've commited the following patch that disabled SSE codegen and unbreaks atom
bootstrap.  This seems more sensible to me given that the patch cumulated some
good improvements on the non-SSE path as well and we could return into the SSE
alignment issues incremntally.  There is still falure in the fortran testcase
that I am convinced is previously latent issue.

I will be offline tomorrow.  If there are futher serious problems, just fell
free to revert the changes and we could look into them for next stage1.

Honza

	* i386.c (atom_cost): Disable SSE loop until alignment issues are fixed.

Comments

Michael Zolotukhin Nov. 21, 2011, 4:36 p.m. UTC | #1
Hi,

Continuing investigation of fails on bootstrap I found next problem
(besides the problem with unknown alignment described above): there is
a mess with size_needed and epilogue_size_needed when we generate
epilogue loop which also use SSE-moves, but no unrolled - that's
probably the reason of the fails we saw.

Please check the attached patch - though the full testing isn't over
yet. bootstraps seem to be ok as well as arrayarg.f90-test (with
sse_loop enabled).

On 19 November 2011 05:38, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Given that x86 memset/memcpy is still broken, I think we should revert
>> it for now.
>
> Well, looking into the code, the SSE alignment issues needs work - the
> alignment test merely tests whether some alignmnet is known not whether 16 byte
> alignment is known that is the cause of failures in 32bit bootstrap.  I originally
> convinced myself that this is safe since we soot for unaligned load/stores anyway.
>
>
> I've commited the following patch that disabled SSE codegen and unbreaks atom
> bootstrap.  This seems more sensible to me given that the patch cumulated some
> good improvements on the non-SSE path as well and we could return into the SSE
> alignment issues incremntally.  There is still falure in the fortran testcase
> that I am convinced is previously latent issue.
>
> I will be offline tomorrow.  If there are futher serious problems, just fell
> free to revert the changes and we could look into them for next stage1.
>
> Honza
>
>        * i386.c (atom_cost): Disable SSE loop until alignment issues are fixed.
> Index: i386.c
> ===================================================================
> --- i386.c      (revision 181479)
> +++ i386.c      (working copy)
> @@ -1783,18 +1783,18 @@ struct processor_costs atom_cost = {
>   /* stringop_algs for memcpy.
>      SSE loops works best on Atom, but fall back into non-SSE unrolled loop variant
>      if that fails.  */
> -  {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
> -    {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}},
> -   {{libcall, {{2048, sse_loop}, {2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment.  */
> -    {libcall, {{2048, sse_loop}, {2048, unrolled_loop},
> +  {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
> +    {libcall, {{4096, unrolled_loop}, {-1, libcall}}}},
> +   {{libcall, {{2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment.  */
> +    {libcall, {{2048, unrolled_loop},
>               {-1, libcall}}}}},
>
>   /* stringop_algs for memset.  */
> -  {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
> -    {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}},
> -   {{libcall, {{1024, sse_loop}, {1024, unrolled_loop},         /* Unknown alignment.  */
> +  {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
> +    {libcall, {{4096, unrolled_loop}, {-1, libcall}}}},
> +   {{libcall, {{1024, unrolled_loop},   /* Unknown alignment.  */
>               {-1, libcall}}},
> -    {libcall, {{2048, sse_loop}, {2048, unrolled_loop},
> +    {libcall, {{2048, unrolled_loop},
>               {-1, libcall}}}}},
>   1,                                   /* scalar_stmt_cost.  */
>   1,                                   /* scalar load_cost.  */
Michael Zolotukhin Nov. 23, 2011, 11:32 p.m. UTC | #2
I found and fixed another problem in the latest memcpy/memest changes
- with this fix all the failing tests mentioned in #51134 started
passing. Bootstraps are also ok.
Though I still see fails in 32-bit make check, so probably, it'd be
better to revert the changes till these fails are fixed.

On 21 November 2011 20:36, Michael Zolotukhin
<michael.v.zolotukhin@gmail.com> wrote:
> Hi,
>
> Continuing investigation of fails on bootstrap I found next problem
> (besides the problem with unknown alignment described above): there is
> a mess with size_needed and epilogue_size_needed when we generate
> epilogue loop which also use SSE-moves, but no unrolled - that's
> probably the reason of the fails we saw.
>
> Please check the attached patch - though the full testing isn't over
> yet. bootstraps seem to be ok as well as arrayarg.f90-test (with
> sse_loop enabled).
>
> On 19 November 2011 05:38, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> Given that x86 memset/memcpy is still broken, I think we should revert
>>> it for now.
>>
>> Well, looking into the code, the SSE alignment issues needs work - the
>> alignment test merely tests whether some alignmnet is known not whether 16 byte
>> alignment is known that is the cause of failures in 32bit bootstrap.  I originally
>> convinced myself that this is safe since we soot for unaligned load/stores anyway.
>>
>>
>> I've commited the following patch that disabled SSE codegen and unbreaks atom
>> bootstrap.  This seems more sensible to me given that the patch cumulated some
>> good improvements on the non-SSE path as well and we could return into the SSE
>> alignment issues incremntally.  There is still falure in the fortran testcase
>> that I am convinced is previously latent issue.
>>
>> I will be offline tomorrow.  If there are futher serious problems, just fell
>> free to revert the changes and we could look into them for next stage1.
>>
>> Honza
>>
>>        * i386.c (atom_cost): Disable SSE loop until alignment issues are fixed.
>> Index: i386.c
>> ===================================================================
>> --- i386.c      (revision 181479)
>> +++ i386.c      (working copy)
>> @@ -1783,18 +1783,18 @@ struct processor_costs atom_cost = {
>>   /* stringop_algs for memcpy.
>>      SSE loops works best on Atom, but fall back into non-SSE unrolled loop variant
>>      if that fails.  */
>> -  {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
>> -    {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}},
>> -   {{libcall, {{2048, sse_loop}, {2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment.  */
>> -    {libcall, {{2048, sse_loop}, {2048, unrolled_loop},
>> +  {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
>> +    {libcall, {{4096, unrolled_loop}, {-1, libcall}}}},
>> +   {{libcall, {{2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment.  */
>> +    {libcall, {{2048, unrolled_loop},
>>               {-1, libcall}}}}},
>>
>>   /* stringop_algs for memset.  */
>> -  {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
>> -    {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}},
>> -   {{libcall, {{1024, sse_loop}, {1024, unrolled_loop},         /* Unknown alignment.  */
>> +  {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
>> +    {libcall, {{4096, unrolled_loop}, {-1, libcall}}}},
>> +   {{libcall, {{1024, unrolled_loop},   /* Unknown alignment.  */
>>               {-1, libcall}}},
>> -    {libcall, {{2048, sse_loop}, {2048, unrolled_loop},
>> +    {libcall, {{2048, unrolled_loop},
>>               {-1, libcall}}}}},
>>   1,                                   /* scalar_stmt_cost.  */
>>   1,                                   /* scalar load_cost.  */
>
>
>
> --
> ---
> Best regards,
> Michael V. Zolotukhin,
> Software Engineer
> Intel Corporation.
H.J. Lu Nov. 24, 2011, 6:31 p.m. UTC | #3
On Wed, Nov 23, 2011 at 3:32 PM, Michael Zolotukhin
<michael.v.zolotukhin@gmail.com> wrote:
> I found and fixed another problem in the latest memcpy/memest changes
> - with this fix all the failing tests mentioned in #51134 started
> passing. Bootstraps are also ok.
> Though I still see fails in 32-bit make check, so probably, it'd be
> better to revert the changes till these fails are fixed.
>

I will revert it for now.
Jan Hubicka Nov. 26, 2011, 5:18 a.m. UTC | #4
> On Wed, Nov 23, 2011 at 3:32 PM, Michael Zolotukhin
> <michael.v.zolotukhin@gmail.com> wrote:
> > I found and fixed another problem in the latest memcpy/memest changes
> > - with this fix all the failing tests mentioned in #51134 started
> > passing. Bootstraps are also ok.
> > Though I still see fails in 32-bit make check, so probably, it'd be
> > better to revert the changes till these fails are fixed.
> >
> 
> I will revert it for now.

OK.  I guess I can break out the simple fixes and commit them for 4.7 and we
could revisit this for next stage1. Probably not by adding all the features
together, but extending prologues/epilogues first and adding SSE loops with
the new alignment logic next.

Honza
> 
> -- 
> H.J.
Michael Zolotukhin Dec. 5, 2011, 11:14 a.m. UTC | #5
Hi Jan,
I debugged the changes, and I think I've hunted down all the bugs.
I slightly refactored the code - now all new SSE-related code is more
localized. Also, I fixed some alignment issues.
Please find the new patch in the attachment (it's made against rev
181709) - is it ok for trunk?

Bootstrap and 'make check' passed on Atom and Corei7 (32,64 bits). I
also checked specs2000, eembc1_1 and eembc2_0 on Atom.

On 26 November 2011 09:18, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Wed, Nov 23, 2011 at 3:32 PM, Michael Zolotukhin
>> <michael.v.zolotukhin@gmail.com> wrote:
>> > I found and fixed another problem in the latest memcpy/memest changes
>> > - with this fix all the failing tests mentioned in #51134 started
>> > passing. Bootstraps are also ok.
>> > Though I still see fails in 32-bit make check, so probably, it'd be
>> > better to revert the changes till these fails are fixed.
>> >
>>
>> I will revert it for now.
>
> OK.  I guess I can break out the simple fixes and commit them for 4.7 and we
> could revisit this for next stage1. Probably not by adding all the features
> together, but extending prologues/epilogues first and adding SSE loops with
> the new alignment logic next.
>
> Honza
>>
>> --
>> H.J.
Michael Zolotukhin Dec. 12, 2011, 11:58 a.m. UTC | #6
Any update?

On 5 December 2011 15:14, Michael Zolotukhin
<michael.v.zolotukhin@gmail.com> wrote:
> Hi Jan,
> I debugged the changes, and I think I've hunted down all the bugs.
> I slightly refactored the code - now all new SSE-related code is more
> localized. Also, I fixed some alignment issues.
> Please find the new patch in the attachment (it's made against rev
> 181709) - is it ok for trunk?
>
> Bootstrap and 'make check' passed on Atom and Corei7 (32,64 bits). I
> also checked specs2000, eembc1_1 and eembc2_0 on Atom.
>
> On 26 November 2011 09:18, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> On Wed, Nov 23, 2011 at 3:32 PM, Michael Zolotukhin
>>> <michael.v.zolotukhin@gmail.com> wrote:
>>> > I found and fixed another problem in the latest memcpy/memest changes
>>> > - with this fix all the failing tests mentioned in #51134 started
>>> > passing. Bootstraps are also ok.
>>> > Though I still see fails in 32-bit make check, so probably, it'd be
>>> > better to revert the changes till these fails are fixed.
>>> >
>>>
>>> I will revert it for now.
>>
>> OK.  I guess I can break out the simple fixes and commit them for 4.7 and we
>> could revisit this for next stage1. Probably not by adding all the features
>> together, but extending prologues/epilogues first and adding SSE loops with
>> the new alignment logic next.
>>
>> Honza
>>>
>>> --
>>> H.J.
>
>
>
> --
> ---
> Best regards,
> Michael V. Zolotukhin,
> Software Engineer
> Intel Corporation.
Jan Hubicka Dec. 12, 2011, 2:02 p.m. UTC | #7
> Any update?

I will look into it today, but anyway I think it is stage1 material, so we have some time to progress on it.

Honza
H.J. Lu Aug. 30, 2012, 6:06 p.m. UTC | #8
On Mon, Dec 12, 2011 at 6:02 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Any update?
>
> I will look into it today, but anyway I think it is stage1 material, so we have some time to progress on it.
>
> Honza

Hi Honza,

The old patch was reverted and the new patch was posted at

http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00336.html

Have you got a chance to review it?

Thanks.
Jan Hubicka Aug. 31, 2012, 8:54 a.m. UTC | #9
> On Mon, Dec 12, 2011 at 6:02 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> Any update?
> >
> > I will look into it today, but anyway I think it is stage1 material, so we have some time to progress on it.
> >
> > Honza
> 
> Hi Honza,
> 
> The old patch was reverted and the new patch was posted at
> 
> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00336.html
> 
> Have you got a chance to review it?

I am in China till 5th, I will try to return to it shortly after returning.
Ping me again if not - there seems to be a lot of work left on this patch...

Honza
H.J. Lu Sept. 26, 2012, 3 p.m. UTC | #10
On Fri, Aug 31, 2012 at 1:54 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Mon, Dec 12, 2011 at 6:02 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> Any update?
>> >
>> > I will look into it today, but anyway I think it is stage1 material, so we have some time to progress on it.
>> >
>> > Honza
>>
>> Hi Honza,
>>
>> The old patch was reverted and the new patch was posted at
>>
>> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00336.html
>>
>> Have you got a chance to review it?
>
> I am in China till 5th, I will try to return to it shortly after returning.
> Ping me again if not - there seems to be a lot of work left on this patch...
>

Hi Honza, Michael,

Any changes to get it into GCC 4.8?

Thanks.
Michael Zolotukhin Sept. 26, 2012, 9:38 p.m. UTC | #11
Hi HJ,
The last-year patch is currently almost useless, as efforts needed for
its rebase seem to be almost the same as efforts needed for writing it
from scratch. I hoped to make a patch covering at least subset of
cases, but unfortunately haven't had time even for it yet.

What time do we have for it now, when does stage1 finish?

Thanks, Michael

On 26 September 2012 19:00, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Aug 31, 2012 at 1:54 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> On Mon, Dec 12, 2011 at 6:02 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> >> Any update?
>>> >
>>> > I will look into it today, but anyway I think it is stage1 material, so we have some time to progress on it.
>>> >
>>> > Honza
>>>
>>> Hi Honza,
>>>
>>> The old patch was reverted and the new patch was posted at
>>>
>>> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00336.html
>>>
>>> Have you got a chance to review it?
>>
>> I am in China till 5th, I will try to return to it shortly after returning.
>> Ping me again if not - there seems to be a lot of work left on this patch...
>>
>
> Hi Honza, Michael,
>
> Any changes to get it into GCC 4.8?
>
> Thanks.
>
>
> --
> H.J.
diff mbox

Patch

Index: i386.c
===================================================================
--- i386.c	(revision 181479)
+++ i386.c	(working copy)
@@ -1783,18 +1783,18 @@  struct processor_costs atom_cost = {
   /* stringop_algs for memcpy.  
      SSE loops works best on Atom, but fall back into non-SSE unrolled loop variant
      if that fails.  */
-  {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
-    {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}},
-   {{libcall, {{2048, sse_loop}, {2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment.  */
-    {libcall, {{2048, sse_loop}, {2048, unrolled_loop},
+  {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
+    {libcall, {{4096, unrolled_loop}, {-1, libcall}}}},
+   {{libcall, {{2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment.  */
+    {libcall, {{2048, unrolled_loop},
 	       {-1, libcall}}}}},
 
   /* stringop_algs for memset.  */
-  {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
-    {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}},
-   {{libcall, {{1024, sse_loop}, {1024, unrolled_loop},	 /* Unknown alignment.  */
+  {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
+    {libcall, {{4096, unrolled_loop}, {-1, libcall}}}},
+   {{libcall, {{1024, unrolled_loop},	 /* Unknown alignment.  */
 	       {-1, libcall}}},
-    {libcall, {{2048, sse_loop}, {2048, unrolled_loop},
+    {libcall, {{2048, unrolled_loop},
 	       {-1, libcall}}}}},
   1,					/* scalar_stmt_cost.  */
   1,					/* scalar load_cost.  */