Message ID | CAO2gOZVr=f-Cgz1DUC0=gRcFEGyO+uRd=iDL8SeMfJwVeVia5w@mail.gmail.com |
---|---|
State | New |
Headers | show |
To be clear, this flag is for malloc implementation (such as tcmalloc) with side effect unknown to the compiler. Using -fno-builtin-xxx is too conservative for that purpose. David On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote: > Hello, > > This patch adds a flag to guard the optimization that optimize the > following code away: > > free (malloc (4)); > > In some cases, we'd like this type of malloc/free pairs to remain in > the optimized code. > > Tested with bootstrap, and no regression in the gcc testsuite. > > Is it ok for mainline? > > Thanks, > Dehao > > gcc/ChangeLog > 2012-05-08 Dehao Chen <dehao@google.com> > > * common.opt (feliminate-malloc): New. > * doc/invoke.texi: Document it. > * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. > > gcc/testsuite/ChangeLog > 2012-05-08 Dehao Chen <dehao@google.com> > > * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working > as expected. > > Index: gcc/doc/invoke.texi > =================================================================== > --- gcc/doc/invoke.texi (revision 187277) > +++ gcc/doc/invoke.texi (working copy) > @@ -360,7 +360,8 @@ > -fcx-limited-range @gol > -fdata-sections -fdce -fdelayed-branch @gol > -fdelete-null-pointer-checks -fdevirtualize -fdse @gol > --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol > +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol > +-ffat-lto-objects @gol > -ffast-math -ffinite-math-only -ffloat-store > -fexcess-precision=@var{style} @gol > -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol > -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol > @@ -6238,6 +6239,7 @@ > -fdefer-pop @gol > -fdelayed-branch @gol > -fdse @gol > +-feliminate-malloc @gol > -fguess-branch-probability @gol > -fif-conversion2 @gol > -fif-conversion @gol > @@ -6762,6 +6764,11 @@ > Perform dead store elimination (DSE) on RTL@. > Enabled by default at @option{-O} and higher. > > +@item -feliminate-malloc > +@opindex feliminate-malloc > +Eliminate unnecessary malloc/free pairs. > +Enabled by default at @option{-O} and higher. > + > @item -fif-conversion > @opindex fif-conversion > Attempt to transform conditional jumps into branch-less equivalents. This > Index: gcc/testsuite/gcc.dg/free-malloc.c > =================================================================== > --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) > +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-eliminate-malloc" } */ > +/* { dg-final { scan-assembler-times "malloc" 2} } */ > +/* { dg-final { scan-assembler-times "free" 2} } */ > + > +extern void * malloc (unsigned long); > +extern void free (void *); > + > +void test () > +{ > + free (malloc (10)); > +} > Index: gcc/common.opt > =================================================================== > --- gcc/common.opt (revision 187277) > +++ gcc/common.opt (working copy) > @@ -1474,6 +1474,10 @@ > Common Var(flag_dce) Init(1) Optimization > Use the RTL dead code elimination pass > > +feliminate-malloc > +Common Var(flag_eliminate_malloc) Init(1) Optimization > +Eliminate unnecessary malloc/free pairs > + > fdse > Common Var(flag_dse) Init(1) Optimization > Use the RTL dead store elimination pass > Index: gcc/tree-ssa-dce.c > =================================================================== > --- gcc/tree-ssa-dce.c (revision 187277) > +++ gcc/tree-ssa-dce.c (working copy) > @@ -309,6 +309,8 @@ > case BUILT_IN_CALLOC: > case BUILT_IN_ALLOCA: > case BUILT_IN_ALLOCA_WITH_ALIGN: > + if (!flag_eliminate_malloc) > + mark_stmt_necessary (stmt, true); > return; > > default:;
On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote: > To be clear, this flag is for malloc implementation (such as tcmalloc) > with side effect unknown to the compiler. Using -fno-builtin-xxx is > too conservative for that purpose. I don't think that flys. Btw, the patch also guards alloca - alloca is purely GCC internal. What's the "unknown side-effects" that are also important to preserve for free(malloc(4))? Richard. > David > > On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote: >> Hello, >> >> This patch adds a flag to guard the optimization that optimize the >> following code away: >> >> free (malloc (4)); >> >> In some cases, we'd like this type of malloc/free pairs to remain in >> the optimized code. >> >> Tested with bootstrap, and no regression in the gcc testsuite. >> >> Is it ok for mainline? >> >> Thanks, >> Dehao >> >> gcc/ChangeLog >> 2012-05-08 Dehao Chen <dehao@google.com> >> >> * common.opt (feliminate-malloc): New. >> * doc/invoke.texi: Document it. >> * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. >> >> gcc/testsuite/ChangeLog >> 2012-05-08 Dehao Chen <dehao@google.com> >> >> * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working >> as expected. >> >> Index: gcc/doc/invoke.texi >> =================================================================== >> --- gcc/doc/invoke.texi (revision 187277) >> +++ gcc/doc/invoke.texi (working copy) >> @@ -360,7 +360,8 @@ >> -fcx-limited-range @gol >> -fdata-sections -fdce -fdelayed-branch @gol >> -fdelete-null-pointer-checks -fdevirtualize -fdse @gol >> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol >> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol >> +-ffat-lto-objects @gol >> -ffast-math -ffinite-math-only -ffloat-store >> -fexcess-precision=@var{style} @gol >> -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol >> -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol >> @@ -6238,6 +6239,7 @@ >> -fdefer-pop @gol >> -fdelayed-branch @gol >> -fdse @gol >> +-feliminate-malloc @gol >> -fguess-branch-probability @gol >> -fif-conversion2 @gol >> -fif-conversion @gol >> @@ -6762,6 +6764,11 @@ >> Perform dead store elimination (DSE) on RTL@. >> Enabled by default at @option{-O} and higher. >> >> +@item -feliminate-malloc >> +@opindex feliminate-malloc >> +Eliminate unnecessary malloc/free pairs. >> +Enabled by default at @option{-O} and higher. >> + >> @item -fif-conversion >> @opindex fif-conversion >> Attempt to transform conditional jumps into branch-less equivalents. This >> Index: gcc/testsuite/gcc.dg/free-malloc.c >> =================================================================== >> --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >> +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >> @@ -0,0 +1,12 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -fno-eliminate-malloc" } */ >> +/* { dg-final { scan-assembler-times "malloc" 2} } */ >> +/* { dg-final { scan-assembler-times "free" 2} } */ >> + >> +extern void * malloc (unsigned long); >> +extern void free (void *); >> + >> +void test () >> +{ >> + free (malloc (10)); >> +} >> Index: gcc/common.opt >> =================================================================== >> --- gcc/common.opt (revision 187277) >> +++ gcc/common.opt (working copy) >> @@ -1474,6 +1474,10 @@ >> Common Var(flag_dce) Init(1) Optimization >> Use the RTL dead code elimination pass >> >> +feliminate-malloc >> +Common Var(flag_eliminate_malloc) Init(1) Optimization >> +Eliminate unnecessary malloc/free pairs >> + >> fdse >> Common Var(flag_dse) Init(1) Optimization >> Use the RTL dead store elimination pass >> Index: gcc/tree-ssa-dce.c >> =================================================================== >> --- gcc/tree-ssa-dce.c (revision 187277) >> +++ gcc/tree-ssa-dce.c (working copy) >> @@ -309,6 +309,8 @@ >> case BUILT_IN_CALLOC: >> case BUILT_IN_ALLOCA: >> case BUILT_IN_ALLOCA_WITH_ALIGN: >> + if (!flag_eliminate_malloc) >> + mark_stmt_necessary (stmt, true); >> return; >> >> default:;
On Wed, May 9, 2012 at 4:12 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote: >> To be clear, this flag is for malloc implementation (such as tcmalloc) >> with side effect unknown to the compiler. Using -fno-builtin-xxx is >> too conservative for that purpose. > > I don't think that flys. Btw, the patch also guards alloca - alloca is purely > GCC internal. > > What's the "unknown side-effects" that are also important to preserve > for free(malloc(4))? Malloc implementation may record some info to a global structure, and a program may use this free(malloc()) pair to simulate the real runs to get some data, such as peak memory requirement. Dehao > > Richard. > >> David >> >> On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote: >>> Hello, >>> >>> This patch adds a flag to guard the optimization that optimize the >>> following code away: >>> >>> free (malloc (4)); >>> >>> In some cases, we'd like this type of malloc/free pairs to remain in >>> the optimized code. >>> >>> Tested with bootstrap, and no regression in the gcc testsuite. >>> >>> Is it ok for mainline? >>> >>> Thanks, >>> Dehao >>> >>> gcc/ChangeLog >>> 2012-05-08 Dehao Chen <dehao@google.com> >>> >>> * common.opt (feliminate-malloc): New. >>> * doc/invoke.texi: Document it. >>> * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. >>> >>> gcc/testsuite/ChangeLog >>> 2012-05-08 Dehao Chen <dehao@google.com> >>> >>> * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working >>> as expected. >>> >>> Index: gcc/doc/invoke.texi >>> =================================================================== >>> --- gcc/doc/invoke.texi (revision 187277) >>> +++ gcc/doc/invoke.texi (working copy) >>> @@ -360,7 +360,8 @@ >>> -fcx-limited-range @gol >>> -fdata-sections -fdce -fdelayed-branch @gol >>> -fdelete-null-pointer-checks -fdevirtualize -fdse @gol >>> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol >>> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol >>> +-ffat-lto-objects @gol >>> -ffast-math -ffinite-math-only -ffloat-store >>> -fexcess-precision=@var{style} @gol >>> -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol >>> -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol >>> @@ -6238,6 +6239,7 @@ >>> -fdefer-pop @gol >>> -fdelayed-branch @gol >>> -fdse @gol >>> +-feliminate-malloc @gol >>> -fguess-branch-probability @gol >>> -fif-conversion2 @gol >>> -fif-conversion @gol >>> @@ -6762,6 +6764,11 @@ >>> Perform dead store elimination (DSE) on RTL@. >>> Enabled by default at @option{-O} and higher. >>> >>> +@item -feliminate-malloc >>> +@opindex feliminate-malloc >>> +Eliminate unnecessary malloc/free pairs. >>> +Enabled by default at @option{-O} and higher. >>> + >>> @item -fif-conversion >>> @opindex fif-conversion >>> Attempt to transform conditional jumps into branch-less equivalents. This >>> Index: gcc/testsuite/gcc.dg/free-malloc.c >>> =================================================================== >>> --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >>> +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >>> @@ -0,0 +1,12 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-options "-O2 -fno-eliminate-malloc" } */ >>> +/* { dg-final { scan-assembler-times "malloc" 2} } */ >>> +/* { dg-final { scan-assembler-times "free" 2} } */ >>> + >>> +extern void * malloc (unsigned long); >>> +extern void free (void *); >>> + >>> +void test () >>> +{ >>> + free (malloc (10)); >>> +} >>> Index: gcc/common.opt >>> =================================================================== >>> --- gcc/common.opt (revision 187277) >>> +++ gcc/common.opt (working copy) >>> @@ -1474,6 +1474,10 @@ >>> Common Var(flag_dce) Init(1) Optimization >>> Use the RTL dead code elimination pass >>> >>> +feliminate-malloc >>> +Common Var(flag_eliminate_malloc) Init(1) Optimization >>> +Eliminate unnecessary malloc/free pairs >>> + >>> fdse >>> Common Var(flag_dse) Init(1) Optimization >>> Use the RTL dead store elimination pass >>> Index: gcc/tree-ssa-dce.c >>> =================================================================== >>> --- gcc/tree-ssa-dce.c (revision 187277) >>> +++ gcc/tree-ssa-dce.c (working copy) >>> @@ -309,6 +309,8 @@ >>> case BUILT_IN_CALLOC: >>> case BUILT_IN_ALLOCA: >>> case BUILT_IN_ALLOCA_WITH_ALIGN: >>> + if (!flag_eliminate_malloc) >>> + mark_stmt_necessary (stmt, true); >>> return; >>> >>> default:;
On Wed, May 9, 2012 at 10:38 AM, Dehao Chen <dehao@google.com> wrote: > On Wed, May 9, 2012 at 4:12 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote: >>> To be clear, this flag is for malloc implementation (such as tcmalloc) >>> with side effect unknown to the compiler. Using -fno-builtin-xxx is >>> too conservative for that purpose. >> >> I don't think that flys. Btw, the patch also guards alloca - alloca is purely >> GCC internal. >> >> What's the "unknown side-effects" that are also important to preserve >> for free(malloc(4))? > > Malloc implementation may record some info to a global structure, and > a program may use this free(malloc()) pair to simulate the real runs > to get some data, such as peak memory requirement. So why not use an alternate interface into this special allocator for this purpose? > Dehao > >> >> Richard. >> >>> David >>> >>> On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote: >>>> Hello, >>>> >>>> This patch adds a flag to guard the optimization that optimize the >>>> following code away: >>>> >>>> free (malloc (4)); >>>> >>>> In some cases, we'd like this type of malloc/free pairs to remain in >>>> the optimized code. >>>> >>>> Tested with bootstrap, and no regression in the gcc testsuite. >>>> >>>> Is it ok for mainline? >>>> >>>> Thanks, >>>> Dehao >>>> >>>> gcc/ChangeLog >>>> 2012-05-08 Dehao Chen <dehao@google.com> >>>> >>>> * common.opt (feliminate-malloc): New. >>>> * doc/invoke.texi: Document it. >>>> * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. >>>> >>>> gcc/testsuite/ChangeLog >>>> 2012-05-08 Dehao Chen <dehao@google.com> >>>> >>>> * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working >>>> as expected. >>>> >>>> Index: gcc/doc/invoke.texi >>>> =================================================================== >>>> --- gcc/doc/invoke.texi (revision 187277) >>>> +++ gcc/doc/invoke.texi (working copy) >>>> @@ -360,7 +360,8 @@ >>>> -fcx-limited-range @gol >>>> -fdata-sections -fdce -fdelayed-branch @gol >>>> -fdelete-null-pointer-checks -fdevirtualize -fdse @gol >>>> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol >>>> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol >>>> +-ffat-lto-objects @gol >>>> -ffast-math -ffinite-math-only -ffloat-store >>>> -fexcess-precision=@var{style} @gol >>>> -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol >>>> -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol >>>> @@ -6238,6 +6239,7 @@ >>>> -fdefer-pop @gol >>>> -fdelayed-branch @gol >>>> -fdse @gol >>>> +-feliminate-malloc @gol >>>> -fguess-branch-probability @gol >>>> -fif-conversion2 @gol >>>> -fif-conversion @gol >>>> @@ -6762,6 +6764,11 @@ >>>> Perform dead store elimination (DSE) on RTL@. >>>> Enabled by default at @option{-O} and higher. >>>> >>>> +@item -feliminate-malloc >>>> +@opindex feliminate-malloc >>>> +Eliminate unnecessary malloc/free pairs. >>>> +Enabled by default at @option{-O} and higher. >>>> + >>>> @item -fif-conversion >>>> @opindex fif-conversion >>>> Attempt to transform conditional jumps into branch-less equivalents. This >>>> Index: gcc/testsuite/gcc.dg/free-malloc.c >>>> =================================================================== >>>> --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >>>> +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >>>> @@ -0,0 +1,12 @@ >>>> +/* { dg-do compile } */ >>>> +/* { dg-options "-O2 -fno-eliminate-malloc" } */ >>>> +/* { dg-final { scan-assembler-times "malloc" 2} } */ >>>> +/* { dg-final { scan-assembler-times "free" 2} } */ >>>> + >>>> +extern void * malloc (unsigned long); >>>> +extern void free (void *); >>>> + >>>> +void test () >>>> +{ >>>> + free (malloc (10)); >>>> +} >>>> Index: gcc/common.opt >>>> =================================================================== >>>> --- gcc/common.opt (revision 187277) >>>> +++ gcc/common.opt (working copy) >>>> @@ -1474,6 +1474,10 @@ >>>> Common Var(flag_dce) Init(1) Optimization >>>> Use the RTL dead code elimination pass >>>> >>>> +feliminate-malloc >>>> +Common Var(flag_eliminate_malloc) Init(1) Optimization >>>> +Eliminate unnecessary malloc/free pairs >>>> + >>>> fdse >>>> Common Var(flag_dse) Init(1) Optimization >>>> Use the RTL dead store elimination pass >>>> Index: gcc/tree-ssa-dce.c >>>> =================================================================== >>>> --- gcc/tree-ssa-dce.c (revision 187277) >>>> +++ gcc/tree-ssa-dce.c (working copy) >>>> @@ -309,6 +309,8 @@ >>>> case BUILT_IN_CALLOC: >>>> case BUILT_IN_ALLOCA: >>>> case BUILT_IN_ALLOCA_WITH_ALIGN: >>>> + if (!flag_eliminate_malloc) >>>> + mark_stmt_necessary (stmt, true); >>>> return; >>>> >>>> default:;
On Wed, May 9, 2012 at 5:22 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Wed, May 9, 2012 at 10:38 AM, Dehao Chen <dehao@google.com> wrote: >> On Wed, May 9, 2012 at 4:12 PM, Richard Guenther >> <richard.guenther@gmail.com> wrote: >>> On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote: >>>> To be clear, this flag is for malloc implementation (such as tcmalloc) >>>> with side effect unknown to the compiler. Using -fno-builtin-xxx is >>>> too conservative for that purpose. >>> >>> I don't think that flys. Btw, the patch also guards alloca - alloca is purely >>> GCC internal. >>> >>> What's the "unknown side-effects" that are also important to preserve >>> for free(malloc(4))? >> >> Malloc implementation may record some info to a global structure, and >> a program may use this free(malloc()) pair to simulate the real runs >> to get some data, such as peak memory requirement. > > So why not use an alternate interface into this special allocator for this > purpose? There can be the following scenario: We want to add a module to an existing app. Before implementing the module, we want to collect some statistics on real runs. In this scenario, we need: * No change to the legacy code * Optimized build for the simulation run * Provide accurate statistical info We want to collect data for both new module and the legacy code without changing the later, thus we cannot use a new malloc/free interface. Thanks, Dehao > >> Dehao >> >>> >>> Richard. >>> >>>> David >>>> >>>> On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote: >>>>> Hello, >>>>> >>>>> This patch adds a flag to guard the optimization that optimize the >>>>> following code away: >>>>> >>>>> free (malloc (4)); >>>>> >>>>> In some cases, we'd like this type of malloc/free pairs to remain in >>>>> the optimized code. >>>>> >>>>> Tested with bootstrap, and no regression in the gcc testsuite. >>>>> >>>>> Is it ok for mainline? >>>>> >>>>> Thanks, >>>>> Dehao >>>>> >>>>> gcc/ChangeLog >>>>> 2012-05-08 Dehao Chen <dehao@google.com> >>>>> >>>>> * common.opt (feliminate-malloc): New. >>>>> * doc/invoke.texi: Document it. >>>>> * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. >>>>> >>>>> gcc/testsuite/ChangeLog >>>>> 2012-05-08 Dehao Chen <dehao@google.com> >>>>> >>>>> * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working >>>>> as expected. >>>>> >>>>> Index: gcc/doc/invoke.texi >>>>> =================================================================== >>>>> --- gcc/doc/invoke.texi (revision 187277) >>>>> +++ gcc/doc/invoke.texi (working copy) >>>>> @@ -360,7 +360,8 @@ >>>>> -fcx-limited-range @gol >>>>> -fdata-sections -fdce -fdelayed-branch @gol >>>>> -fdelete-null-pointer-checks -fdevirtualize -fdse @gol >>>>> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol >>>>> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol >>>>> +-ffat-lto-objects @gol >>>>> -ffast-math -ffinite-math-only -ffloat-store >>>>> -fexcess-precision=@var{style} @gol >>>>> -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol >>>>> -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol >>>>> @@ -6238,6 +6239,7 @@ >>>>> -fdefer-pop @gol >>>>> -fdelayed-branch @gol >>>>> -fdse @gol >>>>> +-feliminate-malloc @gol >>>>> -fguess-branch-probability @gol >>>>> -fif-conversion2 @gol >>>>> -fif-conversion @gol >>>>> @@ -6762,6 +6764,11 @@ >>>>> Perform dead store elimination (DSE) on RTL@. >>>>> Enabled by default at @option{-O} and higher. >>>>> >>>>> +@item -feliminate-malloc >>>>> +@opindex feliminate-malloc >>>>> +Eliminate unnecessary malloc/free pairs. >>>>> +Enabled by default at @option{-O} and higher. >>>>> + >>>>> @item -fif-conversion >>>>> @opindex fif-conversion >>>>> Attempt to transform conditional jumps into branch-less equivalents. This >>>>> Index: gcc/testsuite/gcc.dg/free-malloc.c >>>>> =================================================================== >>>>> --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >>>>> +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >>>>> @@ -0,0 +1,12 @@ >>>>> +/* { dg-do compile } */ >>>>> +/* { dg-options "-O2 -fno-eliminate-malloc" } */ >>>>> +/* { dg-final { scan-assembler-times "malloc" 2} } */ >>>>> +/* { dg-final { scan-assembler-times "free" 2} } */ >>>>> + >>>>> +extern void * malloc (unsigned long); >>>>> +extern void free (void *); >>>>> + >>>>> +void test () >>>>> +{ >>>>> + free (malloc (10)); >>>>> +} >>>>> Index: gcc/common.opt >>>>> =================================================================== >>>>> --- gcc/common.opt (revision 187277) >>>>> +++ gcc/common.opt (working copy) >>>>> @@ -1474,6 +1474,10 @@ >>>>> Common Var(flag_dce) Init(1) Optimization >>>>> Use the RTL dead code elimination pass >>>>> >>>>> +feliminate-malloc >>>>> +Common Var(flag_eliminate_malloc) Init(1) Optimization >>>>> +Eliminate unnecessary malloc/free pairs >>>>> + >>>>> fdse >>>>> Common Var(flag_dse) Init(1) Optimization >>>>> Use the RTL dead store elimination pass >>>>> Index: gcc/tree-ssa-dce.c >>>>> =================================================================== >>>>> --- gcc/tree-ssa-dce.c (revision 187277) >>>>> +++ gcc/tree-ssa-dce.c (working copy) >>>>> @@ -309,6 +309,8 @@ >>>>> case BUILT_IN_CALLOC: >>>>> case BUILT_IN_ALLOCA: >>>>> case BUILT_IN_ALLOCA_WITH_ALIGN: >>>>> + if (!flag_eliminate_malloc) >>>>> + mark_stmt_necessary (stmt, true); >>>>> return; >>>>> >>>>> default:;
On Wed, May 9, 2012 at 11:48 AM, Dehao Chen <dehao@google.com> wrote: > On Wed, May 9, 2012 at 5:22 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Wed, May 9, 2012 at 10:38 AM, Dehao Chen <dehao@google.com> wrote: >>> On Wed, May 9, 2012 at 4:12 PM, Richard Guenther >>> <richard.guenther@gmail.com> wrote: >>>> On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote: >>>>> To be clear, this flag is for malloc implementation (such as tcmalloc) >>>>> with side effect unknown to the compiler. Using -fno-builtin-xxx is >>>>> too conservative for that purpose. >>>> >>>> I don't think that flys. Btw, the patch also guards alloca - alloca is purely >>>> GCC internal. >>>> >>>> What's the "unknown side-effects" that are also important to preserve >>>> for free(malloc(4))? >>> >>> Malloc implementation may record some info to a global structure, and >>> a program may use this free(malloc()) pair to simulate the real runs >>> to get some data, such as peak memory requirement. >> >> So why not use an alternate interface into this special allocator for this >> purpose? > > There can be the following scenario: > > We want to add a module to an existing app. Before implementing the > module, we want to collect some statistics on real runs. In this > scenario, we need: > > * No change to the legacy code > * Optimized build for the simulation run > * Provide accurate statistical info > > We want to collect data for both new module and the legacy code > without changing the later, thus we cannot use a new malloc/free > interface. So put in a optimization barrier then. Like free (({ void * x = malloc (4); __asm ("" : "+m" (x)); __x; }); Btw, why can't you simply build the new module (which doesn't something real anyway, just fake stuff) without optimization? > Thanks, > Dehao > >> >>> Dehao >>> >>>> >>>> Richard. >>>> >>>>> David >>>>> >>>>> On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote: >>>>>> Hello, >>>>>> >>>>>> This patch adds a flag to guard the optimization that optimize the >>>>>> following code away: >>>>>> >>>>>> free (malloc (4)); >>>>>> >>>>>> In some cases, we'd like this type of malloc/free pairs to remain in >>>>>> the optimized code. >>>>>> >>>>>> Tested with bootstrap, and no regression in the gcc testsuite. >>>>>> >>>>>> Is it ok for mainline? >>>>>> >>>>>> Thanks, >>>>>> Dehao >>>>>> >>>>>> gcc/ChangeLog >>>>>> 2012-05-08 Dehao Chen <dehao@google.com> >>>>>> >>>>>> * common.opt (feliminate-malloc): New. >>>>>> * doc/invoke.texi: Document it. >>>>>> * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. >>>>>> >>>>>> gcc/testsuite/ChangeLog >>>>>> 2012-05-08 Dehao Chen <dehao@google.com> >>>>>> >>>>>> * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working >>>>>> as expected. >>>>>> >>>>>> Index: gcc/doc/invoke.texi >>>>>> =================================================================== >>>>>> --- gcc/doc/invoke.texi (revision 187277) >>>>>> +++ gcc/doc/invoke.texi (working copy) >>>>>> @@ -360,7 +360,8 @@ >>>>>> -fcx-limited-range @gol >>>>>> -fdata-sections -fdce -fdelayed-branch @gol >>>>>> -fdelete-null-pointer-checks -fdevirtualize -fdse @gol >>>>>> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol >>>>>> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol >>>>>> +-ffat-lto-objects @gol >>>>>> -ffast-math -ffinite-math-only -ffloat-store >>>>>> -fexcess-precision=@var{style} @gol >>>>>> -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol >>>>>> -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol >>>>>> @@ -6238,6 +6239,7 @@ >>>>>> -fdefer-pop @gol >>>>>> -fdelayed-branch @gol >>>>>> -fdse @gol >>>>>> +-feliminate-malloc @gol >>>>>> -fguess-branch-probability @gol >>>>>> -fif-conversion2 @gol >>>>>> -fif-conversion @gol >>>>>> @@ -6762,6 +6764,11 @@ >>>>>> Perform dead store elimination (DSE) on RTL@. >>>>>> Enabled by default at @option{-O} and higher. >>>>>> >>>>>> +@item -feliminate-malloc >>>>>> +@opindex feliminate-malloc >>>>>> +Eliminate unnecessary malloc/free pairs. >>>>>> +Enabled by default at @option{-O} and higher. >>>>>> + >>>>>> @item -fif-conversion >>>>>> @opindex fif-conversion >>>>>> Attempt to transform conditional jumps into branch-less equivalents. This >>>>>> Index: gcc/testsuite/gcc.dg/free-malloc.c >>>>>> =================================================================== >>>>>> --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >>>>>> +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >>>>>> @@ -0,0 +1,12 @@ >>>>>> +/* { dg-do compile } */ >>>>>> +/* { dg-options "-O2 -fno-eliminate-malloc" } */ >>>>>> +/* { dg-final { scan-assembler-times "malloc" 2} } */ >>>>>> +/* { dg-final { scan-assembler-times "free" 2} } */ >>>>>> + >>>>>> +extern void * malloc (unsigned long); >>>>>> +extern void free (void *); >>>>>> + >>>>>> +void test () >>>>>> +{ >>>>>> + free (malloc (10)); >>>>>> +} >>>>>> Index: gcc/common.opt >>>>>> =================================================================== >>>>>> --- gcc/common.opt (revision 187277) >>>>>> +++ gcc/common.opt (working copy) >>>>>> @@ -1474,6 +1474,10 @@ >>>>>> Common Var(flag_dce) Init(1) Optimization >>>>>> Use the RTL dead code elimination pass >>>>>> >>>>>> +feliminate-malloc >>>>>> +Common Var(flag_eliminate_malloc) Init(1) Optimization >>>>>> +Eliminate unnecessary malloc/free pairs >>>>>> + >>>>>> fdse >>>>>> Common Var(flag_dse) Init(1) Optimization >>>>>> Use the RTL dead store elimination pass >>>>>> Index: gcc/tree-ssa-dce.c >>>>>> =================================================================== >>>>>> --- gcc/tree-ssa-dce.c (revision 187277) >>>>>> +++ gcc/tree-ssa-dce.c (working copy) >>>>>> @@ -309,6 +309,8 @@ >>>>>> case BUILT_IN_CALLOC: >>>>>> case BUILT_IN_ALLOCA: >>>>>> case BUILT_IN_ALLOCA_WITH_ALIGN: >>>>>> + if (!flag_eliminate_malloc) >>>>>> + mark_stmt_necessary (stmt, true); >>>>>> return; >>>>>> >>>>>> default:;
On Wed, May 09, 2012 at 05:48:25PM +0800, Dehao Chen wrote: > > So why not use an alternate interface into this special allocator for this > > purpose? > > There can be the following scenario: > > We want to add a module to an existing app. Before implementing the > module, we want to collect some statistics on real runs. In this > scenario, we need: > > * No change to the legacy code > * Optimized build for the simulation run > * Provide accurate statistical info > > We want to collect data for both new module and the legacy code > without changing the later, thus we cannot use a new malloc/free > interface. Note that even in the glibc testsuite there had to be workarounds for this (asm barrier was used): http://sources.redhat.com/ml/libc-alpha/2012-05/msg00138.html Jakub
On Wed, May 9, 2012 at 1:12 AM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote: >> To be clear, this flag is for malloc implementation (such as tcmalloc) >> with side effect unknown to the compiler. Using -fno-builtin-xxx is >> too conservative for that purpose. > > I don't think that flys. Btw, the patch also guards alloca - alloca is purely > GCC internal. > > What's the "unknown side-effects" that are also important to preserve > for free(malloc(4))? > The side effect is the user registered malloc hooks. The pattern 'free(malloc(4)', or loops like for (..) { malloc(..); // result not used } itself is not interesting. They only appear in the test code and the problem can be worked around by using -fno-builtin-malloc. The option is to avoid disable this and all similar malloc optimizations (in the future) for malloc implementation with hooks. Thanks, David > Richard. > >> David >> >> On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote: >>> Hello, >>> >>> This patch adds a flag to guard the optimization that optimize the >>> following code away: >>> >>> free (malloc (4)); >>> >>> In some cases, we'd like this type of malloc/free pairs to remain in >>> the optimized code. >>> >>> Tested with bootstrap, and no regression in the gcc testsuite. >>> >>> Is it ok for mainline? >>> >>> Thanks, >>> Dehao >>> >>> gcc/ChangeLog >>> 2012-05-08 Dehao Chen <dehao@google.com> >>> >>> * common.opt (feliminate-malloc): New. >>> * doc/invoke.texi: Document it. >>> * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. >>> >>> gcc/testsuite/ChangeLog >>> 2012-05-08 Dehao Chen <dehao@google.com> >>> >>> * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working >>> as expected. >>> >>> Index: gcc/doc/invoke.texi >>> =================================================================== >>> --- gcc/doc/invoke.texi (revision 187277) >>> +++ gcc/doc/invoke.texi (working copy) >>> @@ -360,7 +360,8 @@ >>> -fcx-limited-range @gol >>> -fdata-sections -fdce -fdelayed-branch @gol >>> -fdelete-null-pointer-checks -fdevirtualize -fdse @gol >>> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol >>> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol >>> +-ffat-lto-objects @gol >>> -ffast-math -ffinite-math-only -ffloat-store >>> -fexcess-precision=@var{style} @gol >>> -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol >>> -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol >>> @@ -6238,6 +6239,7 @@ >>> -fdefer-pop @gol >>> -fdelayed-branch @gol >>> -fdse @gol >>> +-feliminate-malloc @gol >>> -fguess-branch-probability @gol >>> -fif-conversion2 @gol >>> -fif-conversion @gol >>> @@ -6762,6 +6764,11 @@ >>> Perform dead store elimination (DSE) on RTL@. >>> Enabled by default at @option{-O} and higher. >>> >>> +@item -feliminate-malloc >>> +@opindex feliminate-malloc >>> +Eliminate unnecessary malloc/free pairs. >>> +Enabled by default at @option{-O} and higher. >>> + >>> @item -fif-conversion >>> @opindex fif-conversion >>> Attempt to transform conditional jumps into branch-less equivalents. This >>> Index: gcc/testsuite/gcc.dg/free-malloc.c >>> =================================================================== >>> --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >>> +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >>> @@ -0,0 +1,12 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-options "-O2 -fno-eliminate-malloc" } */ >>> +/* { dg-final { scan-assembler-times "malloc" 2} } */ >>> +/* { dg-final { scan-assembler-times "free" 2} } */ >>> + >>> +extern void * malloc (unsigned long); >>> +extern void free (void *); >>> + >>> +void test () >>> +{ >>> + free (malloc (10)); >>> +} >>> Index: gcc/common.opt >>> =================================================================== >>> --- gcc/common.opt (revision 187277) >>> +++ gcc/common.opt (working copy) >>> @@ -1474,6 +1474,10 @@ >>> Common Var(flag_dce) Init(1) Optimization >>> Use the RTL dead code elimination pass >>> >>> +feliminate-malloc >>> +Common Var(flag_eliminate_malloc) Init(1) Optimization >>> +Eliminate unnecessary malloc/free pairs >>> + >>> fdse >>> Common Var(flag_dse) Init(1) Optimization >>> Use the RTL dead store elimination pass >>> Index: gcc/tree-ssa-dce.c >>> =================================================================== >>> --- gcc/tree-ssa-dce.c (revision 187277) >>> +++ gcc/tree-ssa-dce.c (working copy) >>> @@ -309,6 +309,8 @@ >>> case BUILT_IN_CALLOC: >>> case BUILT_IN_ALLOCA: >>> case BUILT_IN_ALLOCA_WITH_ALIGN: >>> + if (!flag_eliminate_malloc) >>> + mark_stmt_necessary (stmt, true); >>> return; >>> >>> default:;
On Wed, May 9, 2012 at 9:32 AM, Xinliang David Li <davidxl@google.com> wrote: > On Wed, May 9, 2012 at 1:12 AM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote: >>> To be clear, this flag is for malloc implementation (such as tcmalloc) >>> with side effect unknown to the compiler. Using -fno-builtin-xxx is >>> too conservative for that purpose. >> >> I don't think that flys. Btw, the patch also guards alloca - alloca is purely >> GCC internal. >> >> What's the "unknown side-effects" that are also important to preserve >> for free(malloc(4))? >> > > The side effect is the user registered malloc hooks. IIRC future versions of glibc will have those malloc hooks removed. Because of this problem. Thanks, Andrew Pinski > > The pattern 'free(malloc(4)', or loops like > > for (..) > { > malloc(..); // result not used > } > > itself is not interesting. They only appear in the test code and the > problem can be worked around by using -fno-builtin-malloc. The option > is to avoid disable this and all similar malloc optimizations (in the > future) for malloc implementation with hooks. > > Thanks, > > David > >> Richard. >> >>> David >>> >>> On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote: >>>> Hello, >>>> >>>> This patch adds a flag to guard the optimization that optimize the >>>> following code away: >>>> >>>> free (malloc (4)); >>>> >>>> In some cases, we'd like this type of malloc/free pairs to remain in >>>> the optimized code. >>>> >>>> Tested with bootstrap, and no regression in the gcc testsuite. >>>> >>>> Is it ok for mainline? >>>> >>>> Thanks, >>>> Dehao >>>> >>>> gcc/ChangeLog >>>> 2012-05-08 Dehao Chen <dehao@google.com> >>>> >>>> * common.opt (feliminate-malloc): New. >>>> * doc/invoke.texi: Document it. >>>> * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. >>>> >>>> gcc/testsuite/ChangeLog >>>> 2012-05-08 Dehao Chen <dehao@google.com> >>>> >>>> * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working >>>> as expected. >>>> >>>> Index: gcc/doc/invoke.texi >>>> =================================================================== >>>> --- gcc/doc/invoke.texi (revision 187277) >>>> +++ gcc/doc/invoke.texi (working copy) >>>> @@ -360,7 +360,8 @@ >>>> -fcx-limited-range @gol >>>> -fdata-sections -fdce -fdelayed-branch @gol >>>> -fdelete-null-pointer-checks -fdevirtualize -fdse @gol >>>> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol >>>> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol >>>> +-ffat-lto-objects @gol >>>> -ffast-math -ffinite-math-only -ffloat-store >>>> -fexcess-precision=@var{style} @gol >>>> -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol >>>> -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol >>>> @@ -6238,6 +6239,7 @@ >>>> -fdefer-pop @gol >>>> -fdelayed-branch @gol >>>> -fdse @gol >>>> +-feliminate-malloc @gol >>>> -fguess-branch-probability @gol >>>> -fif-conversion2 @gol >>>> -fif-conversion @gol >>>> @@ -6762,6 +6764,11 @@ >>>> Perform dead store elimination (DSE) on RTL@. >>>> Enabled by default at @option{-O} and higher. >>>> >>>> +@item -feliminate-malloc >>>> +@opindex feliminate-malloc >>>> +Eliminate unnecessary malloc/free pairs. >>>> +Enabled by default at @option{-O} and higher. >>>> + >>>> @item -fif-conversion >>>> @opindex fif-conversion >>>> Attempt to transform conditional jumps into branch-less equivalents. This >>>> Index: gcc/testsuite/gcc.dg/free-malloc.c >>>> =================================================================== >>>> --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >>>> +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >>>> @@ -0,0 +1,12 @@ >>>> +/* { dg-do compile } */ >>>> +/* { dg-options "-O2 -fno-eliminate-malloc" } */ >>>> +/* { dg-final { scan-assembler-times "malloc" 2} } */ >>>> +/* { dg-final { scan-assembler-times "free" 2} } */ >>>> + >>>> +extern void * malloc (unsigned long); >>>> +extern void free (void *); >>>> + >>>> +void test () >>>> +{ >>>> + free (malloc (10)); >>>> +} >>>> Index: gcc/common.opt >>>> =================================================================== >>>> --- gcc/common.opt (revision 187277) >>>> +++ gcc/common.opt (working copy) >>>> @@ -1474,6 +1474,10 @@ >>>> Common Var(flag_dce) Init(1) Optimization >>>> Use the RTL dead code elimination pass >>>> >>>> +feliminate-malloc >>>> +Common Var(flag_eliminate_malloc) Init(1) Optimization >>>> +Eliminate unnecessary malloc/free pairs >>>> + >>>> fdse >>>> Common Var(flag_dse) Init(1) Optimization >>>> Use the RTL dead store elimination pass >>>> Index: gcc/tree-ssa-dce.c >>>> =================================================================== >>>> --- gcc/tree-ssa-dce.c (revision 187277) >>>> +++ gcc/tree-ssa-dce.c (working copy) >>>> @@ -309,6 +309,8 @@ >>>> case BUILT_IN_CALLOC: >>>> case BUILT_IN_ALLOCA: >>>> case BUILT_IN_ALLOCA_WITH_ALIGN: >>>> + if (!flag_eliminate_malloc) >>>> + mark_stmt_necessary (stmt, true); >>>> return; >>>> >>>> default:;
On Wed, May 9, 2012 at 6:32 PM, Xinliang David Li <davidxl@google.com> wrote: > On Wed, May 9, 2012 at 1:12 AM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote: >>> To be clear, this flag is for malloc implementation (such as tcmalloc) >>> with side effect unknown to the compiler. Using -fno-builtin-xxx is >>> too conservative for that purpose. >> >> I don't think that flys. Btw, the patch also guards alloca - alloca is purely >> GCC internal. >> >> What's the "unknown side-effects" that are also important to preserve >> for free(malloc(4))? >> > > The side effect is the user registered malloc hooks. > > The pattern 'free(malloc(4)', or loops like > > for (..) > { > malloc(..); // result not used > } > > itself is not interesting. They only appear in the test code and the > problem can be worked around by using -fno-builtin-malloc. The option > is to avoid disable this and all similar malloc optimizations (in the > future) for malloc implementation with hooks. What is then interesting? The above are all the same as if optimization figured out the storage is not used, no? Richard. > Thanks, > > David > >> Richard. >> >>> David >>> >>> On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote: >>>> Hello, >>>> >>>> This patch adds a flag to guard the optimization that optimize the >>>> following code away: >>>> >>>> free (malloc (4)); >>>> >>>> In some cases, we'd like this type of malloc/free pairs to remain in >>>> the optimized code. >>>> >>>> Tested with bootstrap, and no regression in the gcc testsuite. >>>> >>>> Is it ok for mainline? >>>> >>>> Thanks, >>>> Dehao >>>> >>>> gcc/ChangeLog >>>> 2012-05-08 Dehao Chen <dehao@google.com> >>>> >>>> * common.opt (feliminate-malloc): New. >>>> * doc/invoke.texi: Document it. >>>> * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. >>>> >>>> gcc/testsuite/ChangeLog >>>> 2012-05-08 Dehao Chen <dehao@google.com> >>>> >>>> * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working >>>> as expected. >>>> >>>> Index: gcc/doc/invoke.texi >>>> =================================================================== >>>> --- gcc/doc/invoke.texi (revision 187277) >>>> +++ gcc/doc/invoke.texi (working copy) >>>> @@ -360,7 +360,8 @@ >>>> -fcx-limited-range @gol >>>> -fdata-sections -fdce -fdelayed-branch @gol >>>> -fdelete-null-pointer-checks -fdevirtualize -fdse @gol >>>> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol >>>> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol >>>> +-ffat-lto-objects @gol >>>> -ffast-math -ffinite-math-only -ffloat-store >>>> -fexcess-precision=@var{style} @gol >>>> -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol >>>> -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol >>>> @@ -6238,6 +6239,7 @@ >>>> -fdefer-pop @gol >>>> -fdelayed-branch @gol >>>> -fdse @gol >>>> +-feliminate-malloc @gol >>>> -fguess-branch-probability @gol >>>> -fif-conversion2 @gol >>>> -fif-conversion @gol >>>> @@ -6762,6 +6764,11 @@ >>>> Perform dead store elimination (DSE) on RTL@. >>>> Enabled by default at @option{-O} and higher. >>>> >>>> +@item -feliminate-malloc >>>> +@opindex feliminate-malloc >>>> +Eliminate unnecessary malloc/free pairs. >>>> +Enabled by default at @option{-O} and higher. >>>> + >>>> @item -fif-conversion >>>> @opindex fif-conversion >>>> Attempt to transform conditional jumps into branch-less equivalents. This >>>> Index: gcc/testsuite/gcc.dg/free-malloc.c >>>> =================================================================== >>>> --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >>>> +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >>>> @@ -0,0 +1,12 @@ >>>> +/* { dg-do compile } */ >>>> +/* { dg-options "-O2 -fno-eliminate-malloc" } */ >>>> +/* { dg-final { scan-assembler-times "malloc" 2} } */ >>>> +/* { dg-final { scan-assembler-times "free" 2} } */ >>>> + >>>> +extern void * malloc (unsigned long); >>>> +extern void free (void *); >>>> + >>>> +void test () >>>> +{ >>>> + free (malloc (10)); >>>> +} >>>> Index: gcc/common.opt >>>> =================================================================== >>>> --- gcc/common.opt (revision 187277) >>>> +++ gcc/common.opt (working copy) >>>> @@ -1474,6 +1474,10 @@ >>>> Common Var(flag_dce) Init(1) Optimization >>>> Use the RTL dead code elimination pass >>>> >>>> +feliminate-malloc >>>> +Common Var(flag_eliminate_malloc) Init(1) Optimization >>>> +Eliminate unnecessary malloc/free pairs >>>> + >>>> fdse >>>> Common Var(flag_dse) Init(1) Optimization >>>> Use the RTL dead store elimination pass >>>> Index: gcc/tree-ssa-dce.c >>>> =================================================================== >>>> --- gcc/tree-ssa-dce.c (revision 187277) >>>> +++ gcc/tree-ssa-dce.c (working copy) >>>> @@ -309,6 +309,8 @@ >>>> case BUILT_IN_CALLOC: >>>> case BUILT_IN_ALLOCA: >>>> case BUILT_IN_ALLOCA_WITH_ALIGN: >>>> + if (!flag_eliminate_malloc) >>>> + mark_stmt_necessary (stmt, true); >>>> return; >>>> >>>> default:;
On Wed, May 9, 2012 at 11:21 AM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Wed, May 9, 2012 at 6:32 PM, Xinliang David Li <davidxl@google.com> wrote: >> On Wed, May 9, 2012 at 1:12 AM, Richard Guenther >> <richard.guenther@gmail.com> wrote: >>> On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote: >>>> To be clear, this flag is for malloc implementation (such as tcmalloc) >>>> with side effect unknown to the compiler. Using -fno-builtin-xxx is >>>> too conservative for that purpose. >>> >>> I don't think that flys. Btw, the patch also guards alloca - alloca is purely >>> GCC internal. >>> >>> What's the "unknown side-effects" that are also important to preserve >>> for free(malloc(4))? >>> >> >> The side effect is the user registered malloc hooks. >> >> The pattern 'free(malloc(4)', or loops like >> >> for (..) >> { >> malloc(..); // result not used >> } >> >> itself is not interesting. They only appear in the test code and the >> problem can be worked around by using -fno-builtin-malloc. The option >> is to avoid disable this and all similar malloc optimizations (in the >> future) for malloc implementation with hooks. > > What is then interesting? The above are all the same as if optimization > figured out the storage is not used, no? Compiler may choose to convert malloc call into alloca or coalesce multiple malloc calls -- but this may not always be the best choice given that more advanced locality based heap allocation is possible. Nothing like this exist yet, but the point is that it is convenient to have a way to keep malloc call while keeping its aliasing property. David > > Richard. > >> Thanks, >> >> David >> >>> Richard. >>> >>>> David >>>> >>>> On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote: >>>>> Hello, >>>>> >>>>> This patch adds a flag to guard the optimization that optimize the >>>>> following code away: >>>>> >>>>> free (malloc (4)); >>>>> >>>>> In some cases, we'd like this type of malloc/free pairs to remain in >>>>> the optimized code. >>>>> >>>>> Tested with bootstrap, and no regression in the gcc testsuite. >>>>> >>>>> Is it ok for mainline? >>>>> >>>>> Thanks, >>>>> Dehao >>>>> >>>>> gcc/ChangeLog >>>>> 2012-05-08 Dehao Chen <dehao@google.com> >>>>> >>>>> * common.opt (feliminate-malloc): New. >>>>> * doc/invoke.texi: Document it. >>>>> * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. >>>>> >>>>> gcc/testsuite/ChangeLog >>>>> 2012-05-08 Dehao Chen <dehao@google.com> >>>>> >>>>> * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working >>>>> as expected. >>>>> >>>>> Index: gcc/doc/invoke.texi >>>>> =================================================================== >>>>> --- gcc/doc/invoke.texi (revision 187277) >>>>> +++ gcc/doc/invoke.texi (working copy) >>>>> @@ -360,7 +360,8 @@ >>>>> -fcx-limited-range @gol >>>>> -fdata-sections -fdce -fdelayed-branch @gol >>>>> -fdelete-null-pointer-checks -fdevirtualize -fdse @gol >>>>> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol >>>>> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol >>>>> +-ffat-lto-objects @gol >>>>> -ffast-math -ffinite-math-only -ffloat-store >>>>> -fexcess-precision=@var{style} @gol >>>>> -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol >>>>> -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol >>>>> @@ -6238,6 +6239,7 @@ >>>>> -fdefer-pop @gol >>>>> -fdelayed-branch @gol >>>>> -fdse @gol >>>>> +-feliminate-malloc @gol >>>>> -fguess-branch-probability @gol >>>>> -fif-conversion2 @gol >>>>> -fif-conversion @gol >>>>> @@ -6762,6 +6764,11 @@ >>>>> Perform dead store elimination (DSE) on RTL@. >>>>> Enabled by default at @option{-O} and higher. >>>>> >>>>> +@item -feliminate-malloc >>>>> +@opindex feliminate-malloc >>>>> +Eliminate unnecessary malloc/free pairs. >>>>> +Enabled by default at @option{-O} and higher. >>>>> + >>>>> @item -fif-conversion >>>>> @opindex fif-conversion >>>>> Attempt to transform conditional jumps into branch-less equivalents. This >>>>> Index: gcc/testsuite/gcc.dg/free-malloc.c >>>>> =================================================================== >>>>> --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >>>>> +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) >>>>> @@ -0,0 +1,12 @@ >>>>> +/* { dg-do compile } */ >>>>> +/* { dg-options "-O2 -fno-eliminate-malloc" } */ >>>>> +/* { dg-final { scan-assembler-times "malloc" 2} } */ >>>>> +/* { dg-final { scan-assembler-times "free" 2} } */ >>>>> + >>>>> +extern void * malloc (unsigned long); >>>>> +extern void free (void *); >>>>> + >>>>> +void test () >>>>> +{ >>>>> + free (malloc (10)); >>>>> +} >>>>> Index: gcc/common.opt >>>>> =================================================================== >>>>> --- gcc/common.opt (revision 187277) >>>>> +++ gcc/common.opt (working copy) >>>>> @@ -1474,6 +1474,10 @@ >>>>> Common Var(flag_dce) Init(1) Optimization >>>>> Use the RTL dead code elimination pass >>>>> >>>>> +feliminate-malloc >>>>> +Common Var(flag_eliminate_malloc) Init(1) Optimization >>>>> +Eliminate unnecessary malloc/free pairs >>>>> + >>>>> fdse >>>>> Common Var(flag_dse) Init(1) Optimization >>>>> Use the RTL dead store elimination pass >>>>> Index: gcc/tree-ssa-dce.c >>>>> =================================================================== >>>>> --- gcc/tree-ssa-dce.c (revision 187277) >>>>> +++ gcc/tree-ssa-dce.c (working copy) >>>>> @@ -309,6 +309,8 @@ >>>>> case BUILT_IN_CALLOC: >>>>> case BUILT_IN_ALLOCA: >>>>> case BUILT_IN_ALLOCA_WITH_ALIGN: >>>>> + if (!flag_eliminate_malloc) >>>>> + mark_stmt_necessary (stmt, true); >>>>> return; >>>>> >>>>> default:;
Index: gcc/doc/invoke.texi =================================================================== --- gcc/doc/invoke.texi (revision 187277) +++ gcc/doc/invoke.texi (working copy) @@ -360,7 +360,8 @@ -fcx-limited-range @gol -fdata-sections -fdce -fdelayed-branch @gol -fdelete-null-pointer-checks -fdevirtualize -fdse @gol --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol +-ffat-lto-objects @gol -ffast-math -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol @@ -6238,6 +6239,7 @@ -fdefer-pop @gol -fdelayed-branch @gol -fdse @gol +-feliminate-malloc @gol -fguess-branch-probability @gol -fif-conversion2 @gol -fif-conversion @gol @@ -6762,6 +6764,11 @@ Perform dead store elimination (DSE) on RTL@. Enabled by default at @option{-O} and higher. +@item -feliminate-malloc +@opindex feliminate-malloc +Eliminate unnecessary malloc/free pairs. +Enabled by default at @option{-O} and higher. + @item -fif-conversion @opindex fif-conversion Attempt to transform conditional jumps into branch-less equivalents. This Index: gcc/testsuite/gcc.dg/free-malloc.c =================================================================== --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-eliminate-malloc" } */ +/* { dg-final { scan-assembler-times "malloc" 2} } */ +/* { dg-final { scan-assembler-times "free" 2} } */ + +extern void * malloc (unsigned long); +extern void free (void *); + +void test () +{ + free (malloc (10)); +} Index: gcc/common.opt =================================================================== --- gcc/common.opt (revision 187277) +++ gcc/common.opt (working copy) @@ -1474,6 +1474,10 @@ Common Var(flag_dce) Init(1) Optimization Use the RTL dead code elimination pass +feliminate-malloc +Common Var(flag_eliminate_malloc) Init(1) Optimization +Eliminate unnecessary malloc/free pairs + fdse Common Var(flag_dse) Init(1) Optimization Use the RTL dead store elimination pass Index: gcc/tree-ssa-dce.c =================================================================== --- gcc/tree-ssa-dce.c (revision 187277) +++ gcc/tree-ssa-dce.c (working copy) @@ -309,6 +309,8 @@ case BUILT_IN_CALLOC: case BUILT_IN_ALLOCA: case BUILT_IN_ALLOCA_WITH_ALIGN: + if (!flag_eliminate_malloc) + mark_stmt_necessary (stmt, true); return; default:;