Patchwork Add -feliminate-malloc to enable/disable elimination of redundant malloc/free pairs

login
register
mail settings
Submitter Dehao Chen
Date May 8, 2012, 2:43 p.m.
Message ID <CAO2gOZVr=f-Cgz1DUC0=gRcFEGyO+uRd=iDL8SeMfJwVeVia5w@mail.gmail.com>
Download mbox | patch
Permalink /patch/157704/
State New
Headers show

Comments

Dehao Chen - May 8, 2012, 2:43 p.m.
Hello,

This patch adds a flag to guard the optimization that optimize the
following code away:

free (malloc (4));

In some cases, we'd like this type of malloc/free pairs to remain in
the optimized code.

Tested with bootstrap, and no regression in the gcc testsuite.

Is it ok for mainline?

Thanks,
Dehao

gcc/ChangeLog
2012-05-08  Dehao Chen  <dehao@google.com>

	* common.opt (feliminate-malloc): New.
	* doc/invoke.texi: Document it.
	* tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it.

gcc/testsuite/ChangeLog
2012-05-08  Dehao Chen  <dehao@google.com>

	* gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working
	as expected.
Xinliang David Li - May 8, 2012, 4:18 p.m.
To be clear, this flag is for malloc implementation (such as tcmalloc)
with side effect unknown to the compiler. Using -fno-builtin-xxx is
too conservative for that purpose.

David

On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote:
> Hello,
>
> This patch adds a flag to guard the optimization that optimize the
> following code away:
>
> free (malloc (4));
>
> In some cases, we'd like this type of malloc/free pairs to remain in
> the optimized code.
>
> Tested with bootstrap, and no regression in the gcc testsuite.
>
> Is it ok for mainline?
>
> Thanks,
> Dehao
>
> gcc/ChangeLog
> 2012-05-08  Dehao Chen  <dehao@google.com>
>
>        * common.opt (feliminate-malloc): New.
>        * doc/invoke.texi: Document it.
>        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it.
>
> gcc/testsuite/ChangeLog
> 2012-05-08  Dehao Chen  <dehao@google.com>
>
>        * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working
>        as expected.
>
> Index: gcc/doc/invoke.texi
> ===================================================================
> --- gcc/doc/invoke.texi (revision 187277)
> +++ gcc/doc/invoke.texi (working copy)
> @@ -360,7 +360,8 @@
>  -fcx-limited-range @gol
>  -fdata-sections -fdce -fdelayed-branch @gol
>  -fdelete-null-pointer-checks -fdevirtualize -fdse @gol
> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol
> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol
> +-ffat-lto-objects @gol
>  -ffast-math -ffinite-math-only -ffloat-store
> -fexcess-precision=@var{style} @gol
>  -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol
>  -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
> @@ -6238,6 +6239,7 @@
>  -fdefer-pop @gol
>  -fdelayed-branch @gol
>  -fdse @gol
> +-feliminate-malloc @gol
>  -fguess-branch-probability @gol
>  -fif-conversion2 @gol
>  -fif-conversion @gol
> @@ -6762,6 +6764,11 @@
>  Perform dead store elimination (DSE) on RTL@.
>  Enabled by default at @option{-O} and higher.
>
> +@item -feliminate-malloc
> +@opindex feliminate-malloc
> +Eliminate unnecessary malloc/free pairs.
> +Enabled by default at @option{-O} and higher.
> +
>  @item -fif-conversion
>  @opindex fif-conversion
>  Attempt to transform conditional jumps into branch-less equivalents.  This
> Index: gcc/testsuite/gcc.dg/free-malloc.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
> +++ gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-eliminate-malloc" } */
> +/* { dg-final { scan-assembler-times "malloc" 2} } */
> +/* { dg-final { scan-assembler-times "free" 2} } */
> +
> +extern void * malloc (unsigned long);
> +extern void free (void *);
> +
> +void test ()
> +{
> +  free (malloc (10));
> +}
> Index: gcc/common.opt
> ===================================================================
> --- gcc/common.opt      (revision 187277)
> +++ gcc/common.opt      (working copy)
> @@ -1474,6 +1474,10 @@
>  Common Var(flag_dce) Init(1) Optimization
>  Use the RTL dead code elimination pass
>
> +feliminate-malloc
> +Common Var(flag_eliminate_malloc) Init(1) Optimization
> +Eliminate unnecessary malloc/free pairs
> +
>  fdse
>  Common Var(flag_dse) Init(1) Optimization
>  Use the RTL dead store elimination pass
> Index: gcc/tree-ssa-dce.c
> ===================================================================
> --- gcc/tree-ssa-dce.c  (revision 187277)
> +++ gcc/tree-ssa-dce.c  (working copy)
> @@ -309,6 +309,8 @@
>            case BUILT_IN_CALLOC:
>            case BUILT_IN_ALLOCA:
>            case BUILT_IN_ALLOCA_WITH_ALIGN:
> +             if (!flag_eliminate_malloc)
> +               mark_stmt_necessary (stmt, true);
>              return;
>
>            default:;
Richard Guenther - May 9, 2012, 8:12 a.m.
On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote:
> To be clear, this flag is for malloc implementation (such as tcmalloc)
> with side effect unknown to the compiler. Using -fno-builtin-xxx is
> too conservative for that purpose.

I don't think that flys.  Btw, the patch also guards alloca - alloca is purely
GCC internal.

What's the "unknown side-effects" that are also important to preserve
for free(malloc(4))?

Richard.

> David
>
> On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote:
>> Hello,
>>
>> This patch adds a flag to guard the optimization that optimize the
>> following code away:
>>
>> free (malloc (4));
>>
>> In some cases, we'd like this type of malloc/free pairs to remain in
>> the optimized code.
>>
>> Tested with bootstrap, and no regression in the gcc testsuite.
>>
>> Is it ok for mainline?
>>
>> Thanks,
>> Dehao
>>
>> gcc/ChangeLog
>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>
>>        * common.opt (feliminate-malloc): New.
>>        * doc/invoke.texi: Document it.
>>        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it.
>>
>> gcc/testsuite/ChangeLog
>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>
>>        * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working
>>        as expected.
>>
>> Index: gcc/doc/invoke.texi
>> ===================================================================
>> --- gcc/doc/invoke.texi (revision 187277)
>> +++ gcc/doc/invoke.texi (working copy)
>> @@ -360,7 +360,8 @@
>>  -fcx-limited-range @gol
>>  -fdata-sections -fdce -fdelayed-branch @gol
>>  -fdelete-null-pointer-checks -fdevirtualize -fdse @gol
>> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol
>> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol
>> +-ffat-lto-objects @gol
>>  -ffast-math -ffinite-math-only -ffloat-store
>> -fexcess-precision=@var{style} @gol
>>  -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol
>>  -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
>> @@ -6238,6 +6239,7 @@
>>  -fdefer-pop @gol
>>  -fdelayed-branch @gol
>>  -fdse @gol
>> +-feliminate-malloc @gol
>>  -fguess-branch-probability @gol
>>  -fif-conversion2 @gol
>>  -fif-conversion @gol
>> @@ -6762,6 +6764,11 @@
>>  Perform dead store elimination (DSE) on RTL@.
>>  Enabled by default at @option{-O} and higher.
>>
>> +@item -feliminate-malloc
>> +@opindex feliminate-malloc
>> +Eliminate unnecessary malloc/free pairs.
>> +Enabled by default at @option{-O} and higher.
>> +
>>  @item -fif-conversion
>>  @opindex fif-conversion
>>  Attempt to transform conditional jumps into branch-less equivalents.  This
>> Index: gcc/testsuite/gcc.dg/free-malloc.c
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>> +++ gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fno-eliminate-malloc" } */
>> +/* { dg-final { scan-assembler-times "malloc" 2} } */
>> +/* { dg-final { scan-assembler-times "free" 2} } */
>> +
>> +extern void * malloc (unsigned long);
>> +extern void free (void *);
>> +
>> +void test ()
>> +{
>> +  free (malloc (10));
>> +}
>> Index: gcc/common.opt
>> ===================================================================
>> --- gcc/common.opt      (revision 187277)
>> +++ gcc/common.opt      (working copy)
>> @@ -1474,6 +1474,10 @@
>>  Common Var(flag_dce) Init(1) Optimization
>>  Use the RTL dead code elimination pass
>>
>> +feliminate-malloc
>> +Common Var(flag_eliminate_malloc) Init(1) Optimization
>> +Eliminate unnecessary malloc/free pairs
>> +
>>  fdse
>>  Common Var(flag_dse) Init(1) Optimization
>>  Use the RTL dead store elimination pass
>> Index: gcc/tree-ssa-dce.c
>> ===================================================================
>> --- gcc/tree-ssa-dce.c  (revision 187277)
>> +++ gcc/tree-ssa-dce.c  (working copy)
>> @@ -309,6 +309,8 @@
>>            case BUILT_IN_CALLOC:
>>            case BUILT_IN_ALLOCA:
>>            case BUILT_IN_ALLOCA_WITH_ALIGN:
>> +             if (!flag_eliminate_malloc)
>> +               mark_stmt_necessary (stmt, true);
>>              return;
>>
>>            default:;
Dehao Chen - May 9, 2012, 8:38 a.m.
On Wed, May 9, 2012 at 4:12 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote:
>> To be clear, this flag is for malloc implementation (such as tcmalloc)
>> with side effect unknown to the compiler. Using -fno-builtin-xxx is
>> too conservative for that purpose.
>
> I don't think that flys.  Btw, the patch also guards alloca - alloca is purely
> GCC internal.
>
> What's the "unknown side-effects" that are also important to preserve
> for free(malloc(4))?

Malloc implementation may record some info to a global structure, and
a program may use this free(malloc()) pair to simulate the real runs
to get some data, such as peak memory requirement.

Dehao

>
> Richard.
>
>> David
>>
>> On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote:
>>> Hello,
>>>
>>> This patch adds a flag to guard the optimization that optimize the
>>> following code away:
>>>
>>> free (malloc (4));
>>>
>>> In some cases, we'd like this type of malloc/free pairs to remain in
>>> the optimized code.
>>>
>>> Tested with bootstrap, and no regression in the gcc testsuite.
>>>
>>> Is it ok for mainline?
>>>
>>> Thanks,
>>> Dehao
>>>
>>> gcc/ChangeLog
>>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>>
>>>        * common.opt (feliminate-malloc): New.
>>>        * doc/invoke.texi: Document it.
>>>        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>>
>>>        * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working
>>>        as expected.
>>>
>>> Index: gcc/doc/invoke.texi
>>> ===================================================================
>>> --- gcc/doc/invoke.texi (revision 187277)
>>> +++ gcc/doc/invoke.texi (working copy)
>>> @@ -360,7 +360,8 @@
>>>  -fcx-limited-range @gol
>>>  -fdata-sections -fdce -fdelayed-branch @gol
>>>  -fdelete-null-pointer-checks -fdevirtualize -fdse @gol
>>> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol
>>> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol
>>> +-ffat-lto-objects @gol
>>>  -ffast-math -ffinite-math-only -ffloat-store
>>> -fexcess-precision=@var{style} @gol
>>>  -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol
>>>  -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
>>> @@ -6238,6 +6239,7 @@
>>>  -fdefer-pop @gol
>>>  -fdelayed-branch @gol
>>>  -fdse @gol
>>> +-feliminate-malloc @gol
>>>  -fguess-branch-probability @gol
>>>  -fif-conversion2 @gol
>>>  -fif-conversion @gol
>>> @@ -6762,6 +6764,11 @@
>>>  Perform dead store elimination (DSE) on RTL@.
>>>  Enabled by default at @option{-O} and higher.
>>>
>>> +@item -feliminate-malloc
>>> +@opindex feliminate-malloc
>>> +Eliminate unnecessary malloc/free pairs.
>>> +Enabled by default at @option{-O} and higher.
>>> +
>>>  @item -fif-conversion
>>>  @opindex fif-conversion
>>>  Attempt to transform conditional jumps into branch-less equivalents.  This
>>> Index: gcc/testsuite/gcc.dg/free-malloc.c
>>> ===================================================================
>>> --- gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>>> +++ gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>>> @@ -0,0 +1,12 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fno-eliminate-malloc" } */
>>> +/* { dg-final { scan-assembler-times "malloc" 2} } */
>>> +/* { dg-final { scan-assembler-times "free" 2} } */
>>> +
>>> +extern void * malloc (unsigned long);
>>> +extern void free (void *);
>>> +
>>> +void test ()
>>> +{
>>> +  free (malloc (10));
>>> +}
>>> Index: gcc/common.opt
>>> ===================================================================
>>> --- gcc/common.opt      (revision 187277)
>>> +++ gcc/common.opt      (working copy)
>>> @@ -1474,6 +1474,10 @@
>>>  Common Var(flag_dce) Init(1) Optimization
>>>  Use the RTL dead code elimination pass
>>>
>>> +feliminate-malloc
>>> +Common Var(flag_eliminate_malloc) Init(1) Optimization
>>> +Eliminate unnecessary malloc/free pairs
>>> +
>>>  fdse
>>>  Common Var(flag_dse) Init(1) Optimization
>>>  Use the RTL dead store elimination pass
>>> Index: gcc/tree-ssa-dce.c
>>> ===================================================================
>>> --- gcc/tree-ssa-dce.c  (revision 187277)
>>> +++ gcc/tree-ssa-dce.c  (working copy)
>>> @@ -309,6 +309,8 @@
>>>            case BUILT_IN_CALLOC:
>>>            case BUILT_IN_ALLOCA:
>>>            case BUILT_IN_ALLOCA_WITH_ALIGN:
>>> +             if (!flag_eliminate_malloc)
>>> +               mark_stmt_necessary (stmt, true);
>>>              return;
>>>
>>>            default:;
Richard Guenther - May 9, 2012, 9:22 a.m.
On Wed, May 9, 2012 at 10:38 AM, Dehao Chen <dehao@google.com> wrote:
> On Wed, May 9, 2012 at 4:12 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> To be clear, this flag is for malloc implementation (such as tcmalloc)
>>> with side effect unknown to the compiler. Using -fno-builtin-xxx is
>>> too conservative for that purpose.
>>
>> I don't think that flys.  Btw, the patch also guards alloca - alloca is purely
>> GCC internal.
>>
>> What's the "unknown side-effects" that are also important to preserve
>> for free(malloc(4))?
>
> Malloc implementation may record some info to a global structure, and
> a program may use this free(malloc()) pair to simulate the real runs
> to get some data, such as peak memory requirement.

So why not use an alternate interface into this special allocator for this
purpose?

> Dehao
>
>>
>> Richard.
>>
>>> David
>>>
>>> On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote:
>>>> Hello,
>>>>
>>>> This patch adds a flag to guard the optimization that optimize the
>>>> following code away:
>>>>
>>>> free (malloc (4));
>>>>
>>>> In some cases, we'd like this type of malloc/free pairs to remain in
>>>> the optimized code.
>>>>
>>>> Tested with bootstrap, and no regression in the gcc testsuite.
>>>>
>>>> Is it ok for mainline?
>>>>
>>>> Thanks,
>>>> Dehao
>>>>
>>>> gcc/ChangeLog
>>>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>>>
>>>>        * common.opt (feliminate-malloc): New.
>>>>        * doc/invoke.texi: Document it.
>>>>        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it.
>>>>
>>>> gcc/testsuite/ChangeLog
>>>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>>>
>>>>        * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working
>>>>        as expected.
>>>>
>>>> Index: gcc/doc/invoke.texi
>>>> ===================================================================
>>>> --- gcc/doc/invoke.texi (revision 187277)
>>>> +++ gcc/doc/invoke.texi (working copy)
>>>> @@ -360,7 +360,8 @@
>>>>  -fcx-limited-range @gol
>>>>  -fdata-sections -fdce -fdelayed-branch @gol
>>>>  -fdelete-null-pointer-checks -fdevirtualize -fdse @gol
>>>> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol
>>>> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol
>>>> +-ffat-lto-objects @gol
>>>>  -ffast-math -ffinite-math-only -ffloat-store
>>>> -fexcess-precision=@var{style} @gol
>>>>  -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol
>>>>  -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
>>>> @@ -6238,6 +6239,7 @@
>>>>  -fdefer-pop @gol
>>>>  -fdelayed-branch @gol
>>>>  -fdse @gol
>>>> +-feliminate-malloc @gol
>>>>  -fguess-branch-probability @gol
>>>>  -fif-conversion2 @gol
>>>>  -fif-conversion @gol
>>>> @@ -6762,6 +6764,11 @@
>>>>  Perform dead store elimination (DSE) on RTL@.
>>>>  Enabled by default at @option{-O} and higher.
>>>>
>>>> +@item -feliminate-malloc
>>>> +@opindex feliminate-malloc
>>>> +Eliminate unnecessary malloc/free pairs.
>>>> +Enabled by default at @option{-O} and higher.
>>>> +
>>>>  @item -fif-conversion
>>>>  @opindex fif-conversion
>>>>  Attempt to transform conditional jumps into branch-less equivalents.  This
>>>> Index: gcc/testsuite/gcc.dg/free-malloc.c
>>>> ===================================================================
>>>> --- gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>>>> +++ gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>>>> @@ -0,0 +1,12 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fno-eliminate-malloc" } */
>>>> +/* { dg-final { scan-assembler-times "malloc" 2} } */
>>>> +/* { dg-final { scan-assembler-times "free" 2} } */
>>>> +
>>>> +extern void * malloc (unsigned long);
>>>> +extern void free (void *);
>>>> +
>>>> +void test ()
>>>> +{
>>>> +  free (malloc (10));
>>>> +}
>>>> Index: gcc/common.opt
>>>> ===================================================================
>>>> --- gcc/common.opt      (revision 187277)
>>>> +++ gcc/common.opt      (working copy)
>>>> @@ -1474,6 +1474,10 @@
>>>>  Common Var(flag_dce) Init(1) Optimization
>>>>  Use the RTL dead code elimination pass
>>>>
>>>> +feliminate-malloc
>>>> +Common Var(flag_eliminate_malloc) Init(1) Optimization
>>>> +Eliminate unnecessary malloc/free pairs
>>>> +
>>>>  fdse
>>>>  Common Var(flag_dse) Init(1) Optimization
>>>>  Use the RTL dead store elimination pass
>>>> Index: gcc/tree-ssa-dce.c
>>>> ===================================================================
>>>> --- gcc/tree-ssa-dce.c  (revision 187277)
>>>> +++ gcc/tree-ssa-dce.c  (working copy)
>>>> @@ -309,6 +309,8 @@
>>>>            case BUILT_IN_CALLOC:
>>>>            case BUILT_IN_ALLOCA:
>>>>            case BUILT_IN_ALLOCA_WITH_ALIGN:
>>>> +             if (!flag_eliminate_malloc)
>>>> +               mark_stmt_necessary (stmt, true);
>>>>              return;
>>>>
>>>>            default:;
Dehao Chen - May 9, 2012, 9:48 a.m.
On Wed, May 9, 2012 at 5:22 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, May 9, 2012 at 10:38 AM, Dehao Chen <dehao@google.com> wrote:
>> On Wed, May 9, 2012 at 4:12 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> To be clear, this flag is for malloc implementation (such as tcmalloc)
>>>> with side effect unknown to the compiler. Using -fno-builtin-xxx is
>>>> too conservative for that purpose.
>>>
>>> I don't think that flys.  Btw, the patch also guards alloca - alloca is purely
>>> GCC internal.
>>>
>>> What's the "unknown side-effects" that are also important to preserve
>>> for free(malloc(4))?
>>
>> Malloc implementation may record some info to a global structure, and
>> a program may use this free(malloc()) pair to simulate the real runs
>> to get some data, such as peak memory requirement.
>
> So why not use an alternate interface into this special allocator for this
> purpose?

There can be the following scenario:

We want to add a module to an existing app. Before implementing the
module, we want to collect some statistics on real runs. In this
scenario, we need:

* No change to the legacy code
* Optimized build for the simulation run
* Provide accurate statistical info

We want to collect data for both new module and the legacy code
without changing the later, thus we cannot use a new malloc/free
interface.

Thanks,
Dehao

>
>> Dehao
>>
>>>
>>> Richard.
>>>
>>>> David
>>>>
>>>> On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote:
>>>>> Hello,
>>>>>
>>>>> This patch adds a flag to guard the optimization that optimize the
>>>>> following code away:
>>>>>
>>>>> free (malloc (4));
>>>>>
>>>>> In some cases, we'd like this type of malloc/free pairs to remain in
>>>>> the optimized code.
>>>>>
>>>>> Tested with bootstrap, and no regression in the gcc testsuite.
>>>>>
>>>>> Is it ok for mainline?
>>>>>
>>>>> Thanks,
>>>>> Dehao
>>>>>
>>>>> gcc/ChangeLog
>>>>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>>>>
>>>>>        * common.opt (feliminate-malloc): New.
>>>>>        * doc/invoke.texi: Document it.
>>>>>        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it.
>>>>>
>>>>> gcc/testsuite/ChangeLog
>>>>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>>>>
>>>>>        * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working
>>>>>        as expected.
>>>>>
>>>>> Index: gcc/doc/invoke.texi
>>>>> ===================================================================
>>>>> --- gcc/doc/invoke.texi (revision 187277)
>>>>> +++ gcc/doc/invoke.texi (working copy)
>>>>> @@ -360,7 +360,8 @@
>>>>>  -fcx-limited-range @gol
>>>>>  -fdata-sections -fdce -fdelayed-branch @gol
>>>>>  -fdelete-null-pointer-checks -fdevirtualize -fdse @gol
>>>>> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol
>>>>> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol
>>>>> +-ffat-lto-objects @gol
>>>>>  -ffast-math -ffinite-math-only -ffloat-store
>>>>> -fexcess-precision=@var{style} @gol
>>>>>  -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol
>>>>>  -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
>>>>> @@ -6238,6 +6239,7 @@
>>>>>  -fdefer-pop @gol
>>>>>  -fdelayed-branch @gol
>>>>>  -fdse @gol
>>>>> +-feliminate-malloc @gol
>>>>>  -fguess-branch-probability @gol
>>>>>  -fif-conversion2 @gol
>>>>>  -fif-conversion @gol
>>>>> @@ -6762,6 +6764,11 @@
>>>>>  Perform dead store elimination (DSE) on RTL@.
>>>>>  Enabled by default at @option{-O} and higher.
>>>>>
>>>>> +@item -feliminate-malloc
>>>>> +@opindex feliminate-malloc
>>>>> +Eliminate unnecessary malloc/free pairs.
>>>>> +Enabled by default at @option{-O} and higher.
>>>>> +
>>>>>  @item -fif-conversion
>>>>>  @opindex fif-conversion
>>>>>  Attempt to transform conditional jumps into branch-less equivalents.  This
>>>>> Index: gcc/testsuite/gcc.dg/free-malloc.c
>>>>> ===================================================================
>>>>> --- gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>>>>> +++ gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>>>>> @@ -0,0 +1,12 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fno-eliminate-malloc" } */
>>>>> +/* { dg-final { scan-assembler-times "malloc" 2} } */
>>>>> +/* { dg-final { scan-assembler-times "free" 2} } */
>>>>> +
>>>>> +extern void * malloc (unsigned long);
>>>>> +extern void free (void *);
>>>>> +
>>>>> +void test ()
>>>>> +{
>>>>> +  free (malloc (10));
>>>>> +}
>>>>> Index: gcc/common.opt
>>>>> ===================================================================
>>>>> --- gcc/common.opt      (revision 187277)
>>>>> +++ gcc/common.opt      (working copy)
>>>>> @@ -1474,6 +1474,10 @@
>>>>>  Common Var(flag_dce) Init(1) Optimization
>>>>>  Use the RTL dead code elimination pass
>>>>>
>>>>> +feliminate-malloc
>>>>> +Common Var(flag_eliminate_malloc) Init(1) Optimization
>>>>> +Eliminate unnecessary malloc/free pairs
>>>>> +
>>>>>  fdse
>>>>>  Common Var(flag_dse) Init(1) Optimization
>>>>>  Use the RTL dead store elimination pass
>>>>> Index: gcc/tree-ssa-dce.c
>>>>> ===================================================================
>>>>> --- gcc/tree-ssa-dce.c  (revision 187277)
>>>>> +++ gcc/tree-ssa-dce.c  (working copy)
>>>>> @@ -309,6 +309,8 @@
>>>>>            case BUILT_IN_CALLOC:
>>>>>            case BUILT_IN_ALLOCA:
>>>>>            case BUILT_IN_ALLOCA_WITH_ALIGN:
>>>>> +             if (!flag_eliminate_malloc)
>>>>> +               mark_stmt_necessary (stmt, true);
>>>>>              return;
>>>>>
>>>>>            default:;
Richard Guenther - May 9, 2012, 9:53 a.m.
On Wed, May 9, 2012 at 11:48 AM, Dehao Chen <dehao@google.com> wrote:
> On Wed, May 9, 2012 at 5:22 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Wed, May 9, 2012 at 10:38 AM, Dehao Chen <dehao@google.com> wrote:
>>> On Wed, May 9, 2012 at 4:12 PM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> To be clear, this flag is for malloc implementation (such as tcmalloc)
>>>>> with side effect unknown to the compiler. Using -fno-builtin-xxx is
>>>>> too conservative for that purpose.
>>>>
>>>> I don't think that flys.  Btw, the patch also guards alloca - alloca is purely
>>>> GCC internal.
>>>>
>>>> What's the "unknown side-effects" that are also important to preserve
>>>> for free(malloc(4))?
>>>
>>> Malloc implementation may record some info to a global structure, and
>>> a program may use this free(malloc()) pair to simulate the real runs
>>> to get some data, such as peak memory requirement.
>>
>> So why not use an alternate interface into this special allocator for this
>> purpose?
>
> There can be the following scenario:
>
> We want to add a module to an existing app. Before implementing the
> module, we want to collect some statistics on real runs. In this
> scenario, we need:
>
> * No change to the legacy code
> * Optimized build for the simulation run
> * Provide accurate statistical info
>
> We want to collect data for both new module and the legacy code
> without changing the later, thus we cannot use a new malloc/free
> interface.

So put in a optimization barrier then.  Like

free (({ void * x = malloc (4); __asm ("" : "+m" (x)); __x; });

Btw, why can't you simply build the new module (which doesn't something
real anyway, just fake stuff) without optimization?

> Thanks,
> Dehao
>
>>
>>> Dehao
>>>
>>>>
>>>> Richard.
>>>>
>>>>> David
>>>>>
>>>>> On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> This patch adds a flag to guard the optimization that optimize the
>>>>>> following code away:
>>>>>>
>>>>>> free (malloc (4));
>>>>>>
>>>>>> In some cases, we'd like this type of malloc/free pairs to remain in
>>>>>> the optimized code.
>>>>>>
>>>>>> Tested with bootstrap, and no regression in the gcc testsuite.
>>>>>>
>>>>>> Is it ok for mainline?
>>>>>>
>>>>>> Thanks,
>>>>>> Dehao
>>>>>>
>>>>>> gcc/ChangeLog
>>>>>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>>>>>
>>>>>>        * common.opt (feliminate-malloc): New.
>>>>>>        * doc/invoke.texi: Document it.
>>>>>>        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog
>>>>>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>>>>>
>>>>>>        * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working
>>>>>>        as expected.
>>>>>>
>>>>>> Index: gcc/doc/invoke.texi
>>>>>> ===================================================================
>>>>>> --- gcc/doc/invoke.texi (revision 187277)
>>>>>> +++ gcc/doc/invoke.texi (working copy)
>>>>>> @@ -360,7 +360,8 @@
>>>>>>  -fcx-limited-range @gol
>>>>>>  -fdata-sections -fdce -fdelayed-branch @gol
>>>>>>  -fdelete-null-pointer-checks -fdevirtualize -fdse @gol
>>>>>> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol
>>>>>> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol
>>>>>> +-ffat-lto-objects @gol
>>>>>>  -ffast-math -ffinite-math-only -ffloat-store
>>>>>> -fexcess-precision=@var{style} @gol
>>>>>>  -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol
>>>>>>  -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
>>>>>> @@ -6238,6 +6239,7 @@
>>>>>>  -fdefer-pop @gol
>>>>>>  -fdelayed-branch @gol
>>>>>>  -fdse @gol
>>>>>> +-feliminate-malloc @gol
>>>>>>  -fguess-branch-probability @gol
>>>>>>  -fif-conversion2 @gol
>>>>>>  -fif-conversion @gol
>>>>>> @@ -6762,6 +6764,11 @@
>>>>>>  Perform dead store elimination (DSE) on RTL@.
>>>>>>  Enabled by default at @option{-O} and higher.
>>>>>>
>>>>>> +@item -feliminate-malloc
>>>>>> +@opindex feliminate-malloc
>>>>>> +Eliminate unnecessary malloc/free pairs.
>>>>>> +Enabled by default at @option{-O} and higher.
>>>>>> +
>>>>>>  @item -fif-conversion
>>>>>>  @opindex fif-conversion
>>>>>>  Attempt to transform conditional jumps into branch-less equivalents.  This
>>>>>> Index: gcc/testsuite/gcc.dg/free-malloc.c
>>>>>> ===================================================================
>>>>>> --- gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>>>>>> +++ gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>>>>>> @@ -0,0 +1,12 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fno-eliminate-malloc" } */
>>>>>> +/* { dg-final { scan-assembler-times "malloc" 2} } */
>>>>>> +/* { dg-final { scan-assembler-times "free" 2} } */
>>>>>> +
>>>>>> +extern void * malloc (unsigned long);
>>>>>> +extern void free (void *);
>>>>>> +
>>>>>> +void test ()
>>>>>> +{
>>>>>> +  free (malloc (10));
>>>>>> +}
>>>>>> Index: gcc/common.opt
>>>>>> ===================================================================
>>>>>> --- gcc/common.opt      (revision 187277)
>>>>>> +++ gcc/common.opt      (working copy)
>>>>>> @@ -1474,6 +1474,10 @@
>>>>>>  Common Var(flag_dce) Init(1) Optimization
>>>>>>  Use the RTL dead code elimination pass
>>>>>>
>>>>>> +feliminate-malloc
>>>>>> +Common Var(flag_eliminate_malloc) Init(1) Optimization
>>>>>> +Eliminate unnecessary malloc/free pairs
>>>>>> +
>>>>>>  fdse
>>>>>>  Common Var(flag_dse) Init(1) Optimization
>>>>>>  Use the RTL dead store elimination pass
>>>>>> Index: gcc/tree-ssa-dce.c
>>>>>> ===================================================================
>>>>>> --- gcc/tree-ssa-dce.c  (revision 187277)
>>>>>> +++ gcc/tree-ssa-dce.c  (working copy)
>>>>>> @@ -309,6 +309,8 @@
>>>>>>            case BUILT_IN_CALLOC:
>>>>>>            case BUILT_IN_ALLOCA:
>>>>>>            case BUILT_IN_ALLOCA_WITH_ALIGN:
>>>>>> +             if (!flag_eliminate_malloc)
>>>>>> +               mark_stmt_necessary (stmt, true);
>>>>>>              return;
>>>>>>
>>>>>>            default:;
Jakub Jelinek - May 9, 2012, 9:55 a.m.
On Wed, May 09, 2012 at 05:48:25PM +0800, Dehao Chen wrote:
> > So why not use an alternate interface into this special allocator for this
> > purpose?
> 
> There can be the following scenario:
> 
> We want to add a module to an existing app. Before implementing the
> module, we want to collect some statistics on real runs. In this
> scenario, we need:
> 
> * No change to the legacy code
> * Optimized build for the simulation run
> * Provide accurate statistical info
> 
> We want to collect data for both new module and the legacy code
> without changing the later, thus we cannot use a new malloc/free
> interface.

Note that even in the glibc testsuite there had to be workarounds for this
(asm barrier was used):
http://sources.redhat.com/ml/libc-alpha/2012-05/msg00138.html

	Jakub
Xinliang David Li - May 9, 2012, 4:32 p.m.
On Wed, May 9, 2012 at 1:12 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote:
>> To be clear, this flag is for malloc implementation (such as tcmalloc)
>> with side effect unknown to the compiler. Using -fno-builtin-xxx is
>> too conservative for that purpose.
>
> I don't think that flys.  Btw, the patch also guards alloca - alloca is purely
> GCC internal.
>
> What's the "unknown side-effects" that are also important to preserve
> for free(malloc(4))?
>

The side effect is the user registered malloc hooks.

The pattern 'free(malloc(4)', or loops like

for (..)
 {
    malloc(..); // result not used
 }

itself is not interesting. They only appear in the test code and the
problem can be worked around by using -fno-builtin-malloc. The option
is to avoid disable this and all similar malloc optimizations (in the
future) for malloc implementation with hooks.

Thanks,

David

> Richard.
>
>> David
>>
>> On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote:
>>> Hello,
>>>
>>> This patch adds a flag to guard the optimization that optimize the
>>> following code away:
>>>
>>> free (malloc (4));
>>>
>>> In some cases, we'd like this type of malloc/free pairs to remain in
>>> the optimized code.
>>>
>>> Tested with bootstrap, and no regression in the gcc testsuite.
>>>
>>> Is it ok for mainline?
>>>
>>> Thanks,
>>> Dehao
>>>
>>> gcc/ChangeLog
>>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>>
>>>        * common.opt (feliminate-malloc): New.
>>>        * doc/invoke.texi: Document it.
>>>        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>>
>>>        * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working
>>>        as expected.
>>>
>>> Index: gcc/doc/invoke.texi
>>> ===================================================================
>>> --- gcc/doc/invoke.texi (revision 187277)
>>> +++ gcc/doc/invoke.texi (working copy)
>>> @@ -360,7 +360,8 @@
>>>  -fcx-limited-range @gol
>>>  -fdata-sections -fdce -fdelayed-branch @gol
>>>  -fdelete-null-pointer-checks -fdevirtualize -fdse @gol
>>> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol
>>> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol
>>> +-ffat-lto-objects @gol
>>>  -ffast-math -ffinite-math-only -ffloat-store
>>> -fexcess-precision=@var{style} @gol
>>>  -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol
>>>  -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
>>> @@ -6238,6 +6239,7 @@
>>>  -fdefer-pop @gol
>>>  -fdelayed-branch @gol
>>>  -fdse @gol
>>> +-feliminate-malloc @gol
>>>  -fguess-branch-probability @gol
>>>  -fif-conversion2 @gol
>>>  -fif-conversion @gol
>>> @@ -6762,6 +6764,11 @@
>>>  Perform dead store elimination (DSE) on RTL@.
>>>  Enabled by default at @option{-O} and higher.
>>>
>>> +@item -feliminate-malloc
>>> +@opindex feliminate-malloc
>>> +Eliminate unnecessary malloc/free pairs.
>>> +Enabled by default at @option{-O} and higher.
>>> +
>>>  @item -fif-conversion
>>>  @opindex fif-conversion
>>>  Attempt to transform conditional jumps into branch-less equivalents.  This
>>> Index: gcc/testsuite/gcc.dg/free-malloc.c
>>> ===================================================================
>>> --- gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>>> +++ gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>>> @@ -0,0 +1,12 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fno-eliminate-malloc" } */
>>> +/* { dg-final { scan-assembler-times "malloc" 2} } */
>>> +/* { dg-final { scan-assembler-times "free" 2} } */
>>> +
>>> +extern void * malloc (unsigned long);
>>> +extern void free (void *);
>>> +
>>> +void test ()
>>> +{
>>> +  free (malloc (10));
>>> +}
>>> Index: gcc/common.opt
>>> ===================================================================
>>> --- gcc/common.opt      (revision 187277)
>>> +++ gcc/common.opt      (working copy)
>>> @@ -1474,6 +1474,10 @@
>>>  Common Var(flag_dce) Init(1) Optimization
>>>  Use the RTL dead code elimination pass
>>>
>>> +feliminate-malloc
>>> +Common Var(flag_eliminate_malloc) Init(1) Optimization
>>> +Eliminate unnecessary malloc/free pairs
>>> +
>>>  fdse
>>>  Common Var(flag_dse) Init(1) Optimization
>>>  Use the RTL dead store elimination pass
>>> Index: gcc/tree-ssa-dce.c
>>> ===================================================================
>>> --- gcc/tree-ssa-dce.c  (revision 187277)
>>> +++ gcc/tree-ssa-dce.c  (working copy)
>>> @@ -309,6 +309,8 @@
>>>            case BUILT_IN_CALLOC:
>>>            case BUILT_IN_ALLOCA:
>>>            case BUILT_IN_ALLOCA_WITH_ALIGN:
>>> +             if (!flag_eliminate_malloc)
>>> +               mark_stmt_necessary (stmt, true);
>>>              return;
>>>
>>>            default:;
Andrew Pinski - May 9, 2012, 4:35 p.m.
On Wed, May 9, 2012 at 9:32 AM, Xinliang David Li <davidxl@google.com> wrote:
> On Wed, May 9, 2012 at 1:12 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> To be clear, this flag is for malloc implementation (such as tcmalloc)
>>> with side effect unknown to the compiler. Using -fno-builtin-xxx is
>>> too conservative for that purpose.
>>
>> I don't think that flys.  Btw, the patch also guards alloca - alloca is purely
>> GCC internal.
>>
>> What's the "unknown side-effects" that are also important to preserve
>> for free(malloc(4))?
>>
>
> The side effect is the user registered malloc hooks.


IIRC future versions of glibc will have those malloc hooks removed.
Because of this problem.

Thanks,
Andrew Pinski


>
> The pattern 'free(malloc(4)', or loops like
>
> for (..)
>  {
>    malloc(..); // result not used
>  }
>
> itself is not interesting. They only appear in the test code and the
> problem can be worked around by using -fno-builtin-malloc. The option
> is to avoid disable this and all similar malloc optimizations (in the
> future) for malloc implementation with hooks.
>
> Thanks,
>
> David
>
>> Richard.
>>
>>> David
>>>
>>> On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote:
>>>> Hello,
>>>>
>>>> This patch adds a flag to guard the optimization that optimize the
>>>> following code away:
>>>>
>>>> free (malloc (4));
>>>>
>>>> In some cases, we'd like this type of malloc/free pairs to remain in
>>>> the optimized code.
>>>>
>>>> Tested with bootstrap, and no regression in the gcc testsuite.
>>>>
>>>> Is it ok for mainline?
>>>>
>>>> Thanks,
>>>> Dehao
>>>>
>>>> gcc/ChangeLog
>>>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>>>
>>>>        * common.opt (feliminate-malloc): New.
>>>>        * doc/invoke.texi: Document it.
>>>>        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it.
>>>>
>>>> gcc/testsuite/ChangeLog
>>>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>>>
>>>>        * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working
>>>>        as expected.
>>>>
>>>> Index: gcc/doc/invoke.texi
>>>> ===================================================================
>>>> --- gcc/doc/invoke.texi (revision 187277)
>>>> +++ gcc/doc/invoke.texi (working copy)
>>>> @@ -360,7 +360,8 @@
>>>>  -fcx-limited-range @gol
>>>>  -fdata-sections -fdce -fdelayed-branch @gol
>>>>  -fdelete-null-pointer-checks -fdevirtualize -fdse @gol
>>>> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol
>>>> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol
>>>> +-ffat-lto-objects @gol
>>>>  -ffast-math -ffinite-math-only -ffloat-store
>>>> -fexcess-precision=@var{style} @gol
>>>>  -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol
>>>>  -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
>>>> @@ -6238,6 +6239,7 @@
>>>>  -fdefer-pop @gol
>>>>  -fdelayed-branch @gol
>>>>  -fdse @gol
>>>> +-feliminate-malloc @gol
>>>>  -fguess-branch-probability @gol
>>>>  -fif-conversion2 @gol
>>>>  -fif-conversion @gol
>>>> @@ -6762,6 +6764,11 @@
>>>>  Perform dead store elimination (DSE) on RTL@.
>>>>  Enabled by default at @option{-O} and higher.
>>>>
>>>> +@item -feliminate-malloc
>>>> +@opindex feliminate-malloc
>>>> +Eliminate unnecessary malloc/free pairs.
>>>> +Enabled by default at @option{-O} and higher.
>>>> +
>>>>  @item -fif-conversion
>>>>  @opindex fif-conversion
>>>>  Attempt to transform conditional jumps into branch-less equivalents.  This
>>>> Index: gcc/testsuite/gcc.dg/free-malloc.c
>>>> ===================================================================
>>>> --- gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>>>> +++ gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>>>> @@ -0,0 +1,12 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fno-eliminate-malloc" } */
>>>> +/* { dg-final { scan-assembler-times "malloc" 2} } */
>>>> +/* { dg-final { scan-assembler-times "free" 2} } */
>>>> +
>>>> +extern void * malloc (unsigned long);
>>>> +extern void free (void *);
>>>> +
>>>> +void test ()
>>>> +{
>>>> +  free (malloc (10));
>>>> +}
>>>> Index: gcc/common.opt
>>>> ===================================================================
>>>> --- gcc/common.opt      (revision 187277)
>>>> +++ gcc/common.opt      (working copy)
>>>> @@ -1474,6 +1474,10 @@
>>>>  Common Var(flag_dce) Init(1) Optimization
>>>>  Use the RTL dead code elimination pass
>>>>
>>>> +feliminate-malloc
>>>> +Common Var(flag_eliminate_malloc) Init(1) Optimization
>>>> +Eliminate unnecessary malloc/free pairs
>>>> +
>>>>  fdse
>>>>  Common Var(flag_dse) Init(1) Optimization
>>>>  Use the RTL dead store elimination pass
>>>> Index: gcc/tree-ssa-dce.c
>>>> ===================================================================
>>>> --- gcc/tree-ssa-dce.c  (revision 187277)
>>>> +++ gcc/tree-ssa-dce.c  (working copy)
>>>> @@ -309,6 +309,8 @@
>>>>            case BUILT_IN_CALLOC:
>>>>            case BUILT_IN_ALLOCA:
>>>>            case BUILT_IN_ALLOCA_WITH_ALIGN:
>>>> +             if (!flag_eliminate_malloc)
>>>> +               mark_stmt_necessary (stmt, true);
>>>>              return;
>>>>
>>>>            default:;
Richard Guenther - May 9, 2012, 6:21 p.m.
On Wed, May 9, 2012 at 6:32 PM, Xinliang David Li <davidxl@google.com> wrote:
> On Wed, May 9, 2012 at 1:12 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> To be clear, this flag is for malloc implementation (such as tcmalloc)
>>> with side effect unknown to the compiler. Using -fno-builtin-xxx is
>>> too conservative for that purpose.
>>
>> I don't think that flys.  Btw, the patch also guards alloca - alloca is purely
>> GCC internal.
>>
>> What's the "unknown side-effects" that are also important to preserve
>> for free(malloc(4))?
>>
>
> The side effect is the user registered malloc hooks.
>
> The pattern 'free(malloc(4)', or loops like
>
> for (..)
>  {
>    malloc(..); // result not used
>  }
>
> itself is not interesting. They only appear in the test code and the
> problem can be worked around by using -fno-builtin-malloc. The option
> is to avoid disable this and all similar malloc optimizations (in the
> future) for malloc implementation with hooks.

What is then interesting?  The above are all the same as if optimization
figured out the storage is not used, no?

Richard.

> Thanks,
>
> David
>
>> Richard.
>>
>>> David
>>>
>>> On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote:
>>>> Hello,
>>>>
>>>> This patch adds a flag to guard the optimization that optimize the
>>>> following code away:
>>>>
>>>> free (malloc (4));
>>>>
>>>> In some cases, we'd like this type of malloc/free pairs to remain in
>>>> the optimized code.
>>>>
>>>> Tested with bootstrap, and no regression in the gcc testsuite.
>>>>
>>>> Is it ok for mainline?
>>>>
>>>> Thanks,
>>>> Dehao
>>>>
>>>> gcc/ChangeLog
>>>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>>>
>>>>        * common.opt (feliminate-malloc): New.
>>>>        * doc/invoke.texi: Document it.
>>>>        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it.
>>>>
>>>> gcc/testsuite/ChangeLog
>>>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>>>
>>>>        * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working
>>>>        as expected.
>>>>
>>>> Index: gcc/doc/invoke.texi
>>>> ===================================================================
>>>> --- gcc/doc/invoke.texi (revision 187277)
>>>> +++ gcc/doc/invoke.texi (working copy)
>>>> @@ -360,7 +360,8 @@
>>>>  -fcx-limited-range @gol
>>>>  -fdata-sections -fdce -fdelayed-branch @gol
>>>>  -fdelete-null-pointer-checks -fdevirtualize -fdse @gol
>>>> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol
>>>> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol
>>>> +-ffat-lto-objects @gol
>>>>  -ffast-math -ffinite-math-only -ffloat-store
>>>> -fexcess-precision=@var{style} @gol
>>>>  -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol
>>>>  -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
>>>> @@ -6238,6 +6239,7 @@
>>>>  -fdefer-pop @gol
>>>>  -fdelayed-branch @gol
>>>>  -fdse @gol
>>>> +-feliminate-malloc @gol
>>>>  -fguess-branch-probability @gol
>>>>  -fif-conversion2 @gol
>>>>  -fif-conversion @gol
>>>> @@ -6762,6 +6764,11 @@
>>>>  Perform dead store elimination (DSE) on RTL@.
>>>>  Enabled by default at @option{-O} and higher.
>>>>
>>>> +@item -feliminate-malloc
>>>> +@opindex feliminate-malloc
>>>> +Eliminate unnecessary malloc/free pairs.
>>>> +Enabled by default at @option{-O} and higher.
>>>> +
>>>>  @item -fif-conversion
>>>>  @opindex fif-conversion
>>>>  Attempt to transform conditional jumps into branch-less equivalents.  This
>>>> Index: gcc/testsuite/gcc.dg/free-malloc.c
>>>> ===================================================================
>>>> --- gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>>>> +++ gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>>>> @@ -0,0 +1,12 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fno-eliminate-malloc" } */
>>>> +/* { dg-final { scan-assembler-times "malloc" 2} } */
>>>> +/* { dg-final { scan-assembler-times "free" 2} } */
>>>> +
>>>> +extern void * malloc (unsigned long);
>>>> +extern void free (void *);
>>>> +
>>>> +void test ()
>>>> +{
>>>> +  free (malloc (10));
>>>> +}
>>>> Index: gcc/common.opt
>>>> ===================================================================
>>>> --- gcc/common.opt      (revision 187277)
>>>> +++ gcc/common.opt      (working copy)
>>>> @@ -1474,6 +1474,10 @@
>>>>  Common Var(flag_dce) Init(1) Optimization
>>>>  Use the RTL dead code elimination pass
>>>>
>>>> +feliminate-malloc
>>>> +Common Var(flag_eliminate_malloc) Init(1) Optimization
>>>> +Eliminate unnecessary malloc/free pairs
>>>> +
>>>>  fdse
>>>>  Common Var(flag_dse) Init(1) Optimization
>>>>  Use the RTL dead store elimination pass
>>>> Index: gcc/tree-ssa-dce.c
>>>> ===================================================================
>>>> --- gcc/tree-ssa-dce.c  (revision 187277)
>>>> +++ gcc/tree-ssa-dce.c  (working copy)
>>>> @@ -309,6 +309,8 @@
>>>>            case BUILT_IN_CALLOC:
>>>>            case BUILT_IN_ALLOCA:
>>>>            case BUILT_IN_ALLOCA_WITH_ALIGN:
>>>> +             if (!flag_eliminate_malloc)
>>>> +               mark_stmt_necessary (stmt, true);
>>>>              return;
>>>>
>>>>            default:;
Xinliang David Li - May 9, 2012, 10:46 p.m.
On Wed, May 9, 2012 at 11:21 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, May 9, 2012 at 6:32 PM, Xinliang David Li <davidxl@google.com> wrote:
>> On Wed, May 9, 2012 at 1:12 AM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> To be clear, this flag is for malloc implementation (such as tcmalloc)
>>>> with side effect unknown to the compiler. Using -fno-builtin-xxx is
>>>> too conservative for that purpose.
>>>
>>> I don't think that flys.  Btw, the patch also guards alloca - alloca is purely
>>> GCC internal.
>>>
>>> What's the "unknown side-effects" that are also important to preserve
>>> for free(malloc(4))?
>>>
>>
>> The side effect is the user registered malloc hooks.
>>
>> The pattern 'free(malloc(4)', or loops like
>>
>> for (..)
>>  {
>>    malloc(..); // result not used
>>  }
>>
>> itself is not interesting. They only appear in the test code and the
>> problem can be worked around by using -fno-builtin-malloc. The option
>> is to avoid disable this and all similar malloc optimizations (in the
>> future) for malloc implementation with hooks.
>
> What is then interesting?  The above are all the same as if optimization
> figured out the storage is not used, no?

Compiler may choose to convert malloc call into alloca or coalesce
multiple malloc calls -- but this may not always be the best choice
given that more advanced locality based heap allocation is possible.
Nothing like this exist yet, but the point is that it is convenient to
have a way to keep malloc call while keeping its aliasing property.

David

>
> Richard.
>
>> Thanks,
>>
>> David
>>
>>> Richard.
>>>
>>>> David
>>>>
>>>> On Tue, May 8, 2012 at 7:43 AM, Dehao Chen <dehao@google.com> wrote:
>>>>> Hello,
>>>>>
>>>>> This patch adds a flag to guard the optimization that optimize the
>>>>> following code away:
>>>>>
>>>>> free (malloc (4));
>>>>>
>>>>> In some cases, we'd like this type of malloc/free pairs to remain in
>>>>> the optimized code.
>>>>>
>>>>> Tested with bootstrap, and no regression in the gcc testsuite.
>>>>>
>>>>> Is it ok for mainline?
>>>>>
>>>>> Thanks,
>>>>> Dehao
>>>>>
>>>>> gcc/ChangeLog
>>>>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>>>>
>>>>>        * common.opt (feliminate-malloc): New.
>>>>>        * doc/invoke.texi: Document it.
>>>>>        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it.
>>>>>
>>>>> gcc/testsuite/ChangeLog
>>>>> 2012-05-08  Dehao Chen  <dehao@google.com>
>>>>>
>>>>>        * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working
>>>>>        as expected.
>>>>>
>>>>> Index: gcc/doc/invoke.texi
>>>>> ===================================================================
>>>>> --- gcc/doc/invoke.texi (revision 187277)
>>>>> +++ gcc/doc/invoke.texi (working copy)
>>>>> @@ -360,7 +360,8 @@
>>>>>  -fcx-limited-range @gol
>>>>>  -fdata-sections -fdce -fdelayed-branch @gol
>>>>>  -fdelete-null-pointer-checks -fdevirtualize -fdse @gol
>>>>> --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol
>>>>> +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol
>>>>> +-ffat-lto-objects @gol
>>>>>  -ffast-math -ffinite-math-only -ffloat-store
>>>>> -fexcess-precision=@var{style} @gol
>>>>>  -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol
>>>>>  -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
>>>>> @@ -6238,6 +6239,7 @@
>>>>>  -fdefer-pop @gol
>>>>>  -fdelayed-branch @gol
>>>>>  -fdse @gol
>>>>> +-feliminate-malloc @gol
>>>>>  -fguess-branch-probability @gol
>>>>>  -fif-conversion2 @gol
>>>>>  -fif-conversion @gol
>>>>> @@ -6762,6 +6764,11 @@
>>>>>  Perform dead store elimination (DSE) on RTL@.
>>>>>  Enabled by default at @option{-O} and higher.
>>>>>
>>>>> +@item -feliminate-malloc
>>>>> +@opindex feliminate-malloc
>>>>> +Eliminate unnecessary malloc/free pairs.
>>>>> +Enabled by default at @option{-O} and higher.
>>>>> +
>>>>>  @item -fif-conversion
>>>>>  @opindex fif-conversion
>>>>>  Attempt to transform conditional jumps into branch-less equivalents.  This
>>>>> Index: gcc/testsuite/gcc.dg/free-malloc.c
>>>>> ===================================================================
>>>>> --- gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>>>>> +++ gcc/testsuite/gcc.dg/free-malloc.c  (revision 0)
>>>>> @@ -0,0 +1,12 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fno-eliminate-malloc" } */
>>>>> +/* { dg-final { scan-assembler-times "malloc" 2} } */
>>>>> +/* { dg-final { scan-assembler-times "free" 2} } */
>>>>> +
>>>>> +extern void * malloc (unsigned long);
>>>>> +extern void free (void *);
>>>>> +
>>>>> +void test ()
>>>>> +{
>>>>> +  free (malloc (10));
>>>>> +}
>>>>> Index: gcc/common.opt
>>>>> ===================================================================
>>>>> --- gcc/common.opt      (revision 187277)
>>>>> +++ gcc/common.opt      (working copy)
>>>>> @@ -1474,6 +1474,10 @@
>>>>>  Common Var(flag_dce) Init(1) Optimization
>>>>>  Use the RTL dead code elimination pass
>>>>>
>>>>> +feliminate-malloc
>>>>> +Common Var(flag_eliminate_malloc) Init(1) Optimization
>>>>> +Eliminate unnecessary malloc/free pairs
>>>>> +
>>>>>  fdse
>>>>>  Common Var(flag_dse) Init(1) Optimization
>>>>>  Use the RTL dead store elimination pass
>>>>> Index: gcc/tree-ssa-dce.c
>>>>> ===================================================================
>>>>> --- gcc/tree-ssa-dce.c  (revision 187277)
>>>>> +++ gcc/tree-ssa-dce.c  (working copy)
>>>>> @@ -309,6 +309,8 @@
>>>>>            case BUILT_IN_CALLOC:
>>>>>            case BUILT_IN_ALLOCA:
>>>>>            case BUILT_IN_ALLOCA_WITH_ALIGN:
>>>>> +             if (!flag_eliminate_malloc)
>>>>> +               mark_stmt_necessary (stmt, true);
>>>>>              return;
>>>>>
>>>>>            default:;

Patch

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 187277)
+++ gcc/doc/invoke.texi	(working copy)
@@ -360,7 +360,8 @@ 
 -fcx-limited-range @gol
 -fdata-sections -fdce -fdelayed-branch @gol
 -fdelete-null-pointer-checks -fdevirtualize -fdse @gol
--fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol
+-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol
+-ffat-lto-objects @gol
 -ffast-math -ffinite-math-only -ffloat-store
-fexcess-precision=@var{style} @gol
 -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol
 -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
@@ -6238,6 +6239,7 @@ 
 -fdefer-pop @gol
 -fdelayed-branch @gol
 -fdse @gol
+-feliminate-malloc @gol
 -fguess-branch-probability @gol
 -fif-conversion2 @gol
 -fif-conversion @gol
@@ -6762,6 +6764,11 @@ 
 Perform dead store elimination (DSE) on RTL@.
 Enabled by default at @option{-O} and higher.

+@item -feliminate-malloc
+@opindex feliminate-malloc
+Eliminate unnecessary malloc/free pairs.
+Enabled by default at @option{-O} and higher.
+
 @item -fif-conversion
 @opindex fif-conversion
 Attempt to transform conditional jumps into branch-less equivalents.  This
Index: gcc/testsuite/gcc.dg/free-malloc.c
===================================================================
--- gcc/testsuite/gcc.dg/free-malloc.c	(revision 0)
+++ gcc/testsuite/gcc.dg/free-malloc.c	(revision 0)
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-eliminate-malloc" } */
+/* { dg-final { scan-assembler-times "malloc" 2} } */
+/* { dg-final { scan-assembler-times "free" 2} } */
+
+extern void * malloc (unsigned long);
+extern void free (void *);
+
+void test ()
+{
+  free (malloc (10));
+}
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 187277)
+++ gcc/common.opt	(working copy)
@@ -1474,6 +1474,10 @@ 
 Common Var(flag_dce) Init(1) Optimization
 Use the RTL dead code elimination pass

+feliminate-malloc
+Common Var(flag_eliminate_malloc) Init(1) Optimization
+Eliminate unnecessary malloc/free pairs
+
 fdse
 Common Var(flag_dse) Init(1) Optimization
 Use the RTL dead store elimination pass
Index: gcc/tree-ssa-dce.c
===================================================================
--- gcc/tree-ssa-dce.c	(revision 187277)
+++ gcc/tree-ssa-dce.c	(working copy)
@@ -309,6 +309,8 @@ 
 	    case BUILT_IN_CALLOC:
 	    case BUILT_IN_ALLOCA:
 	    case BUILT_IN_ALLOCA_WITH_ALIGN:
+	      if (!flag_eliminate_malloc)
+		mark_stmt_necessary (stmt, true);
 	      return;

 	    default:;