diff mbox series

Fix PR 84487, large rodata increase in tonto and other programs

Message ID 9dcba167-4fa0-7c7e-aee4-74a29a8b32cb@netcologne.de
State New
Headers show
Series Fix PR 84487, large rodata increase in tonto and other programs | expand

Commit Message

Thomas Koenig April 13, 2019, 6:48 p.m. UTC
Hello world,

the attached patch fixes a 8/9 regression where _def_init, an internal
Fortran variable containing only zeros, was placed into the .rodata
section. This led to a large increase in executable size.

There should be no impact on other languages because the change to
varasm.c is guarded by lang_GNU_Fortran ().

Regarding the test case: I did find one other test which checks
for .bss, so I suppose this is safe.

Regression-tested with a full test (--enable-languages=all and
make -j64 -k check) on POWER9.

I would like to apply it to both affected branches.

OK for the general and the Fortran part?

Regards

	Thomas

2019-04-13  Thomas Koenig  <tkoenig@gcc.gnu.org>

         PR fortran/84487
         * trans-decl.c (gfc_get_symbol_decl): Mark _def_init as
         artificial.

2019-04-13  Thomas Koenig  <tkoenig@gcc.gnu.org>

         PR fortran/84487
         * varasm.c (bss_initializer_p): If we are compiling Fortran, the
         decl is artifical and it has a size larger than 255, it can be
         put into BSS.

2019-04-13  Thomas Koenig  <tkoenig@gcc.gnu.org>

         PR fortran/84487
         * gfortran.dg/def_init_1.f90: New test.

Comments

Paul Richard Thomas April 13, 2019, 7:35 p.m. UTC | #1
Hi Thomas,

Thanks for your determination in dealing with this. It has been on my
TODO list for a long time but, like you at the outset, I had no idea
how to deal with it.

OK on the fortran side.

Paul

On Sat, 13 Apr 2019 at 19:48, Thomas Koenig <tkoenig@netcologne.de> wrote:
>
> Hello world,
>
> the attached patch fixes a 8/9 regression where _def_init, an internal
> Fortran variable containing only zeros, was placed into the .rodata
> section. This led to a large increase in executable size.
>
> There should be no impact on other languages because the change to
> varasm.c is guarded by lang_GNU_Fortran ().
>
> Regarding the test case: I did find one other test which checks
> for .bss, so I suppose this is safe.
>
> Regression-tested with a full test (--enable-languages=all and
> make -j64 -k check) on POWER9.
>
> I would like to apply it to both affected branches.
>
> OK for the general and the Fortran part?
>
> Regards
>
>         Thomas
>
> 2019-04-13  Thomas Koenig  <tkoenig@gcc.gnu.org>
>
>          PR fortran/84487
>          * trans-decl.c (gfc_get_symbol_decl): Mark _def_init as
>          artificial.
>
> 2019-04-13  Thomas Koenig  <tkoenig@gcc.gnu.org>
>
>          PR fortran/84487
>          * varasm.c (bss_initializer_p): If we are compiling Fortran, the
>          decl is artifical and it has a size larger than 255, it can be
>          put into BSS.
>
> 2019-04-13  Thomas Koenig  <tkoenig@gcc.gnu.org>
>
>          PR fortran/84487
>          * gfortran.dg/def_init_1.f90: New test.
>
>
Richard Biener April 15, 2019, 7:11 a.m. UTC | #2
On Sat, Apr 13, 2019 at 8:48 PM Thomas Koenig <tkoenig@netcologne.de> wrote:
>
> Hello world,
>
> the attached patch fixes a 8/9 regression where _def_init, an internal
> Fortran variable containing only zeros, was placed into the .rodata
> section. This led to a large increase in executable size.
>
> There should be no impact on other languages because the change to
> varasm.c is guarded by lang_GNU_Fortran ().
>
> Regarding the test case: I did find one other test which checks
> for .bss, so I suppose this is safe.
>
> Regression-tested with a full test (--enable-languages=all and
> make -j64 -k check) on POWER9.
>
> I would like to apply it to both affected branches.
>
> OK for the general and the Fortran part?

This won't work with LTO.  Note we have the issue in the middle-end as well
since we promote variables we see are not written to to TREE_READONLY.
This can be seen with (the somewhat artificial...):

int a[1024*1024] = { 0 };

int __attribute__((noinline)) foo() { return *(volatile int *)a; }

int main()
{
  return foo ();
}

where without -flto a gets placed into .bss while with -flto it
gets into .rodata.  So I believe we should add a DECL flag
specifying whether for section placement we can "ignore"
TREE_READONLY.  We'd initialize that with the original
state of TREE_READONLY so that the R/O promotion doesn't
change section placement.  Also the Fortran FE can then
simply set this flag on variables that may live in .bss.

There are 14 unused bits in tree_decl_with_vis so a
patch for the middle-end part could look like the attached
(w/o solving the LTO issue yet).

Of course adding sth like a .robss section would be nice.

Richard.

> Regards
>
>         Thomas
>
> 2019-04-13  Thomas Koenig  <tkoenig@gcc.gnu.org>
>
>          PR fortran/84487
>          * trans-decl.c (gfc_get_symbol_decl): Mark _def_init as
>          artificial.
>
> 2019-04-13  Thomas Koenig  <tkoenig@gcc.gnu.org>
>
>          PR fortran/84487
>          * varasm.c (bss_initializer_p): If we are compiling Fortran, the
>          decl is artifical and it has a size larger than 255, it can be
>          put into BSS.
>
> 2019-04-13  Thomas Koenig  <tkoenig@gcc.gnu.org>
>
>          PR fortran/84487
>          * gfortran.dg/def_init_1.f90: New test.
>
>
Jan Hubicka April 15, 2019, 11:48 a.m. UTC | #3
> 
> This won't work with LTO.  Note we have the issue in the middle-end as well
> since we promote variables we see are not written to to TREE_READONLY.
> This can be seen with (the somewhat artificial...):
> 
> int a[1024*1024] = { 0 };
> 
> int __attribute__((noinline)) foo() { return *(volatile int *)a; }
> 
> int main()
> {
>   return foo ();
> }
> 
> where without -flto a gets placed into .bss while with -flto it
> gets into .rodata.  So I believe we should add a DECL flag
> specifying whether for section placement we can "ignore"
> TREE_READONLY.  We'd initialize that with the original
> state of TREE_READONLY so that the R/O promotion doesn't
> change section placement.  Also the Fortran FE can then
> simply set this flag on variables that may live in .bss.
> 
> There are 14 unused bits in tree_decl_with_vis so a
> patch for the middle-end part could look like the attached
> (w/o solving the LTO issue yet).
> 
> Of course adding sth like a .robss section would be nice.

Yep, but I think what you propose works well in practice (I am not sure
if we are forced to put const delcared variables to readonly memory and
if we can't do this as binary size optimization always). The patch
looks fine to me.  Would be possible to place the flags into
varpool_node rather then TREE? It is a lot easier to manage flags
there.

Honza
> 
> Richard.
> 
> > Regards
> >
> >         Thomas
> >
> > 2019-04-13  Thomas Koenig  <tkoenig@gcc.gnu.org>
> >
> >          PR fortran/84487
> >          * trans-decl.c (gfc_get_symbol_decl): Mark _def_init as
> >          artificial.
> >
> > 2019-04-13  Thomas Koenig  <tkoenig@gcc.gnu.org>
> >
> >          PR fortran/84487
> >          * varasm.c (bss_initializer_p): If we are compiling Fortran, the
> >          decl is artifical and it has a size larger than 255, it can be
> >          put into BSS.
> >
> > 2019-04-13  Thomas Koenig  <tkoenig@gcc.gnu.org>
> >
> >          PR fortran/84487
> >          * gfortran.dg/def_init_1.f90: New test.
> >
> >
Florian Weimer April 15, 2019, 11:54 a.m. UTC | #4
* Richard Biener:

> Of course adding sth like a .robss section would be nice.

I think this is strictly a link editor issue because a read-only PT_LOAD
directive with a memory size larger than the file size already produces
read-only zero pages, without requiring a file allocation.

Thanks,
Florian
Segher Boessenkool April 15, 2019, 7:53 p.m. UTC | #5
On Mon, Apr 15, 2019 at 01:54:11PM +0200, Florian Weimer wrote:
> * Richard Biener:
> 
> > Of course adding sth like a .robss section would be nice.
> 
> I think this is strictly a link editor issue because a read-only PT_LOAD
> directive with a memory size larger than the file size already produces
> read-only zero pages, without requiring a file allocation.

But .rodata normally is not the last thing in its segment (the .eh*
things are after it, and those are usually not all zero).


Segher
Florian Weimer April 16, 2019, 9:33 a.m. UTC | #6
* Segher Boessenkool:

> On Mon, Apr 15, 2019 at 01:54:11PM +0200, Florian Weimer wrote:
>> * Richard Biener:
>> 
>> > Of course adding sth like a .robss section would be nice.
>> 
>> I think this is strictly a link editor issue because a read-only PT_LOAD
>> directive with a memory size larger than the file size already produces
>> read-only zero pages, without requiring a file allocation.
>
> But .rodata normally is not the last thing in its segment (the .eh*
> things are after it, and those are usually not all zero).

If you don't mind the proliferation of load segments (we've add many of
them in recent years), placement does not matter.

Thanks,
Florian
Richard Biener April 16, 2019, 10:05 a.m. UTC | #7
On Tue, Apr 16, 2019 at 11:33 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Segher Boessenkool:
>
> > On Mon, Apr 15, 2019 at 01:54:11PM +0200, Florian Weimer wrote:
> >> * Richard Biener:
> >>
> >> > Of course adding sth like a .robss section would be nice.
> >>
> >> I think this is strictly a link editor issue because a read-only PT_LOAD
> >> directive with a memory size larger than the file size already produces
> >> read-only zero pages, without requiring a file allocation.
> >
> > But .rodata normally is not the last thing in its segment (the .eh*
> > things are after it, and those are usually not all zero).
>
> If you don't mind the proliferation of load segments (we've add many of
> them in recent years), placement does not matter.

Btw, besides being a link editor issue the issue also shows in object
file size.  Not sure if a similar trick exists for those (.rodata with NOBITS?)

Richard.

> Thanks,
> Florian
Segher Boessenkool April 16, 2019, 10:12 a.m. UTC | #8
On Tue, Apr 16, 2019 at 11:33:39AM +0200, Florian Weimer wrote:
> * Segher Boessenkool:
> 
> > On Mon, Apr 15, 2019 at 01:54:11PM +0200, Florian Weimer wrote:
> >> * Richard Biener:
> >> 
> >> > Of course adding sth like a .robss section would be nice.
> >> 
> >> I think this is strictly a link editor issue because a read-only PT_LOAD
> >> directive with a memory size larger than the file size already produces
> >> read-only zero pages, without requiring a file allocation.
> >
> > But .rodata normally is not the last thing in its segment (the .eh*
> > things are after it, and those are usually not all zero).
> 
> If you don't mind the proliferation of load segments (we've add many of
> them in recent years), placement does not matter.

Many (or at least some) ABIs have requirements wrt placement, and number
and kind of segments even.

Is there any disadvantage to having a .robss section (that is mapped at the
end of the text segment, or some other r/o segment, so that its data does
not have to exist in object and executable files)?


Segher
Jakub Jelinek April 16, 2019, 10:16 a.m. UTC | #9
On Tue, Apr 16, 2019 at 11:33:39AM +0200, Florian Weimer wrote:
> * Segher Boessenkool:
> 
> > On Mon, Apr 15, 2019 at 01:54:11PM +0200, Florian Weimer wrote:
> >> * Richard Biener:
> >> 
> >> > Of course adding sth like a .robss section would be nice.
> >> 
> >> I think this is strictly a link editor issue because a read-only PT_LOAD
> >> directive with a memory size larger than the file size already produces
> >> read-only zero pages, without requiring a file allocation.
> >
> > But .rodata normally is not the last thing in its segment (the .eh*
> > things are after it, and those are usually not all zero).
> 
> If you don't mind the proliferation of load segments (we've add many of
> them in recent years), placement does not matter.

That is something I really dislike, each load segment has a significant cost
and from what I remember, at least one of the 4 PT_LOADs in current setup is
completely useless:
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x02bf60 0x02bf60 R   0x1000
  LOAD           0x02c000 0x000000000002c000 0x000000000002c000 0x0a74a5 0x0a74a5 R E 0x1000
  LOAD           0x0d4000 0x00000000000d4000 0x00000000000d4000 0x033fd0 0x033fd0 R   0x1000
  LOAD           0x108d50 0x0000000000109d50 0x0000000000109d50 0x00b814 0x0153d8 RW  0x1000
there is no reason not to reorder at least on most targets the sections such
that there is just one R, one R E and one RW segment.

	Jakub
Segher Boessenkool April 16, 2019, 10:17 a.m. UTC | #10
On Tue, Apr 16, 2019 at 12:05:56PM +0200, Richard Biener wrote:
> On Tue, Apr 16, 2019 at 11:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * Segher Boessenkool:
> >
> > > On Mon, Apr 15, 2019 at 01:54:11PM +0200, Florian Weimer wrote:
> > >> * Richard Biener:
> > >>
> > >> > Of course adding sth like a .robss section would be nice.
> > >>
> > >> I think this is strictly a link editor issue because a read-only PT_LOAD
> > >> directive with a memory size larger than the file size already produces
> > >> read-only zero pages, without requiring a file allocation.
> > >
> > > But .rodata normally is not the last thing in its segment (the .eh*
> > > things are after it, and those are usually not all zero).
> >
> > If you don't mind the proliferation of load segments (we've add many of
> > them in recent years), placement does not matter.
> 
> Btw, besides being a link editor issue the issue also shows in object
> file size.  Not sure if a similar trick exists for those (.rodata with NOBITS?)

Section type SHT_NOBITS does the trick, yes.


Segher
Segher Boessenkool April 16, 2019, 10:24 a.m. UTC | #11
On Tue, Apr 16, 2019 at 12:16:16PM +0200, Jakub Jelinek wrote:
> On Tue, Apr 16, 2019 at 11:33:39AM +0200, Florian Weimer wrote:
> > * Segher Boessenkool:
> > 
> > > On Mon, Apr 15, 2019 at 01:54:11PM +0200, Florian Weimer wrote:
> > >> * Richard Biener:
> > >> 
> > >> > Of course adding sth like a .robss section would be nice.
> > >> 
> > >> I think this is strictly a link editor issue because a read-only PT_LOAD
> > >> directive with a memory size larger than the file size already produces
> > >> read-only zero pages, without requiring a file allocation.
> > >
> > > But .rodata normally is not the last thing in its segment (the .eh*
> > > things are after it, and those are usually not all zero).
> > 
> > If you don't mind the proliferation of load segments (we've add many of
> > them in recent years), placement does not matter.
> 
> That is something I really dislike, each load segment has a significant cost
> and from what I remember, at least one of the 4 PT_LOADs in current setup is
> completely useless:
>   LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x02bf60 0x02bf60 R   0x1000
>   LOAD           0x02c000 0x000000000002c000 0x000000000002c000 0x0a74a5 0x0a74a5 R E 0x1000
>   LOAD           0x0d4000 0x00000000000d4000 0x00000000000d4000 0x033fd0 0x033fd0 R   0x1000
>   LOAD           0x108d50 0x0000000000109d50 0x0000000000109d50 0x00b814 0x0153d8 RW  0x1000
> there is no reason not to reorder at least on most targets the sections such
> that there is just one R, one R E and one RW segment.

It is more painful if your segments are aligned to 64kB or anything else
bigger, too.


Segher
Thomas König April 17, 2019, 7:19 a.m. UTC | #12
Hi,

thanks a lot for the extensive discussion :-)

How should we now proceed, first for gcc 9, snd then for backporting?
Use Richard‘s patch with the corresponding Fortran FE change?

Regards

Thomas
Richard Biener April 17, 2019, 8:41 a.m. UTC | #13
On Wed, Apr 17, 2019 at 9:19 AM Thomas König <tk@tkoenig.net> wrote:
>
> Hi,
>
> thanks a lot for the extensive discussion :-)
>
> How should we now proceed, first for gcc 9, snd then for backporting?
> Use Richard‘s patch with the corresponding Fortran FE change?

Btw, for the testcase the fortran FE could also simply opt to not
make def_init TREE_READONLY.  Or even better, for all-zero
initialization omit the explicit initialization data and instead
mark it specially in the vtable (just use a NULL initializer
denoting zero-initialization?).  Even .bss costs (runtime) memory.

But yes, my patch would be a way to solve the middle-end issue
of promoting a variable TREE_READONLY, preventing .bss use.
And the FE could then "abuse" this feature.  Note the middle-end
already special-cases variables with an explicit section so the
Fortran FE can already use that feature to put the initializer into
.bss explicitely (set_decl_section_name (decl, ".bss"),
conditional on availability (not 100% sure how to test that...).
Your testcase probably will fail on targets w/o .bss section support.

Richard.

> Regards
>
> Thomas
Florian Weimer April 17, 2019, 11:06 a.m. UTC | #14
* Richard Biener:

> On Wed, Apr 17, 2019 at 9:19 AM Thomas König <tk@tkoenig.net> wrote:
>>
>> Hi,
>>
>> thanks a lot for the extensive discussion :-)
>>
>> How should we now proceed, first for gcc 9, snd then for backporting?
>> Use Richard‘s patch with the corresponding Fortran FE change?
>
> Btw, for the testcase the fortran FE could also simply opt to not
> make def_init TREE_READONLY.  Or even better, for all-zero
> initialization omit the explicit initialization data and instead
> mark it specially in the vtable (just use a NULL initializer
> denoting zero-initialization?).  Even .bss costs (runtime) memory.

Not just that, .bss adds to the commit charge, while .rodata would not.
So it's not clear that using .bss for zero constants is always a win.

Thanks,
Florian
Andreas Schwab April 17, 2019, 12:19 p.m. UTC | #15
On Apr 17 2019, Florian Weimer <fweimer@redhat.com> wrote:

> Not just that, .bss adds to the commit charge,

Only one page at most.

Andreas.
Florian Weimer April 17, 2019, 12:22 p.m. UTC | #16
* Andreas Schwab:

> On Apr 17 2019, Florian Weimer <fweimer@redhat.com> wrote:
>
>> Not just that, .bss adds to the commit charge,
>
> Only one page at most.

That would be a bug.  All of it is anonymous memory which needs backing
from RAM or swap, in case the process writes to it.

Thanks,
Florian
Andreas Schwab April 17, 2019, 12:52 p.m. UTC | #17
On Apr 17 2019, Florian Weimer <fweimer@redhat.com> wrote:

> * Andreas Schwab:
>
>> On Apr 17 2019, Florian Weimer <fweimer@redhat.com> wrote:
>>
>>> Not just that, .bss adds to the commit charge,
>>
>> Only one page at most.
>
> That would be a bug.

You cannot avoid it for the page shared with .data, unless you force
.bss to be page aligned.

Andreas.
Florian Weimer April 17, 2019, 12:55 p.m. UTC | #18
* Andreas Schwab:

> On Apr 17 2019, Florian Weimer <fweimer@redhat.com> wrote:
>
>> * Andreas Schwab:
>>
>>> On Apr 17 2019, Florian Weimer <fweimer@redhat.com> wrote:
>>>
>>>> Not just that, .bss adds to the commit charge,
>>>
>>> Only one page at most.
>>
>> That would be a bug.
>
> You cannot avoid it for the page shared with .data, unless you force
> .bss to be page aligned.

Would you please elaborate?

With “commit charge”, I mean address space accounted towards the commit
limit, when Linux is running in vm.overcommit_memory=2 mode.

Thanks,
Florian
diff mbox series

Patch

Index: fortran/trans-decl.c
===================================================================
--- fortran/trans-decl.c	(Revision 270182)
+++ fortran/trans-decl.c	(Arbeitskopie)
@@ -1865,7 +1865,10 @@  gfc_get_symbol_decl (gfc_symbol * sym)
 
   if (sym->attr.vtab
       || (sym->name[0] == '_' && gfc_str_startswith (sym->name, "__def_init")))
-    TREE_READONLY (decl) = 1;
+    {
+      TREE_READONLY (decl) = 1;
+      DECL_ARTIFICIAL (decl) = 1;
+    }
 
   return decl;
 }
Index: varasm.c
===================================================================
--- varasm.c	(Revision 270182)
+++ varasm.c	(Arbeitskopie)
@@ -1007,9 +1007,13 @@  decode_reg_name (const char *name)
 bool
 bss_initializer_p (const_tree decl, bool named)
 {
-  /* Do not put non-common constants into the .bss section, they belong in
-     a readonly section, except when NAMED is true.  */
-  return ((!TREE_READONLY (decl) || DECL_COMMON (decl) || named)
+  /* Do not put non-common constants into the .bss section, they
+     belong in a readonly section, except when NAMED is true or when
+     we are dealing with an artificial declaration, in the Fortran
+     compiler only, above a certain size.  */
+  return ((!TREE_READONLY (decl) || DECL_COMMON (decl) || named
+	   || (lang_GNU_Fortran () && DECL_ARTIFICIAL (decl)
+	       && (tree_to_uhwi (DECL_SIZE_UNIT (decl)) > 255 )))
 	  && (DECL_INITIAL (decl) == NULL
 	      /* In LTO we have no errors in program; error_mark_node is used
 	         to mark offlined constructors.  */