Message ID | alpine.LSU.2.11.1403191547280.6041@zhemvz.fhfr.qr |
---|---|
State | New |
On 03/19/2014 03:55 PM, Richard Biener wrote:
> On Wed, 19 Mar 2014, Martin Liška wrote:
>
>> There are stats for Firefox with LTO and -O2. According to graphs it
>> looks that memory consumption for parallel WPA phase is similar.
>> When I disable parallel WPA, wpa footprint is ~4GB, but ltrans memory
>> footprint is similar to parallel WPA that reduces libxul.so linking by ~10%.
> Ok, so I suppose this tracks RSS, not virtual memory use (what is
> "used" and what is "active")?

Data are given by vmstat, according to:
http://stackoverflow.com/questions/18529723/what-is-active-memory-and-inactive-memory

*Active memory* is memory that is being used by a particular process.
*Inactive memory* is memory that was allocated to a process that is no
longer running.

So please follow just 'blue' line that displays really used memory.
According to man, vmstat tracks virtual memory statistics.

> And it is WPA plus LTRANS stages, WPA ends where memory use first goes
> down to zero?
> I wonder if you can identify the point where parallel streaming
> starts and where it ends ... ;)

Exactly, WPA ends when it goes to zero.

> Btw, I have another patch in my local tree, limiting the
> exponential growth of blocks we allocate when outputting sections.
> But it shouldn't be _that_ bad ... maybe you can try if it has
> any effect?

I can apply it.

Martin

>
> Thanks,
> Richard.
>
> Index: gcc/lto-section-out.c
> ===================================================================
> --- gcc/lto-section-out.c (revision 208642)
> +++ gcc/lto-section-out.c (working copy)
> @@ -99,13 +99,19 @@ lto_end_section (void)
>  }
>
>
> +/* We exponentially grow the size of the blocks as we need to make
> +   room for more data to be written.  Start with a single page and go up
> +   to 2MB pages for this.  */
> +#define FIRST_BLOCK_SIZE 4096
> +#define MAX_BLOCK_SIZE (2 * 1024 * 1024)
> +
>  /* Write all of the chars in OBS to the assembler.  Recycle the blocks
>     in obs as this is being done.  */
>
>  void
>  lto_write_stream (struct lto_output_stream *obs)
>  {
> -  unsigned int block_size = 1024;
> +  unsigned int block_size = FIRST_BLOCK_SIZE;
>    struct lto_char_ptr_base *block;
>    struct lto_char_ptr_base *next_block;
>    if (!obs->first_block)
> @@ -135,6 +141,7 @@ lto_write_stream (struct lto_output_stre
>        else
>          lang_hooks.lto.append_data (base, num_chars, block);
>        block_size *= 2;
> +      block_size = MIN (MAX_BLOCK_SIZE, block_size);
>      }
>  }
>
> @@ -152,7 +159,7 @@ lto_append_block (struct lto_output_stre
>      {
>        /* This is the first time the stream has been written
>           into.  */
> -      obs->block_size = 1024;
> +      obs->block_size = FIRST_BLOCK_SIZE;
>        new_block = (struct lto_char_ptr_base*) xmalloc (obs->block_size);
>        obs->first_block = new_block;
>      }
> @@ -162,6 +169,7 @@ lto_append_block (struct lto_output_stre
>        /* Get a new block that is twice as big as the last block
>           and link it into the list.  */
>        obs->block_size *= 2;
> +      obs->block_size = MIN (MAX_BLOCK_SIZE, obs->block_size);
>        new_block = (struct lto_char_ptr_base*) xmalloc (obs->block_size);
>        /* The first bytes of the block are reserved as a pointer to
>           the next block.  Set the chain of the full block to the
On Wed, 19 Mar 2014, Martin Liška wrote:

>
> On 03/19/2014 03:55 PM, Richard Biener wrote:
> > On Wed, 19 Mar 2014, Martin Liška wrote:
> >
> > > There are stats for Firefox with LTO and -O2. According to graphs it
> > > looks that memory consumption for parallel WPA phase is similar.
> > > When I disable parallel WPA, wpa footprint is ~4GB, but ltrans memory
> > > footprint is similar to parallel WPA that reduces libxul.so linking by
> > > ~10%.
> > Ok, so I suppose this tracks RSS, not virtual memory use (what is
> > "used" and what is "active")?
>
> Data are given by vmstat, according to:
> http://stackoverflow.com/questions/18529723/what-is-active-memory-and-inactive-memory
>
> *Active memory* is memory that is being used by a particular process.
> *Inactive memory* is memory that was allocated to a process that is no longer
> running.
>
> So please follow just 'blue' line that displays really used memory. According
> to man, vmstat tracks virtual memory statistics.

But 'blue' is neither active nor inactive ... what is 'used'? Does it
correspond to 'swpd'? If it is virtual memory in use, then it is
expected to grow when fork()ing, as the virtual memory space is
obviously copied (just the pages are still shared). For me, allocating
a GB of memory and clearing it increases "active" by 1GB, and forking
afterwards doesn't increase any of the metrics that vmstat -a outputs
in any significant way.

> > And it is WPA plus LTRANS stages, WPA ends where memory use first goes
> > down to zero?
> > I wonder if you can identify the point where parallel streaming
> > starts and where it ends ... ;)
>
> Exactly, WPA ends when it goes to zero.

So the difference isn't that big (8GB vs. 7.2GB), and is likely
attributable to heap memory we allocate during the stream-out. For
example, we need some for the tree-ref-encoders (I remember that this
can be a significant amount of memory, but I already improved that as
far as possible...).

So yes, we _do_ allocate memory during stream-out, and that is now
required N times.

> > Btw, I have another patch in my local tree, limiting the
> > exponential growth of blocks we allocate when outputting sections.
> > But it shouldn't be _that_ bad ... maybe you can try if it has
> > any effect?
>
> I can apply it.

Thanks,
Richard.
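The fork() experiment described in the reply above could be reproduced with something like the following. This is only a minimal sketch, not the test that was actually run: the helper name `touch_then_fork`, the 4096-byte page stride, and the pause parameter are assumptions made for illustration. It writes to a buffer so the pages become resident, forks, and lets the child only read the buffer, so the copy-on-write pages should stay shared.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of GCC or of the thread): allocate BYTES,
   write to every byte so the pages are faulted in, then fork.  The child
   only reads the buffer, so no copy-on-write duplication is triggered.
   Both processes sleep for PAUSE_SECONDS so that "vmstat -a" can be
   sampled from another terminal.  Returns 0 on success, -1 on failure.  */
static int
touch_then_fork (size_t bytes, unsigned int pause_seconds)
{
  unsigned char *buf = malloc (bytes);
  if (buf == NULL)
    return -1;

  /* The thread's experiment clears the memory; a nonzero fill is used
     here so that the child can check it really sees the parent's data.  */
  memset (buf, 1, bytes);

  pid_t pid = fork ();
  if (pid < 0)
    {
      free (buf);
      return -1;
    }

  if (pid == 0)
    {
      /* Child: read one byte per assumed 4096-byte page, never write.  */
      unsigned long sum = 0;
      for (size_t i = 0; i < bytes; i += 4096)
        sum += buf[i];
      sleep (pause_seconds);
      _exit (sum == 0 ? 1 : 0);
    }

  /* Parent: keep the buffer alive while the child runs.  */
  sleep (pause_seconds);
  int status = 0;
  pid_t waited = waitpid (pid, &status, 0);
  free (buf);

  if (waited != pid || !WIFEXITED (status) || WEXITSTATUS (status) != 0)
    return -1;
  return 0;
}
```

Calling `touch_then_fork ((size_t) 1 << 30, 30)` from a small `main` while watching `vmstat -a 1` should make it possible to compare the "active" column before and after the fork; what the 'blue' line in the graphs corresponds to remains the open question of the thread.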
Index: gcc/lto-section-out.c
===================================================================
--- gcc/lto-section-out.c (revision 208642)
+++ gcc/lto-section-out.c (working copy)
@@ -99,13 +99,19 @@ lto_end_section (void)
 }
 
 
+/* We exponentially grow the size of the blocks as we need to make
+   room for more data to be written.  Start with a single page and go up
+   to 2MB pages for this.  */
+#define FIRST_BLOCK_SIZE 4096
+#define MAX_BLOCK_SIZE (2 * 1024 * 1024)
+
 /* Write all of the chars in OBS to the assembler.  Recycle the blocks
    in obs as this is being done.  */
 
 void
 lto_write_stream (struct lto_output_stream *obs)
 {
-  unsigned int block_size = 1024;
+  unsigned int block_size = FIRST_BLOCK_SIZE;
   struct lto_char_ptr_base *block;
   struct lto_char_ptr_base *next_block;
   if (!obs->first_block)
@@ -135,6 +141,7 @@ lto_write_stream (struct lto_output_stre
       else
         lang_hooks.lto.append_data (base, num_chars, block);
       block_size *= 2;
+      block_size = MIN (MAX_BLOCK_SIZE, block_size);
     }
 }
 
@@ -152,7 +159,7 @@ lto_append_block (struct lto_output_stre
     {
       /* This is the first time the stream has been written
          into.  */
-      obs->block_size = 1024;
+      obs->block_size = FIRST_BLOCK_SIZE;
       new_block = (struct lto_char_ptr_base*) xmalloc (obs->block_size);
       obs->first_block = new_block;
     }
@@ -162,6 +169,7 @@ lto_append_block (struct lto_output_stre
       /* Get a new block that is twice as big as the last block
          and link it into the list.  */
       obs->block_size *= 2;
+      obs->block_size = MIN (MAX_BLOCK_SIZE, obs->block_size);
       new_block = (struct lto_char_ptr_base*) xmalloc (obs->block_size);
       /* The first bytes of the block are reserved as a pointer to
          the next block.  Set the chain of the full block to the
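To get a feel for what the cap in the patch changes, the growth policy can be modelled outside of GCC. The following is a simplified sketch under stated assumptions: `next_block_size` and `total_allocated` are hypothetical helpers, not functions from lto-section-out.c, and the model ignores the pointer that the real code reserves at the start of each block. Only the two constants are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Constants as introduced by the patch.  */
#define FIRST_BLOCK_SIZE 4096
#define MAX_BLOCK_SIZE (2 * 1024 * 1024)

/* Hypothetical helper: size of the block allocated after one of size CUR.
   Sizes double each time; a nonzero CAP limits the growth, mirroring the
   MIN (MAX_BLOCK_SIZE, ...) lines of the patch.  CAP == 0 models the old,
   uncapped behaviour.  */
static size_t
next_block_size (size_t cur, size_t cap)
{
  size_t next = cur * 2;
  if (cap != 0 && next > cap)
    next = cap;
  return next;
}

/* Hypothetical helper: total bytes allocated until the chain of blocks
   can hold PAYLOAD bytes, starting with a block of FIRST bytes.
   Simplification: per-block header overhead is not accounted for.  */
static size_t
total_allocated (size_t payload, size_t first, size_t cap)
{
  assert (first > 0);

  size_t size = first;
  size_t total = 0;
  while (total < payload)
    {
      total += size;
      size = next_block_size (size, cap);
    }
  return total;
}
```

In this model a 100MB stream costs `total_allocated (100 * 1024 * 1024, 1024, 0)` = 134216704 bytes with the old scheme (1024-byte first block, no cap), but `total_allocated (100 * 1024 * 1024, FIRST_BLOCK_SIZE, MAX_BLOCK_SIZE)` = 106950656 bytes with the patched constants, so the overshoot beyond the payload is bounded by one 2MB block instead of growing with the stream size.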