Message ID | 1477318076-22231-3-git-send-email-siddhesh@sourceware.org |
---|---|
State | New |
Hi Siddhesh,

Most of this is formatting or cleanup/improvement of the pre-existing
chunks that were moved, but the comment on M_ARENA_MAX regards content.

On 10/24/2016 07:07 AM, Siddhesh Poyarekar wrote:
> The M_ARENA_* mallopt parameters are in wide use in production to
> control the number of arenas that a long lived process creates and
> hence there is no point in stating that this interface is non-public.
> Document this interface and remove the obsolete comment.
>
> * manual/memory.texi (M_ARENA_TEST): Add documentation.
> (M_ARENA_MAX): Likewise.
> * malloc/malloc.c: Remove obsolete comment.
> ---
> malloc/malloc.c | 1 -
> manual/memory.texi | 122 +++++++++++++++++++++++++++--------------------------
> 2 files changed, 62 insertions(+), 61 deletions(-)
>
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index ef04360..a849901 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -1718,7 +1718,6 @@ static struct malloc_par mp_ =
> };
>
>
> -/* Non public mallopt parameters. */
> #define M_ARENA_TEST -7
> #define M_ARENA_MAX -8
>
> diff --git a/manual/memory.texi b/manual/memory.texi
> index 6f33455..198a933 100644
> --- a/manual/memory.texi
> +++ b/manual/memory.texi
> @@ -162,6 +162,8 @@ special to @theglibc{} and GNU Compiler.
>
> @menu
> * Memory Allocation and C:: How to get different kinds of allocation in C.
> +* The GNU allocator:: An overview of the GNU @code{malloc}

I think "Allocator" should be capitalized. The manual is inconsistent
on @subsection capitalization, but the majority of it is (capitalized).
Also, all the other info menu entries visible in the context here are
fully capitalized.

> + implementation.
> * Unconstrained Allocation:: The @code{malloc} facility allows fully general
> dynamic allocation.
> * Allocation Debugging:: Finding memory leaks and not freed memory.
> @@ -258,6 +260,43 @@ address of the space. Then you can use the operators @samp{*} and
> @}
> @end smallexample
>
> +@node The GNU allocator
> +@subsection The GNU allocator

Allocator

> +@cindex gnu allocator
> +
> +The @code{malloc} implementation in @theglibc{} is derived from ptmalloc
> +(pthreads malloc), which in turn is derived from dlmalloc (Doug Lea malloc).
> +This malloc may allocate memory in two different ways depending on their size
> +and certain parameters that may be controlled by users. The most common way is
> +to allocate portions of memory (called chunks) from a large contiguous area of
> +memory and manage these areas to optimize their use and reduce wastage in the
> +form of unusable chunks. Traditionally the system heap was set up to be the one
> +large memory area but @theglibc{} @code{malloc} implementation maintains

This should be "the @glibcadj{} @code{malloc}".

> +multiple such areas to optimize their use in multi-threaded applications. Each
> +such area is internally referred to as an @code{arena}.

@dfn{arena}

Unless this is a function name (or literal string one would be using in
code, for example), this is simply a term we use to describe the
concept. On first use, where we define the term, it should have
@dfn{}, and otherwise it doesn't need to be stylized at all, such as...

> +
> +As opposed to other versions, the @code{malloc} in @theglibc{} does not round
> +up chunk sizes to powers of two, neither for large nor for small sizes.
> +Neighboring chunks can be coalesced on a @code{free} no matter what their size
> +is. This makes the implementation suitable for all kinds of allocation
> +patterns without generally incurring high memory waste through fragmentation.
> +The presence of multiple @code{arenas} allows multiple threads to allocate

here (no @code{}).

> +memory simultaneously in their own separate arenas, thus improving performance.
> +
> +The other way of memory allocation is for very large blocks, i.e. much larger
> +than a page. These requests are allocated with @code{mmap} (anonymous or via
> +@code{/dev/zero}). This has the great advantage that these chunks are returned

Should be @file{}. A reference to mmap would be good. Something like,
"... (anonymous or via @file{/dev/zero}; @pxref{Memory-mapped I/O})."

> +to the system immediately when they are freed. Therefore, it cannot happen
> +that a large chunk becomes ``locked'' in between smaller ones and even after
> +calling @code{free} wastes memory. The size threshold for @code{mmap} to be
> +used is dynamic and gets adjusted according to allocation patterns of the
> +program. This can also be statically adjusted with @code{mallopt}. The use of

A reference to mallopt would be good here.

> +@code{mmap} can also be disabled completely.

Should briefly say how and/or give a reference. I believe mallopt
applies to both, so maybe something like, "@code{mallopt} can be used to
statically adjust the threshold using @code{M_MMAP_THRESHOLD}, and the
use of @code{mmap} can be disabled completely with @code{M_MMAP_MAX};
@pxref{Malloc Tunable Parameters}."

> +
> +A more detailed technical description of the GNU allocator is maintained in
> +@theglibc{} wiki. See

the @glibcadj{}

> +@uref{https://sourceware.org/glibc/wiki/MallocInternals}.
> +
> @node Unconstrained Allocation
> @subsection Unconstrained Allocation
> @cindex unconstrained memory allocation
> @@ -278,8 +317,6 @@ any time (or never).
> bigger or smaller.
> * Allocating Cleared Space:: Use @code{calloc} to allocate a
> block and clear it.
> -* Efficiency and Malloc:: Efficiency considerations in use of
> - these functions.
> * Aligned Memory Blocks:: Allocating specially aligned memory.
> * Malloc Tunable Parameters:: Use @code{mallopt} to adjust allocation
> parameters.
> @@ -867,59 +904,6 @@ But in general, it is not guaranteed that @code{calloc} calls
> @code{malloc}/@code{realloc}/@code{free} outside the C library, it
> should always define @code{calloc}, too.
>
> -@node Efficiency and Malloc
> -@subsubsection Efficiency Considerations for @code{malloc}
> -@cindex efficiency and @code{malloc}
> -
> -
> -
> -
> -@ignore
> -
> -@c No longer true, see below instead.
> -To make the best use of @code{malloc}, it helps to know that the GNU
> -version of @code{malloc} always dispenses small amounts of memory in
> -blocks whose sizes are powers of two. It keeps separate pools for each
> -power of two. This holds for sizes up to a page size. Therefore, if
> -you are free to choose the size of a small block in order to make
> -@code{malloc} more efficient, make it a power of two.
> -@c !!! xref getpagesize
> -
> -Once a page is split up for a particular block size, it can't be reused
> -for another size unless all the blocks in it are freed. In many
> -programs, this is unlikely to happen. Thus, you can sometimes make a
> -program use memory more efficiently by using blocks of the same size for
> -many different purposes.
> -
> -When you ask for memory blocks of a page or larger, @code{malloc} uses a
> -different strategy; it rounds the size up to a multiple of a page, and
> -it can coalesce and split blocks as needed.
> -
> -The reason for the two strategies is that it is important to allocate
> -and free small blocks as fast as possible, but speed is less important
> -for a large block since the program normally spends a fair amount of
> -time using it. Also, large blocks are normally fewer in number.
> -Therefore, for large blocks, it makes sense to use a method which takes
> -more time to minimize the wasted space.
> -
> -@end ignore
> -
> -As opposed to other versions, the @code{malloc} in @theglibc{}
> -does not round up block sizes to powers of two, neither for large nor
> -for small sizes. Neighboring chunks can be coalesced on a @code{free}
> -no matter what their size is. This makes the implementation suitable
> -for all kinds of allocation patterns without generally incurring high
> -memory waste through fragmentation.
> -
> -Very large blocks (much larger than a page) are allocated with
> -@code{mmap} (anonymous or via @code{/dev/zero}) by this implementation.
> -This has the great advantage that these chunks are returned to the
> -system immediately when they are freed. Therefore, it cannot happen
> -that a large chunk becomes ``locked'' in between smaller ones and even
> -after calling @code{free} wastes memory. The size threshold for
> -@code{mmap} to be used can be adjusted with @code{mallopt}. The use of
> -@code{mmap} can also be disabled completely.
> -
> @node Aligned Memory Blocks
> @subsubsection Allocating Aligned Memory Blocks
>
> @@ -1105,10 +1089,6 @@ parameter to be set, and @var{value} the new value to be set. Possible
> choices for @var{param}, as defined in @file{malloc.h}, are:
>
> @table @code
> -@comment TODO: @item M_ARENA_MAX
> -@comment - Document ARENA_MAX env var.
> -@comment TODO: @item M_ARENA_TEST
> -@comment - Document ARENA_TEST env var.
> @comment TODO: @item M_CHECK_ACTION
> @item M_MMAP_MAX
> The maximum number of chunks to allocate with @code{mmap}. Setting this
> @@ -1169,6 +1149,28 @@ value is set statically to the provided input.
>
> This parameter can also be set for the process at startup by setting the
> environment variable @code{MALLOC_TRIM_THRESHOLD_} to the desired value.
> +
> +@item M_ARENA_TEST
> +This parameter specifies the number of arenas that can be created before the
> +test on the limit to the number of arenas is conducted. The value is ignored if
> +@code{M_ARENA_MAX} is set.
> +
> +The default value of this parameter is 2 on 32-bit systems and 8 on 64-bit
> +systems.
> +
> +This parameter can also be set for the process at startup by setting the
> +environment variable @code{MALLOC_ARENA_TEST} to the desired value.
> +@item M_ARENA_MAX
> +This parameter sets the number of arenas to use regardless of the number of
> +cores in the system.
> +
> +The default value of this tunable is @code{0}, meaning that the limit on the
> +number of arenas is determined by the number of CPU cores online. For 32-bit
> +systems the limit is twice the number of cores online and on 64-bit systems, it
> +is eight times the number of cores online.

Even though I had followed the thread, I immediately jumped to the same
confusion with M_ARENA_TEST that was resolved in [1]. Explicitly
stating here that 2 and 8 are not derived from M_ARENA_TEST defaults
might save a lot of general confusion down the road.

> +
> +This parameter can also be set for the process at startup by setting the
> +environment variable @code{MALLOC_ARENA_MAX} to the desired value.
> @end table
>
> @end deftypefun
> @@ -1511,7 +1513,7 @@ This is the total size of memory allocated with @code{sbrk} by
> This is the number of chunks not in use. (The memory allocator
> internally gets chunks of memory from the operating system, and then
> carves them up to satisfy individual @code{malloc} requests; see
> -@ref{Efficiency and Malloc}.)
> +@ref{The GNU allocator}.)

Allocator

Also, making this an @pxref{} now would be nice, since this renders
wrong in info as-is.

>
> @item int smblks
> This field is unused.
>

Rical

[1] https://sourceware.org/ml/libc-alpha/2016-10/msg00310.html
Siddhesh Poyarekar <siddhesh@sourceware.org> writes:
> -/* Non public mallopt parameters. */
> #define M_ARENA_TEST -7
> #define M_ARENA_MAX -8

Hmmm... if these are now public, do they need to be moved elsewhere?
Like, malloc.h ?

> +The presence of multiple @code{arenas} allows multiple threads to allocate
> +memory simultaneously in their own separate arenas, thus improving performance.

Not quite true - there isn't one arena per thread, there are N arenas
per M threads. Probably better to say "... simultaneously in separate
arenas ..."

> +environment variable @code{MALLOC_ARENA_TEST} to the desired value.
> +@item M_ARENA_MAX

Missing blank line.

Otherwise, the content looks OK to me from a technical point of view.
On Wednesday 26 October 2016 03:25 AM, DJ Delorie wrote:
> Siddhesh Poyarekar <siddhesh@sourceware.org> writes:
>> -/* Non public mallopt parameters. */
>> #define M_ARENA_TEST -7
>> #define M_ARENA_MAX -8
>
> Hmmm... if these are now public, do they need to be moved elsewhere?
> Like, malloc.h ?

They're already there in malloc.h. I'll push a follow-up patch if they
can be removed.

>> +The presence of multiple @code{arenas} allows multiple threads to allocate
>> +memory simultaneously in their own separate arenas, thus improving performance.
>
> Not quite true - there isn't one arena per thread, there are N arenas
> per M threads. Probably better to say "... simultaneously in separate
> arenas ..."

OK.

>> +environment variable @code{MALLOC_ARENA_TEST} to the desired value.
>> +@item M_ARENA_MAX
>
> Missing blank line.
>
> Otherwise, the content looks OK to me from a technical point of view.

Thanks, I'll fix up formatting comments from Rical and push this.

Siddhesh
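
For readers following the thread: since both macros are already exposed
through malloc.h, a program can cap the arena count with mallopt roughly
as sketched below. This is only an illustration and not part of the
patch. The name cap_arenas is hypothetical, and the preprocessor guard
assumes a glibc build, since other C libraries may not provide mallopt
or M_ARENA_MAX at all.

```c
#include <assert.h>
#include <stdlib.h>   /* also exposes libc feature macros such as __GLIBC__ */

#if defined(__GLIBC__)
#include <malloc.h>   /* mallopt, M_ARENA_MAX */
#endif

/* Hypothetical helper (not from the patch): ask the allocator to use at
   most MAX_ARENAS arenas.  Returns the result of mallopt (1 on success,
   0 on error), or -1 when M_ARENA_MAX is not available on this C
   library.  */
static int
cap_arenas (int max_arenas)
{
  assert (max_arenas > 0);
#if defined(__GLIBC__) && defined(M_ARENA_MAX)
  return mallopt (M_ARENA_MAX, max_arenas);
#else
  return -1;
#endif
}
```

Calling cap_arenas (2) early in main, before any threads are created,
is meant to mirror starting the process with MALLOC_ARENA_MAX=2 in the
environment, as the patch text describes.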
diff --git a/malloc/malloc.c b/malloc/malloc.c
index ef04360..a849901 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1718,7 +1718,6 @@ static struct malloc_par mp_ =
 };
 
 
-/* Non public mallopt parameters. */
 #define M_ARENA_TEST -7
 #define M_ARENA_MAX -8
 
diff --git a/manual/memory.texi b/manual/memory.texi
index 6f33455..198a933 100644
--- a/manual/memory.texi
+++ b/manual/memory.texi
@@ -162,6 +162,8 @@ special to @theglibc{} and GNU Compiler.
 
 @menu
 * Memory Allocation and C:: How to get different kinds of allocation in C.
+* The GNU allocator:: An overview of the GNU @code{malloc}
+ implementation.
 * Unconstrained Allocation:: The @code{malloc} facility allows fully general
 dynamic allocation.
 * Allocation Debugging:: Finding memory leaks and not freed memory.
@@ -258,6 +260,43 @@ address of the space. Then you can use the operators @samp{*} and
 @}
 @end smallexample
 
+@node The GNU allocator
+@subsection The GNU allocator
+@cindex gnu allocator
+
+The @code{malloc} implementation in @theglibc{} is derived from ptmalloc
+(pthreads malloc), which in turn is derived from dlmalloc (Doug Lea malloc).
+This malloc may allocate memory in two different ways depending on their size
+and certain parameters that may be controlled by users. The most common way is
+to allocate portions of memory (called chunks) from a large contiguous area of
+memory and manage these areas to optimize their use and reduce wastage in the
+form of unusable chunks. Traditionally the system heap was set up to be the one
+large memory area but @theglibc{} @code{malloc} implementation maintains
+multiple such areas to optimize their use in multi-threaded applications. Each
+such area is internally referred to as an @code{arena}.
+
+As opposed to other versions, the @code{malloc} in @theglibc{} does not round
+up chunk sizes to powers of two, neither for large nor for small sizes.
+Neighboring chunks can be coalesced on a @code{free} no matter what their size
+is. This makes the implementation suitable for all kinds of allocation
+patterns without generally incurring high memory waste through fragmentation.
+The presence of multiple @code{arenas} allows multiple threads to allocate
+memory simultaneously in their own separate arenas, thus improving performance.
+
+The other way of memory allocation is for very large blocks, i.e. much larger
+than a page. These requests are allocated with @code{mmap} (anonymous or via
+@code{/dev/zero}). This has the great advantage that these chunks are returned
+to the system immediately when they are freed. Therefore, it cannot happen
+that a large chunk becomes ``locked'' in between smaller ones and even after
+calling @code{free} wastes memory. The size threshold for @code{mmap} to be
+used is dynamic and gets adjusted according to allocation patterns of the
+program. This can also be statically adjusted with @code{mallopt}. The use of
+@code{mmap} can also be disabled completely.
+
+A more detailed technical description of the GNU allocator is maintained in
+@theglibc{} wiki. See
+@uref{https://sourceware.org/glibc/wiki/MallocInternals}.
+
 @node Unconstrained Allocation
 @subsection Unconstrained Allocation
 @cindex unconstrained memory allocation
@@ -278,8 +317,6 @@ any time (or never).
 bigger or smaller.
 * Allocating Cleared Space:: Use @code{calloc} to allocate a
 block and clear it.
-* Efficiency and Malloc:: Efficiency considerations in use of
- these functions.
 * Aligned Memory Blocks:: Allocating specially aligned memory.
 * Malloc Tunable Parameters:: Use @code{mallopt} to adjust allocation
 parameters.
@@ -867,59 +904,6 @@ But in general, it is not guaranteed that @code{calloc} calls
 @code{malloc}/@code{realloc}/@code{free} outside the C library, it
 should always define @code{calloc}, too.
 
-@node Efficiency and Malloc
-@subsubsection Efficiency Considerations for @code{malloc}
-@cindex efficiency and @code{malloc}
-
-
-
-
-@ignore
-
-@c No longer true, see below instead.
-To make the best use of @code{malloc}, it helps to know that the GNU
-version of @code{malloc} always dispenses small amounts of memory in
-blocks whose sizes are powers of two. It keeps separate pools for each
-power of two. This holds for sizes up to a page size. Therefore, if
-you are free to choose the size of a small block in order to make
-@code{malloc} more efficient, make it a power of two.
-@c !!! xref getpagesize
-
-Once a page is split up for a particular block size, it can't be reused
-for another size unless all the blocks in it are freed. In many
-programs, this is unlikely to happen. Thus, you can sometimes make a
-program use memory more efficiently by using blocks of the same size for
-many different purposes.
-
-When you ask for memory blocks of a page or larger, @code{malloc} uses a
-different strategy; it rounds the size up to a multiple of a page, and
-it can coalesce and split blocks as needed.
-
-The reason for the two strategies is that it is important to allocate
-and free small blocks as fast as possible, but speed is less important
-for a large block since the program normally spends a fair amount of
-time using it. Also, large blocks are normally fewer in number.
-Therefore, for large blocks, it makes sense to use a method which takes
-more time to minimize the wasted space.
-
-@end ignore
-
-As opposed to other versions, the @code{malloc} in @theglibc{}
-does not round up block sizes to powers of two, neither for large nor
-for small sizes. Neighboring chunks can be coalesced on a @code{free}
-no matter what their size is. This makes the implementation suitable
-for all kinds of allocation patterns without generally incurring high
-memory waste through fragmentation.
-
-Very large blocks (much larger than a page) are allocated with
-@code{mmap} (anonymous or via @code{/dev/zero}) by this implementation.
-This has the great advantage that these chunks are returned to the
-system immediately when they are freed. Therefore, it cannot happen
-that a large chunk becomes ``locked'' in between smaller ones and even
-after calling @code{free} wastes memory. The size threshold for
-@code{mmap} to be used can be adjusted with @code{mallopt}. The use of
-@code{mmap} can also be disabled completely.
-
 @node Aligned Memory Blocks
 @subsubsection Allocating Aligned Memory Blocks
 
@@ -1105,10 +1089,6 @@ parameter to be set, and @var{value} the new value to be set. Possible
 choices for @var{param}, as defined in @file{malloc.h}, are:
 
 @table @code
-@comment TODO: @item M_ARENA_MAX
-@comment - Document ARENA_MAX env var.
-@comment TODO: @item M_ARENA_TEST
-@comment - Document ARENA_TEST env var.
 @comment TODO: @item M_CHECK_ACTION
 @item M_MMAP_MAX
 The maximum number of chunks to allocate with @code{mmap}. Setting this
@@ -1169,6 +1149,28 @@ value is set statically to the provided input.
 
 This parameter can also be set for the process at startup by setting the
 environment variable @code{MALLOC_TRIM_THRESHOLD_} to the desired value.
+
+@item M_ARENA_TEST
+This parameter specifies the number of arenas that can be created before the
+test on the limit to the number of arenas is conducted. The value is ignored if
+@code{M_ARENA_MAX} is set.
+
+The default value of this parameter is 2 on 32-bit systems and 8 on 64-bit
+systems.
+
+This parameter can also be set for the process at startup by setting the
+environment variable @code{MALLOC_ARENA_TEST} to the desired value.
+@item M_ARENA_MAX
+This parameter sets the number of arenas to use regardless of the number of
+cores in the system.
+
+The default value of this tunable is @code{0}, meaning that the limit on the
+number of arenas is determined by the number of CPU cores online. For 32-bit
+systems the limit is twice the number of cores online and on 64-bit systems, it
+is eight times the number of cores online.
+
+This parameter can also be set for the process at startup by setting the
+environment variable @code{MALLOC_ARENA_MAX} to the desired value.
 @end table
 
 @end deftypefun
@@ -1511,7 +1513,7 @@ This is the total size of memory allocated with @code{sbrk} by
 This is the number of chunks not in use. (The memory allocator
 internally gets chunks of memory from the operating system, and then
 carves them up to satisfy individual @code{malloc} requests; see
-@ref{Efficiency and Malloc}.)
+@ref{The GNU allocator}.)
 
 @item int smblks
 This field is unused.
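
The default limit described in the M_ARENA_MAX entry above is simple
arithmetic, and a small worked sketch may help keep it apart from the
M_ARENA_TEST defaults. As the review notes, the multipliers 2 and 8 are
a separate rule that only happens to use the same numbers. The function
name default_arena_limit and its parameters are hypothetical and are not
glibc code. The second argument stands in for sizeof (long), on the
assumption that a 4-byte long identifies a 32-bit system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not glibc code): the default arena limit as the
   patch text describes it when M_ARENA_MAX is left at 0.  LONG_SIZE
   stands in for sizeof (long): 4 is treated as a 32-bit system,
   anything else as a 64-bit system.  */
static long
default_arena_limit (long cores_online, size_t long_size)
{
  assert (cores_online > 0);
  /* Twice the online cores on 32-bit, eight times on 64-bit.  */
  return cores_online * (long_size == 4 ? 2 : 8);
}
```

For example, with 4 cores online, default_arena_limit (4, 8) gives 32
for a 64-bit system and default_arena_limit (4, 4) gives 8 for a 32-bit
one.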