malloc: Fix missing accounting of top chunk in malloc_info [BZ #24026]
diff mbox series

Message ID 87blx9hyo7.fsf@oldenburg2.str.redhat.com
State New
Headers show
Series
  • malloc: Fix missing accounting of top chunk in malloc_info [BZ #24026]
Related show

Commit Message

Florian Weimer Aug. 1, 2019, 12:12 p.m. UTC
Author: Niklas Hambüchen <mail@nh2.me>

Fixes `<total type="rest" size="..."> incorrectly showing as 0 most
of the time.

The rest value being wrong is significant because to compute the
actual amount of memory handed out via malloc, the user must subtract
it from <system type="current" size="...">. That result being wrong
makes investigating memory fragmentation issues like
https://bugzilla.redhat.com/show_bug.cgi?id=843478 close to
impossible.

[Patch from Bugzilla.  Reformatted for GNU style.]

2019-08-01  Niklas Hambüchen  <mail@nh2.me>

	[BZ #24026]
	* malloc/malloc.c (__malloc_info): Account for top chunk.

Comments

Carlos O'Donell Aug. 1, 2019, 3:54 p.m. UTC | #1
On 8/1/19 8:12 AM, Florian Weimer wrote:
> Author: Niklas Hambüchen <mail@nh2.me>
> 
> Fixes `<total type="rest" size="..."> incorrectly showing as 0 most
> of the time.
> 
> The rest value being wrong is significant because to compute the
> actual amount of memory handed out via malloc, the user must subtract
> it from <system type="current" size="...">. That result being wrong
> makes investigating memory fragmentation issues like
> https://bugzilla.redhat.com/show_bug.cgi?id=843478 close to
> impossible.
> 
> [Patch from Bugzilla.  Reformatted for GNU style.]
> 
> 2019-08-01  Niklas Hambüchen  <mail@nh2.me>
> 
> 	[BZ #24026]
> 	* malloc/malloc.c (__malloc_info): Account for top chunk.
> 
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index 00ce48cf58..1083cd3ef2 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -5406,6 +5406,10 @@ __malloc_info (int options, FILE *fp)
>   
>         __libc_lock_lock (ar_ptr->mutex);
>   
> +      /* Account for top chunk.  */
> +      avail = chunksize (ar_ptr->top);
> +      nblocks = 1;  /* Top always exists.  */
> +
>         for (size_t i = 0; i < NFASTBINS; ++i)
>   	{
>   	  mchunkptr p = fastbin (ar_ptr, i);
> 

The malloc accounting is quiet complex.

To review this I'd basically have to do a bunch of debugging on
my own to prove this is correct since your proposed patch doesn't
have the required details to validate the change.

Can you post some analysis of a trivial heap and show how this is
a correct fix with before and after numbers?

Can we get a test case that has a trivial setup that shows the fix
is working and in place?
Florian Weimer Aug. 1, 2019, 4:36 p.m. UTC | #2
* Carlos O'Donell:

> On 8/1/19 8:12 AM, Florian Weimer wrote:
>> Author: Niklas Hambüchen <mail@nh2.me>
>>
>> Fixes `<total type="rest" size="..."> incorrectly showing as 0 most
>> of the time.
>>
>> The rest value being wrong is significant because to compute the
>> actual amount of memory handed out via malloc, the user must subtract
>> it from <system type="current" size="...">. That result being wrong
>> makes investigating memory fragmentation issues like
>> https://bugzilla.redhat.com/show_bug.cgi?id=843478 close to
>> impossible.
>>
>> [Patch from Bugzilla.  Reformatted for GNU style.]
>>
>> 2019-08-01  Niklas Hambüchen  <mail@nh2.me>
>>
>> 	[BZ #24026]
>> 	* malloc/malloc.c (__malloc_info): Account for top chunk.
>>
>> diff --git a/malloc/malloc.c b/malloc/malloc.c
>> index 00ce48cf58..1083cd3ef2 100644
>> --- a/malloc/malloc.c
>> +++ b/malloc/malloc.c
>> @@ -5406,6 +5406,10 @@ __malloc_info (int options, FILE *fp)
>>           __libc_lock_lock (ar_ptr->mutex);
>>   +      /* Account for top chunk.  */
>> +      avail = chunksize (ar_ptr->top);
>> +      nblocks = 1;  /* Top always exists.  */
>> +
>>         for (size_t i = 0; i < NFASTBINS; ++i)
>>   	{
>>   	  mchunkptr p = fastbin (ar_ptr, i);
>>
>
> The malloc accounting is quiet complex.
>
> To review this I'd basically have to do a bunch of debugging on
> my own to prove this is correct since your proposed patch doesn't
> have the required details to validate the change.
>
> Can you post some analysis of a trivial heap and show how this is
> a correct fix with before and after numbers?

I think this matches what I saw when I wrote the Emacs heap rewriter.

DJ's writeup also says this:

| Each arena keeps track of a special "top" chunk, which is typically the
| biggest available chunk, and also refers to the most recently allocated
| heap.
 
<https://sourceware.org/glibc/wiki/MallocInternals>

malloc_stats has the same logic.

All this is why I think it's the right fix.

Before the fix, I see:

$ grep rest malloc/tst-malloc_info.out
<total type="rest" count="0" size="0"/>
<total type="rest" count="0" size="0"/>
<total type="rest" count="0" size="0"/>
<total type="rest" count="0" size="0"/>
<total type="rest" count="0" size="0"/>
<total type="rest" count="0" size="0"/>
<total type="rest" count="0" size="0"/>
<total type="rest" count="1" size="657"/>
<total type="rest" count="1" size="657"/>
<total type="rest" count="1" size="657"/>
<total type="rest" count="1" size="657"/>
<total type="rest" count="4" size="2628"/>

After:

$ grep rest malloc/tst-malloc_info.out
<total type="rest" count="1" size="133360"/>
<total type="rest" count="1" size="2656"/>
<total type="rest" count="1" size="2656"/>
<total type="rest" count="1" size="2656"/>
<total type="rest" count="1" size="2656"/>
<total type="rest" count="5" size="143984"/>
<total type="rest" count="1" size="133360"/>
<total type="rest" count="2" size="3313"/>
<total type="rest" count="2" size="3313"/>
<total type="rest" count="2" size="3313"/>
<total type="rest" count="2" size="3313"/>
<total type="rest" count="9" size="146612"/>

These numbers look much more reasonable to me.  (The test is inherently
racy, so there is no exact match.)

Writing a test case for this would unfortunately need an XML parser.
I don't know if we can use the one that comes with Python.

Thanks,
Florian
Carlos O'Donell Aug. 2, 2019, 5:49 p.m. UTC | #3
On 8/1/19 12:36 PM, Florian Weimer wrote:
> * Carlos O'Donell:
> 
>> On 8/1/19 8:12 AM, Florian Weimer wrote:
>>> Author: Niklas Hambüchen <mail@nh2.me>
>>>
>>> Fixes `<total type="rest" size="..."> incorrectly showing as 0 most
>>> of the time.
>>>
>>> The rest value being wrong is significant because to compute the
>>> actual amount of memory handed out via malloc, the user must subtract
>>> it from <system type="current" size="...">. That result being wrong
>>> makes investigating memory fragmentation issues like
>>> https://bugzilla.redhat.com/show_bug.cgi?id=843478 close to
>>> impossible.
>>>
>>> [Patch from Bugzilla.  Reformatted for GNU style.]
>>>
>>> 2019-08-01  Niklas Hambüchen  <mail@nh2.me>
>>>
>>> 	[BZ #24026]
>>> 	* malloc/malloc.c (__malloc_info): Account for top chunk.
>>>
>>> diff --git a/malloc/malloc.c b/malloc/malloc.c
>>> index 00ce48cf58..1083cd3ef2 100644
>>> --- a/malloc/malloc.c
>>> +++ b/malloc/malloc.c
>>> @@ -5406,6 +5406,10 @@ __malloc_info (int options, FILE *fp)
>>>            __libc_lock_lock (ar_ptr->mutex);
>>>    +      /* Account for top chunk.  */
>>> +      avail = chunksize (ar_ptr->top);
>>> +      nblocks = 1;  /* Top always exists.  */
>>> +
>>>          for (size_t i = 0; i < NFASTBINS; ++i)
>>>    	{
>>>    	  mchunkptr p = fastbin (ar_ptr, i);
>>>
>>
>> The malloc accounting is quiet complex.
>>
>> To review this I'd basically have to do a bunch of debugging on
>> my own to prove this is correct since your proposed patch doesn't
>> have the required details to validate the change.
>>
>> Can you post some analysis of a trivial heap and show how this is
>> a correct fix with before and after numbers?
> 
> I think this matches what I saw when I wrote the Emacs heap rewriter.
> 
> DJ's writeup also says this:
> 
> | Each arena keeps track of a special "top" chunk, which is typically the
> | biggest available chunk, and also refers to the most recently allocated
> | heap.
>   
> <https://sourceware.org/glibc/wiki/MallocInternals>
> 
> malloc_stats has the same logic.
> 
> All this is why I think it's the right fix.
> 
> Before the fix, I see:
> 
> $ grep rest malloc/tst-malloc_info.out
> <total type="rest" count="0" size="0"/>
> <total type="rest" count="0" size="0"/>
> <total type="rest" count="0" size="0"/>
> <total type="rest" count="0" size="0"/>
> <total type="rest" count="0" size="0"/>
> <total type="rest" count="0" size="0"/>
> <total type="rest" count="0" size="0"/>
> <total type="rest" count="1" size="657"/>
> <total type="rest" count="1" size="657"/>
> <total type="rest" count="1" size="657"/>
> <total type="rest" count="1" size="657"/>
> <total type="rest" count="4" size="2628"/>
> 
> After:
> 
> $ grep rest malloc/tst-malloc_info.out
> <total type="rest" count="1" size="133360"/>
> <total type="rest" count="1" size="2656"/>
> <total type="rest" count="1" size="2656"/>
> <total type="rest" count="1" size="2656"/>
> <total type="rest" count="1" size="2656"/>
> <total type="rest" count="5" size="143984"/>
> <total type="rest" count="1" size="133360"/>
> <total type="rest" count="2" size="3313"/>
> <total type="rest" count="2" size="3313"/>
> <total type="rest" count="2" size="3313"/>
> <total type="rest" count="2" size="3313"/>
> <total type="rest" count="9" size="146612"/>

I agree with you that this looks correct, but I would
really have expected the top chunk to be in one of
the counted bins if it was actually free. However, it
turns out I'm wrong. The code and logic treat top
as a chunk which is no bin. And even when we expand
the heap we have to manually free the old top, and only
at that point does it enter into a bin, because the new
top on the new heap is now the special top.

Please repost with a comment to that effect added:

/* The top-most available chunk is treated specially
    and is never in any bin. See "initial_top" comments.  */

> These numbers look much more reasonable to me.  (The test is inherently
> racy, so there is no exact match.)

The test can be made non-racy in a test case that steps
the threads through a sequence of operations using other
synchronization primitives.
  
> Writing a test case for this would unfortunately need an XML parser.
> I don't know if we can use the one that comes with Python.

I don't see why we can't use 'import xml' to use the language
provided XML parser.
Joseph Myers Aug. 2, 2019, 7:53 p.m. UTC | #4
On Fri, 2 Aug 2019, Carlos O'Donell wrote:

> > Writing a test case for this would unfortunately need an XML parser.
> > I don't know if we can use the one that comes with Python.
> 
> I don't see why we can't use 'import xml' to use the language
> provided XML parser.

It's the warning at 
<https://sourceware.org/glibc/wiki/Style_and_Conventions#Python_usage_conventions> 
about avoiding standard library modules depending on external libraries, 
to assist in bootstrap etc. configurations (bringing up native builds on a 
new system which might start without a full Python install).  However, (a) 
that's only for the build of glibc, testing glibc might use such modules 
provided it's appropriately conditional so tests end up as UNSUPPORTED if 
they are unavailable, and (b) the Python documentation says "The Expat 
parser is included with Python, so the xml.parsers.expat module will 
always be available.", which indicates that shouldn't be considered as 
depending on an external library anyway.
Florian Weimer Aug. 7, 2019, 2:27 p.m. UTC | #5
* Carlos O'Donell:

> I agree with you that this looks correct, but I would
> really have expected the top chunk to be in one of
> the counted bins if it was actually free. However, it
> turns out I'm wrong. The code and logic treat top
> as a chunk which is no bin. And even when we expand
> the heap we have to manually free the old top, and only
> at that point does it enter into a bin, because the new
> top on the new heap is now the special top.
>
> Please repost with a comment to that effect added:
>
> /* The top-most available chunk is treated specially
>    and is never in any bin. See "initial_top" comments.  */

Please see below.

Thanks,
Florian

From: Niklas Hambüchen <mail@nh2.me>
Subject: malloc: Fix missing accounting of top chunk in malloc_info [BZ #24026]

Fixes `<total type="rest" size="..."> incorrectly showing as 0 most
of the time.

The rest value being wrong is significant because to compute the
actual amount of memory handed out via malloc, the user must subtract
it from <system type="current" size="...">. That result being wrong
makes investigating memory fragmentation issues like
<https://bugzilla.redhat.com/show_bug.cgi?id=843478> close to
impossible.

2019-08-07  Niklas Hambüchen  <mail@nh2.me>
	    Carlos O'Donell  <carlos@redhat.com>

	[BZ #24026]
	* malloc/malloc.c (__malloc_info): Account for top chunk.

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 343d89f489..0e65d636cd 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -5406,6 +5406,12 @@ __malloc_info (int options, FILE *fp)
 
       __libc_lock_lock (ar_ptr->mutex);
 
+      /* Account for top chunk.  The top-most available chunk is
+	 treated specially and is never in any bin. See "initial_top"
+	 comments.  */
+      avail = chunksize (ar_ptr->top);
+      nblocks = 1;  /* Top always exists.  */
+
       for (size_t i = 0; i < NFASTBINS; ++i)
 	{
 	  mchunkptr p = fastbin (ar_ptr, i);
Carlos O'Donell Aug. 8, 2019, 6:21 p.m. UTC | #6
On 8/7/19 10:27 AM, Florian Weimer wrote:
> * Carlos O'Donell:
> 
>> I agree with you that this looks correct, but I would
>> really have expected the top chunk to be in one of
>> the counted bins if it was actually free. However, it
>> turns out I'm wrong. The code and logic treat top
>> as a chunk which is no bin. And even when we expand
>> the heap we have to manually free the old top, and only
>> at that point does it enter into a bin, because the new
>> top on the new heap is now the special top.
>>
>> Please repost with a comment to that effect added:
>>
>> /* The top-most available chunk is treated specially
>>     and is never in any bin. See "initial_top" comments.  */
> 
> Please see below.

LGTM.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

> Thanks,
> Florian
> 
> From: Niklas Hambüchen <mail@nh2.me>
> Subject: malloc: Fix missing accounting of top chunk in malloc_info [BZ #24026]
> 
> Fixes `<total type="rest" size="..."> incorrectly showing as 0 most
> of the time.
> 
> The rest value being wrong is significant because to compute the
> actual amount of memory handed out via malloc, the user must subtract
> it from <system type="current" size="...">. That result being wrong
> makes investigating memory fragmentation issues like
> <https://bugzilla.redhat.com/show_bug.cgi?id=843478> close to
> impossible.
> 
> 2019-08-07  Niklas Hambüchen  <mail@nh2.me>
> 	    Carlos O'Donell  <carlos@redhat.com>
> 
> 	[BZ #24026]
> 	* malloc/malloc.c (__malloc_info): Account for top chunk.
> 
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index 343d89f489..0e65d636cd 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -5406,6 +5406,12 @@ __malloc_info (int options, FILE *fp)
>   
>         __libc_lock_lock (ar_ptr->mutex);
>   
> +      /* Account for top chunk.  The top-most available chunk is
> +	 treated specially and is never in any bin. See "initial_top"
> +	 comments.  */
> +      avail = chunksize (ar_ptr->top);
> +      nblocks = 1;  /* Top always exists.  */
> +
>         for (size_t i = 0; i < NFASTBINS; ++i)
>   	{
>   	  mchunkptr p = fastbin (ar_ptr, i);
>
Niklas Hambüchen Aug. 12, 2019, 3:11 a.m. UTC | #7
Thanks everyone for helping with this!

> Fixes `<total type="rest" size="..."> incorrectly showing as 0 most
> of the time.

As a next thing, we probably want to update the example in
`man malloc_info`, which has been showing

    <total type="rest" count="0" size="0"/>

ever since (which was also what surprised me that nobody had found it before),
given that it should be extremely unlikely that rest is 0.

Niklas

Patch
diff mbox series

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 00ce48cf58..1083cd3ef2 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -5406,6 +5406,10 @@  __malloc_info (int options, FILE *fp)
 
       __libc_lock_lock (ar_ptr->mutex);
 
+      /* Account for top chunk.  */
+      avail = chunksize (ar_ptr->top);
+      nblocks = 1;  /* Top always exists.  */
+
       for (size_t i = 0; i < NFASTBINS; ++i)
 	{
 	  mchunkptr p = fastbin (ar_ptr, i);