diff mbox

[U-Boot] malloc: work around some memalign fragmentation issues

Message ID 1453755822-28569-1-git-send-email-swarren@wwwdotorg.org
State Accepted
Commit 4f144a416469c6a29127b0656523ae628ea7cbaf
Delegated to: Tom Rini
Headers show

Commit Message

Stephen Warren Jan. 25, 2016, 9:03 p.m. UTC
From: Stephen Warren <swarren@nvidia.com>

Use of memalign can trigger fragmentation issues such as:

// Internally, this needs to find a free block quite bit larger than s.
// Once the free region is found, any unaligned "padding" immediately
// before and after the block is marked free, so that the allocation
// takes only s bytes (plus malloc header overhead).
p = memalign(a, s);
// If there's little fragmentation so far, this allocation is likely
// located immediately after p.
p2 = malloc(x);
free(p);
// In theory, this should return the same value for p. However, the hole
// left by the free() call is only s in size (plus malloc header overhead)
// whereas memalign searches for a larger block in order to guarantee it
// can adjust the returned pointer to the alignment requirements. Hence,
// the pointer returned, if any, won't be p. If there's little or no space
// left after p2, this allocation will fail.
p = memalign(a, s);

In practice, this issue occurs when running the "dfu" command repeatedly
on NVIDIA Tegra boards, since DFU allocates a large 32M data buffer, and
then initializes the USB controller. If this is the first time USB has
been used in the U-Boot session, this causes a probe of the USB driver,
which causes various allocations, including a strdup() of a GPIO name
when requesting the VBUS GPIO. When DFU is torn down, the USB driver
is left probed, and hence its memory is left allocated. If "dfu" is
executed again, allocation of the 32M data buffer fails as described
above.

In practice, there is a memory hole exactly large enough to hold the 32M
data buffer than DFU needs. However, memalign() can't know that in a
general way. Given that, it's particularly annoying that the allocation
fails!

The issue is that memalign() tries to allocate something larger to
guarantee the ability to align the returned pointer. This patch modifies
memalign() so that if the "general case" over-sized allocation fails,
another allocation is attempted, of the exact size the user desired. If
that allocation just happens to be aligned in the way the user wants,
(and in the case described above, it will be, since the free memory
region is located where a previous identical allocation was located),
the pointer can be returned.

This patch is somewhat related to 806bd245b1ab "dfu: don't keep
freeing/reallocating". That patch worked around the issue by removing
repeated free/memalign within a single execution of "dfu". However,
the same technique can't be applied across multiple invocations, since
there's no reason to keep the DFU buffer allocated while DFU isn't
running. This patch addresses the root-cause a bit more directly.

This problem highlights some of the disadvantages of dynamic allocation
and deferred probing of devices.

This patch isn't checkpatch-clean, since it conforms to the existing
coding style in dlmalloc.c, which is different to the rest of U-Boot.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
---
 common/dlmalloc.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Comments

Tom Rini Jan. 25, 2016, 10:14 p.m. UTC | #1
On Mon, Jan 25, 2016 at 02:03:42PM -0700, Stephen Warren wrote:

> From: Stephen Warren <swarren@nvidia.com>
> 
> Use of memalign can trigger fragmentation issues such as:
> 
> // Internally, this needs to find a free block quite bit larger than s.
> // Once the free region is found, any unaligned "padding" immediately
> // before and after the block is marked free, so that the allocation
> // takes only s bytes (plus malloc header overhead).
> p = memalign(a, s);
> // If there's little fragmentation so far, this allocation is likely
> // located immediately after p.
> p2 = malloc(x);
> free(p);
> // In theory, this should return the same value for p. However, the hole
> // left by the free() call is only s in size (plus malloc header overhead)
> // whereas memalign searches for a larger block in order to guarantee it
> // can adjust the returned pointer to the alignment requirements. Hence,
> // the pointer returned, if any, won't be p. If there's little or no space
> // left after p2, this allocation will fail.
> p = memalign(a, s);
> 
> In practice, this issue occurs when running the "dfu" command repeatedly
> on NVIDIA Tegra boards, since DFU allocates a large 32M data buffer, and
> then initializes the USB controller. If this is the first time USB has
> been used in the U-Boot session, this causes a probe of the USB driver,
> which causes various allocations, including a strdup() of a GPIO name
> when requesting the VBUS GPIO. When DFU is torn down, the USB driver
> is left probed, and hence its memory is left allocated. If "dfu" is
> executed again, allocation of the 32M data buffer fails as described
> above.
> 
> In practice, there is a memory hole exactly large enough to hold the 32M
> data buffer than DFU needs. However, memalign() can't know that in a
> general way. Given that, it's particularly annoying that the allocation
> fails!
> 
> The issue is that memalign() tries to allocate something larger to
> guarantee the ability to align the returned pointer. This patch modifies
> memalign() so that if the "general case" over-sized allocation fails,
> another allocation is attempted, of the exact size the user desired. If
> that allocation just happens to be aligned in the way the user wants,
> (and in the case described above, it will be, since the free memory
> region is located where a previous identical allocation was located),
> the pointer can be returned.
> 
> This patch is somewhat related to 806bd245b1ab "dfu: don't keep
> freeing/reallocating". That patch worked around the issue by removing
> repeated free/memalign within a single execution of "dfu". However,
> the same technique can't be applied across multiple invocations, since
> there's no reason to keep the DFU buffer allocated while DFU isn't
> running. This patch addresses the root-cause a bit more directly.
> 
> This problem highlights some of the disadvantages of dynamic allocation
> and deferred probing of devices.
> 
> This patch isn't checkpatch-clean, since it conforms to the existing
> coding style in dlmalloc.c, which is different to the rest of U-Boot.
> 
> Signed-off-by: Stephen Warren <swarren@nvidia.com>

Reviewed-by: Tom Rini <trini@konsulko.com>
Ɓukasz Majewski Jan. 26, 2016, 8:54 a.m. UTC | #2
Hi Stephen,

> From: Stephen Warren <swarren@nvidia.com>
> 
> Use of memalign can trigger fragmentation issues such as:
> 
> // Internally, this needs to find a free block quite bit larger than
> s. // Once the free region is found, any unaligned "padding"
> immediately // before and after the block is marked free, so that the
> allocation // takes only s bytes (plus malloc header overhead).
> p = memalign(a, s);
> // If there's little fragmentation so far, this allocation is likely
> // located immediately after p.
> p2 = malloc(x);
> free(p);
> // In theory, this should return the same value for p. However, the
> hole // left by the free() call is only s in size (plus malloc header
> overhead) // whereas memalign searches for a larger block in order to
> guarantee it // can adjust the returned pointer to the alignment
> requirements. Hence, // the pointer returned, if any, won't be p. If
> there's little or no space // left after p2, this allocation will
> fail. p = memalign(a, s);
> 
> In practice, this issue occurs when running the "dfu" command
> repeatedly on NVIDIA Tegra boards, since DFU allocates a large 32M
> data buffer, and then initializes the USB controller. If this is the
> first time USB has been used in the U-Boot session, this causes a
> probe of the USB driver, which causes various allocations, including
> a strdup() of a GPIO name when requesting the VBUS GPIO. When DFU is
> torn down, the USB driver is left probed, and hence its memory is
> left allocated. If "dfu" is executed again, allocation of the 32M
> data buffer fails as described above.
> 
> In practice, there is a memory hole exactly large enough to hold the
> 32M data buffer than DFU needs. However, memalign() can't know that
> in a general way. Given that, it's particularly annoying that the
> allocation fails!
> 
> The issue is that memalign() tries to allocate something larger to
> guarantee the ability to align the returned pointer. This patch
> modifies memalign() so that if the "general case" over-sized
> allocation fails, another allocation is attempted, of the exact size
> the user desired. If that allocation just happens to be aligned in
> the way the user wants, (and in the case described above, it will be,
> since the free memory region is located where a previous identical
> allocation was located), the pointer can be returned.
> 
> This patch is somewhat related to 806bd245b1ab "dfu: don't keep
> freeing/reallocating". That patch worked around the issue by removing
> repeated free/memalign within a single execution of "dfu". However,
> the same technique can't be applied across multiple invocations, since
> there's no reason to keep the DFU buffer allocated while DFU isn't
> running. This patch addresses the root-cause a bit more directly.
> 
> This problem highlights some of the disadvantages of dynamic
> allocation and deferred probing of devices.
> 
> This patch isn't checkpatch-clean, since it conforms to the existing
> coding style in dlmalloc.c, which is different to the rest of U-Boot.
> 
> Signed-off-by: Stephen Warren <swarren@nvidia.com>
> ---
>  common/dlmalloc.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/common/dlmalloc.c b/common/dlmalloc.c
> index b5bb05191c24..2b964d16b11e 100644
> --- a/common/dlmalloc.c
> +++ b/common/dlmalloc.c
> @@ -2829,6 +2829,28 @@ Void_t* mEMALIGn(alignment, bytes) size_t
> alignment; size_t bytes; nb = request2size(bytes);
>    m  = (char*)(mALLOc(nb + alignment + MINSIZE));
>  
> +  /*
> +  * The attempt to over-allocate (with a size large enough to
> guarantee the
> +  * ability to find an aligned region within allocated memory)
> failed.
> +  *
> +  * Try again, this time only allocating exactly the size the user
> wants. If
> +  * the allocation now succeeds and just happens to be aligned, we
> can still
> +  * fulfill the user's request.
> +  */
> +  if (m == NULL) {
> +    /*
> +     * Use bytes not nb, since mALLOc internally calls request2size
> too, and
> +     * each call increases the size to allocate, to account for the
> header.
> +     */
> +    m  = (char*)(mALLOc(bytes));
> +    /* Aligned -> return it */
> +    if ((((unsigned long)(m)) % alignment) == 0)
> +      return m;
> +    /* Otherwise, fail */
> +    fREe(m);
> +    return NULL;
> +  }
> +
>    if (m == NULL) return NULL; /* propagate failure */
>  
>    p = mem2chunk(m);

U-boot's ./common/dlmalloc.c file is from year 2000 (version 2.6.6).
I'm just wondering if there exists newer version of this code (and can
be easily ported to u-boot).

Despite the above note,

Acked-by: Lukasz Majewski <l.majewski@samsung.com>
Stephen Warren Jan. 26, 2016, 4:27 p.m. UTC | #3
On 01/26/2016 01:54 AM, Lukasz Majewski wrote:
> Hi Stephen,
>
>> From: Stephen Warren <swarren@nvidia.com>
>>
>> Use of memalign can trigger fragmentation issues such as:
>>
>> // Internally, this needs to find a free block quite bit larger than
>> s. // Once the free region is found, any unaligned "padding"
>> immediately // before and after the block is marked free, so that the
>> allocation // takes only s bytes (plus malloc header overhead).
>> p = memalign(a, s);
>> // If there's little fragmentation so far, this allocation is likely
>> // located immediately after p.
>> p2 = malloc(x);
>> free(p);
>> // In theory, this should return the same value for p. However, the
>> hole // left by the free() call is only s in size (plus malloc header
>> overhead) // whereas memalign searches for a larger block in order to
>> guarantee it // can adjust the returned pointer to the alignment
>> requirements. Hence, // the pointer returned, if any, won't be p. If
>> there's little or no space // left after p2, this allocation will
>> fail. p = memalign(a, s);
>>
>> In practice, this issue occurs when running the "dfu" command
>> repeatedly on NVIDIA Tegra boards, since DFU allocates a large 32M
>> data buffer, and then initializes the USB controller. If this is the
>> first time USB has been used in the U-Boot session, this causes a
>> probe of the USB driver, which causes various allocations, including
>> a strdup() of a GPIO name when requesting the VBUS GPIO. When DFU is
>> torn down, the USB driver is left probed, and hence its memory is
>> left allocated. If "dfu" is executed again, allocation of the 32M
>> data buffer fails as described above.
>>
>> In practice, there is a memory hole exactly large enough to hold the
>> 32M data buffer than DFU needs. However, memalign() can't know that
>> in a general way. Given that, it's particularly annoying that the
>> allocation fails!
>>
>> The issue is that memalign() tries to allocate something larger to
>> guarantee the ability to align the returned pointer. This patch
>> modifies memalign() so that if the "general case" over-sized
>> allocation fails, another allocation is attempted, of the exact size
>> the user desired. If that allocation just happens to be aligned in
>> the way the user wants, (and in the case described above, it will be,
>> since the free memory region is located where a previous identical
>> allocation was located), the pointer can be returned.
>>
>> This patch is somewhat related to 806bd245b1ab "dfu: don't keep
>> freeing/reallocating". That patch worked around the issue by removing
>> repeated free/memalign within a single execution of "dfu". However,
>> the same technique can't be applied across multiple invocations, since
>> there's no reason to keep the DFU buffer allocated while DFU isn't
>> running. This patch addresses the root-cause a bit more directly.
>>
>> This problem highlights some of the disadvantages of dynamic
>> allocation and deferred probing of devices.
>>
>> This patch isn't checkpatch-clean, since it conforms to the existing
>> coding style in dlmalloc.c, which is different to the rest of U-Boot.
>>
>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>> ---
>>   common/dlmalloc.c | 22 ++++++++++++++++++++++
>>   1 file changed, 22 insertions(+)
>>
>> diff --git a/common/dlmalloc.c b/common/dlmalloc.c
>> index b5bb05191c24..2b964d16b11e 100644
>> --- a/common/dlmalloc.c
>> +++ b/common/dlmalloc.c
>> @@ -2829,6 +2829,28 @@ Void_t* mEMALIGn(alignment, bytes) size_t
>> alignment; size_t bytes; nb = request2size(bytes);
>>     m  = (char*)(mALLOc(nb + alignment + MINSIZE));
>>
>> +  /*
>> +  * The attempt to over-allocate (with a size large enough to
>> guarantee the
>> +  * ability to find an aligned region within allocated memory)
>> failed.
>> +  *
>> +  * Try again, this time only allocating exactly the size the user
>> wants. If
>> +  * the allocation now succeeds and just happens to be aligned, we
>> can still
>> +  * fulfill the user's request.
>> +  */
>> +  if (m == NULL) {
>> +    /*
>> +     * Use bytes not nb, since mALLOc internally calls request2size
>> too, and
>> +     * each call increases the size to allocate, to account for the
>> header.
>> +     */
>> +    m  = (char*)(mALLOc(bytes));
>> +    /* Aligned -> return it */
>> +    if ((((unsigned long)(m)) % alignment) == 0)
>> +      return m;
>> +    /* Otherwise, fail */
>> +    fREe(m);
>> +    return NULL;
>> +  }
>> +
>>     if (m == NULL) return NULL; /* propagate failure */
>>
>>     p = mem2chunk(m);
>
> U-boot's ./common/dlmalloc.c file is from year 2000 (version 2.6.6).
> I'm just wondering if there exists newer version of this code (and can
> be easily ported to u-boot).

There certainly are newer versions:

http://gee.cs.oswego.edu/dl/html/malloc.html

The changelog in the source doesn't appear to address issues such as 
this one. Equally, a very brief inspection of the latest implementation 
of memalign() seems like it would suffer from the same issue. I don't 
know if there are any other internal changes (e.g. perhaps bucketing 
free areas by size/alignment or similar) that might avoid the issue.
Tom Rini Jan. 26, 2016, 11:26 p.m. UTC | #4
On Tue, Jan 26, 2016 at 09:27:02AM -0700, Stephen Warren wrote:
> On 01/26/2016 01:54 AM, Lukasz Majewski wrote:
> >Hi Stephen,
> >
> >>From: Stephen Warren <swarren@nvidia.com>
> >>
> >>Use of memalign can trigger fragmentation issues such as:
> >>
> >>// Internally, this needs to find a free block quite bit larger than
> >>s. // Once the free region is found, any unaligned "padding"
> >>immediately // before and after the block is marked free, so that the
> >>allocation // takes only s bytes (plus malloc header overhead).
> >>p = memalign(a, s);
> >>// If there's little fragmentation so far, this allocation is likely
> >>// located immediately after p.
> >>p2 = malloc(x);
> >>free(p);
> >>// In theory, this should return the same value for p. However, the
> >>hole // left by the free() call is only s in size (plus malloc header
> >>overhead) // whereas memalign searches for a larger block in order to
> >>guarantee it // can adjust the returned pointer to the alignment
> >>requirements. Hence, // the pointer returned, if any, won't be p. If
> >>there's little or no space // left after p2, this allocation will
> >>fail. p = memalign(a, s);
> >>
> >>In practice, this issue occurs when running the "dfu" command
> >>repeatedly on NVIDIA Tegra boards, since DFU allocates a large 32M
> >>data buffer, and then initializes the USB controller. If this is the
> >>first time USB has been used in the U-Boot session, this causes a
> >>probe of the USB driver, which causes various allocations, including
> >>a strdup() of a GPIO name when requesting the VBUS GPIO. When DFU is
> >>torn down, the USB driver is left probed, and hence its memory is
> >>left allocated. If "dfu" is executed again, allocation of the 32M
> >>data buffer fails as described above.
> >>
> >>In practice, there is a memory hole exactly large enough to hold the
> >>32M data buffer than DFU needs. However, memalign() can't know that
> >>in a general way. Given that, it's particularly annoying that the
> >>allocation fails!
> >>
> >>The issue is that memalign() tries to allocate something larger to
> >>guarantee the ability to align the returned pointer. This patch
> >>modifies memalign() so that if the "general case" over-sized
> >>allocation fails, another allocation is attempted, of the exact size
> >>the user desired. If that allocation just happens to be aligned in
> >>the way the user wants, (and in the case described above, it will be,
> >>since the free memory region is located where a previous identical
> >>allocation was located), the pointer can be returned.
> >>
> >>This patch is somewhat related to 806bd245b1ab "dfu: don't keep
> >>freeing/reallocating". That patch worked around the issue by removing
> >>repeated free/memalign within a single execution of "dfu". However,
> >>the same technique can't be applied across multiple invocations, since
> >>there's no reason to keep the DFU buffer allocated while DFU isn't
> >>running. This patch addresses the root-cause a bit more directly.
> >>
> >>This problem highlights some of the disadvantages of dynamic
> >>allocation and deferred probing of devices.
> >>
> >>This patch isn't checkpatch-clean, since it conforms to the existing
> >>coding style in dlmalloc.c, which is different to the rest of U-Boot.
> >>
> >>Signed-off-by: Stephen Warren <swarren@nvidia.com>
> >>---
> >>  common/dlmalloc.c | 22 ++++++++++++++++++++++
> >>  1 file changed, 22 insertions(+)
> >>
> >>diff --git a/common/dlmalloc.c b/common/dlmalloc.c
> >>index b5bb05191c24..2b964d16b11e 100644
> >>--- a/common/dlmalloc.c
> >>+++ b/common/dlmalloc.c
> >>@@ -2829,6 +2829,28 @@ Void_t* mEMALIGn(alignment, bytes) size_t
> >>alignment; size_t bytes; nb = request2size(bytes);
> >>    m  = (char*)(mALLOc(nb + alignment + MINSIZE));
> >>
> >>+  /*
> >>+  * The attempt to over-allocate (with a size large enough to
> >>guarantee the
> >>+  * ability to find an aligned region within allocated memory)
> >>failed.
> >>+  *
> >>+  * Try again, this time only allocating exactly the size the user
> >>wants. If
> >>+  * the allocation now succeeds and just happens to be aligned, we
> >>can still
> >>+  * fulfill the user's request.
> >>+  */
> >>+  if (m == NULL) {
> >>+    /*
> >>+     * Use bytes not nb, since mALLOc internally calls request2size
> >>too, and
> >>+     * each call increases the size to allocate, to account for the
> >>header.
> >>+     */
> >>+    m  = (char*)(mALLOc(bytes));
> >>+    /* Aligned -> return it */
> >>+    if ((((unsigned long)(m)) % alignment) == 0)
> >>+      return m;
> >>+    /* Otherwise, fail */
> >>+    fREe(m);
> >>+    return NULL;
> >>+  }
> >>+
> >>    if (m == NULL) return NULL; /* propagate failure */
> >>
> >>    p = mem2chunk(m);
> >
> >U-boot's ./common/dlmalloc.c file is from year 2000 (version 2.6.6).
> >I'm just wondering if there exists newer version of this code (and can
> >be easily ported to u-boot).
> 
> There certainly are newer versions:
> 
> http://gee.cs.oswego.edu/dl/html/malloc.html
> 
> The changelog in the source doesn't appear to address issues such as
> this one. Equally, a very brief inspection of the latest
> implementation of memalign() seems like it would suffer from the
> same issue. I don't know if there are any other internal changes
> (e.g. perhaps bucketing free areas by size/alignment or similar)
> that might avoid the issue.

I've put it on my TODO list to at least examine an upgrade more
completely but no, it's not in the listed features and we've done a bit
of enhancing and feature adding ourselves over the years.
Tom Rini Feb. 2, 2016, 1:57 a.m. UTC | #5
On Mon, Jan 25, 2016 at 02:03:42PM -0700, Stephen Warren wrote:

> From: Stephen Warren <swarren@nvidia.com>
> 
> Use of memalign can trigger fragmentation issues such as:
> 
> // Internally, this needs to find a free block quite bit larger than s.
> // Once the free region is found, any unaligned "padding" immediately
> // before and after the block is marked free, so that the allocation
> // takes only s bytes (plus malloc header overhead).
> p = memalign(a, s);
> // If there's little fragmentation so far, this allocation is likely
> // located immediately after p.
> p2 = malloc(x);
> free(p);
> // In theory, this should return the same value for p. However, the hole
> // left by the free() call is only s in size (plus malloc header overhead)
> // whereas memalign searches for a larger block in order to guarantee it
> // can adjust the returned pointer to the alignment requirements. Hence,
> // the pointer returned, if any, won't be p. If there's little or no space
> // left after p2, this allocation will fail.
> p = memalign(a, s);
> 
> In practice, this issue occurs when running the "dfu" command repeatedly
> on NVIDIA Tegra boards, since DFU allocates a large 32M data buffer, and
> then initializes the USB controller. If this is the first time USB has
> been used in the U-Boot session, this causes a probe of the USB driver,
> which causes various allocations, including a strdup() of a GPIO name
> when requesting the VBUS GPIO. When DFU is torn down, the USB driver
> is left probed, and hence its memory is left allocated. If "dfu" is
> executed again, allocation of the 32M data buffer fails as described
> above.
> 
> In practice, there is a memory hole exactly large enough to hold the 32M
> data buffer than DFU needs. However, memalign() can't know that in a
> general way. Given that, it's particularly annoying that the allocation
> fails!
> 
> The issue is that memalign() tries to allocate something larger to
> guarantee the ability to align the returned pointer. This patch modifies
> memalign() so that if the "general case" over-sized allocation fails,
> another allocation is attempted, of the exact size the user desired. If
> that allocation just happens to be aligned in the way the user wants,
> (and in the case described above, it will be, since the free memory
> region is located where a previous identical allocation was located),
> the pointer can be returned.
> 
> This patch is somewhat related to 806bd245b1ab "dfu: don't keep
> freeing/reallocating". That patch worked around the issue by removing
> repeated free/memalign within a single execution of "dfu". However,
> the same technique can't be applied across multiple invocations, since
> there's no reason to keep the DFU buffer allocated while DFU isn't
> running. This patch addresses the root-cause a bit more directly.
> 
> This problem highlights some of the disadvantages of dynamic allocation
> and deferred probing of devices.
> 
> This patch isn't checkpatch-clean, since it conforms to the existing
> coding style in dlmalloc.c, which is different to the rest of U-Boot.
> 
> Signed-off-by: Stephen Warren <swarren@nvidia.com>
> Reviewed-by: Tom Rini <trini@konsulko.com>
> Acked-by: Lukasz Majewski <l.majewski@samsung.com>

Applied to u-boot/master, thanks!
diff mbox

Patch

diff --git a/common/dlmalloc.c b/common/dlmalloc.c
index b5bb05191c24..2b964d16b11e 100644
--- a/common/dlmalloc.c
+++ b/common/dlmalloc.c
@@ -2829,6 +2829,28 @@  Void_t* mEMALIGn(alignment, bytes) size_t alignment; size_t bytes;
   nb = request2size(bytes);
   m  = (char*)(mALLOc(nb + alignment + MINSIZE));
 
+  /*
+  * The attempt to over-allocate (with a size large enough to guarantee the
+  * ability to find an aligned region within allocated memory) failed.
+  *
+  * Try again, this time only allocating exactly the size the user wants. If
+  * the allocation now succeeds and just happens to be aligned, we can still
+  * fulfill the user's request.
+  */
+  if (m == NULL) {
+    /*
+     * Use bytes not nb, since mALLOc internally calls request2size too, and
+     * each call increases the size to allocate, to account for the header.
+     */
+    m  = (char*)(mALLOc(bytes));
+    /* Aligned -> return it */
+    if ((((unsigned long)(m)) % alignment) == 0)
+      return m;
+    /* Otherwise, fail */
+    fREe(m);
+    return NULL;
+  }
+
   if (m == NULL) return NULL; /* propagate failure */
 
   p = mem2chunk(m);