diff mbox series

Minor optimisations in operator new(size_t, align_val_t)

Message ID 20180813190027.GA10327@redhat.com
State New
Headers show
Series Minor optimisations in operator new(size_t, align_val_t) | expand

Commit Message

Jonathan Wakely Aug. 13, 2018, 7 p.m. UTC
Thanks to Lars for the suggestions.

	* libsupc++/new_opa.cc (operator new(size_t, align_val_t)): Use
	__is_pow2 to check for valid alignment. Avoid branching when rounding
	size to multiple of alignment.

Tested x86_64-linux, committed to trunk.
commit 766770a4f8692ef5fda2d1f987054864efb1c3a8
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Mon Aug 13 13:56:04 2018 +0100

    Minor optimisations in operator new(size_t, align_val_t)
    
            * libsupc++/new_opa.cc (operator new(size_t, align_val_t)): Use
            __is_pow2 to check for valid alignment. Avoid branching when rounding
            size to multiple of alignment.

Comments

Marc Glisse Aug. 17, 2018, 5:28 p.m. UTC | #1
On Mon, 13 Aug 2018, Jonathan Wakely wrote:

> Thanks to Lars for the suggestions.
>
> 	* libsupc++/new_opa.cc (operator new(size_t, align_val_t)): Use
> 	__is_pow2 to check for valid alignment. Avoid branching when rounding
> 	size to multiple of alignment.
>
> Tested x86_64-linux, committed to trunk.

Are you getting better code with __is_pow2 on many platforms? As far as I 
can tell from a quick look at the patch (I didn't actually test it, I 
could be completely off), this replaces (x&(x-1))==0 with popcount(x)==1. 
On a basic x86_64, popcount calls into libgcc, which doesn't seem so good. 
On a more recent x86_64 (BMI1), x&(x-1) is a single instruction that sets 
a flag when the result is 0, that's hard to beat.

Or was the goal to accept an alignment of 0, and not an optimization?


> +  sz = (sz + align - 1) & ~(align - 1);

Note that gcc immediately replaces ~(align - 1) with -align. It does it 
even if you compute align-1 on the previous line and write (sz+X)&~X in 
the hope of sharing the subtraction.
Jonathan Wakely Aug. 17, 2018, 5:52 p.m. UTC | #2
On 17/08/18 19:28 +0200, Marc Glisse wrote:
>On Mon, 13 Aug 2018, Jonathan Wakely wrote:
>
>>Thanks to Lars for the suggestions.
>>
>>	* libsupc++/new_opa.cc (operator new(size_t, align_val_t)): Use
>>	__is_pow2 to check for valid alignment. Avoid branching when rounding
>>	size to multiple of alignment.
>>
>>Tested x86_64-linux, committed to trunk.
>
>Are you getting better code with __is_pow2 on many platforms? As far 
>as I can tell from a quick look at the patch (I didn't actually test 
>it, I could be completely off), this replaces (x&(x-1))==0 with 
>popcount(x)==1. On a basic x86_64, popcount calls into libgcc, which 
>doesn't seem so good. On a more recent x86_64 (BMI1), x&(x-1) is a 
>single instruction that sets a flag when the result is 0, that's hard 
>to beat.

Then shouldn't we do that in __ispow2?

Even better would be a peephole optimisation to turn
__builtin_popcount(x)==1 into that.

>Or was the goal to accept an alignment of 0, and not an optimization?

Accepting alignment of 0 isn't the goal :-)

std::ispow2 should be the best way to check if an unsigned integer is
a power of two, so I wanted to use that instead of manual bit
twiddling.

I hope that check will go away soon, if the compiler starts checking
for valid alignments at the call site. (That won't catch all misuses,
as there could be calls with non-constants, but we can't make it
completely foolproof, some people just deserve to get UB!)

>>+  sz = (sz + align - 1) & ~(align - 1);
>
>Note that gcc immediately replaces ~(align - 1) with -align. It does 
>it even if you compute align-1 on the previous line and write 
>(sz+X)&~X in the hope of sharing the subtraction.

The goal there was to replace the branch for the 'if' and just do the
adjustment unconditionally.
Marc Glisse Aug. 17, 2018, 6:31 p.m. UTC | #3
On Fri, 17 Aug 2018, Jonathan Wakely wrote:

> On 17/08/18 19:28 +0200, Marc Glisse wrote:
>> On Mon, 13 Aug 2018, Jonathan Wakely wrote:
>> 
>>> Thanks to Lars for the suggestions.
>>>
>>> 	* libsupc++/new_opa.cc (operator new(size_t, align_val_t)): Use
>>> 	__is_pow2 to check for valid alignment. Avoid branching when rounding
>>> 	size to multiple of alignment.
>>> 
>>> Tested x86_64-linux, committed to trunk.
>> 
>> Are you getting better code with __is_pow2 on many platforms? As far as I 
>> can tell from a quick look at the patch (I didn't actually test it, I could 
>> be completely off), this replaces (x&(x-1))==0 with popcount(x)==1. On a 
>> basic x86_64, popcount calls into libgcc, which doesn't seem so good. On a 
>> more recent x86_64 (BMI1), x&(x-1) is a single instruction that sets a flag 
>> when the result is 0, that's hard to beat.
>
> Then shouldn't we do that in __ispow2?
>
> Even better would be a peephole optimisation to turn
> __builtin_popcount(x)==1 into that.

There is the complication of how to handle 0. For x=0, ispow2(x) is false 
while (x&(x-1))==0 is true. So the transformation corresponds to 
__builtin_popcount(x)<=1, and for ==1 we need more complications unless we 
somehow know that x cannot be 0.

>> Or was the goal to accept an alignment of 0, and not an optimization?
>
> Accepting alignment of 0 isn't the goal :-)
>
> std::ispow2 should be the best way to check if an unsigned integer is
> a power of two, so I wanted to use that instead of manual bit
> twiddling.

Makes sense. The best way to test this may not be the same if we know that 
the number cannot be 0, but I guess that if we don't find a better way it 
would be possible to use __builtin_constant_p to select between x&(x-1) 
and popcount==1. Using range information to enable the compiler 
transformation is also possible. Both may require an explicit if(align==0) 
in operator new (whether it replaces align with another value or calls 
__builtin_unreachable() probably doesn't matter).

> I hope that check will go away soon,

In that case, please ignore my comments on the speed of the check.
diff mbox series

Patch

diff --git a/libstdc++-v3/libsupc++/new_opa.cc b/libstdc++-v3/libsupc++/new_opa.cc
index aa3e5dc4ce5..abb7451fafe 100644
--- a/libstdc++-v3/libsupc++/new_opa.cc
+++ b/libstdc++-v3/libsupc++/new_opa.cc
@@ -27,6 +27,7 @@ 
 #include <stdlib.h>
 #include <stdint.h>
 #include <bits/exception_defines.h>
+#include <bit>
 #include "new"
 
 #if !_GLIBCXX_HAVE_ALIGNED_ALLOC && !_GLIBCXX_HAVE__ALIGNED_MALLOC \
@@ -105,7 +106,7 @@  operator new (std::size_t sz, std::align_val_t al)
 
   /* Alignment must be a power of two.  */
   /* XXX This should be checked by the compiler (PR 86878).  */
-  if (__builtin_expect (align & (align - 1), false))
+  if (__builtin_expect (!std::__ispow2(align), false))
     _GLIBCXX_THROW_OR_ABORT(bad_alloc());
 
   /* malloc (0) is unpredictable; avoid it.  */
@@ -120,8 +121,7 @@  operator new (std::size_t sz, std::align_val_t al)
     align = sizeof(void*);
 # endif
   /* C11: the value of size shall be an integral multiple of alignment.  */
-  if (std::size_t rem = sz & (align - 1))
-    sz += align - rem;
+  sz = (sz + align - 1) & ~(align - 1);
 #endif
 
   void *p;