diff mbox series

[4/5] barrier: use int instead of unsigned char for the phase state

Message ID 20210226155937.621324-4-thiago.macieira@intel.com
State New
Headers show
Series [1/5] std::latch: reduce internal implementation from ptrdiff_t to int | expand

Commit Message

Thiago Macieira Feb. 26, 2021, 3:59 p.m. UTC
ints can be used in futexes. chars can't.
---
 libstdc++-v3/include/std/barrier | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

Comments

Hans-Peter Nilsson Feb. 28, 2021, 3:05 p.m. UTC | #1
On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
> ints can be used in futexes. chars can't.

Shouldn't that be an atomic type instead of a bare int then?

> ---
>  libstdc++-v3/include/std/barrier | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
> index e09212dfcb9..ae058bd3dc3 100644
> --- a/libstdc++-v3/include/std/barrier
> +++ b/libstdc++-v3/include/std/barrier
> @@ -70,7 +70,7 @@ It looks different from literature pseudocode for two main reasons:
>
>  */
>
> -  enum class __barrier_phase_t : unsigned char { };
> +  enum class __barrier_phase_t : int { };

brgds, H-P
Thomas Rodgers March 1, 2021, 4:28 p.m. UTC | #2
On 2021-02-28 07:05, Hans-Peter Nilsson wrote:

> On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
> 
>> ints can be used in futexes. chars can't.
> 
> Shouldn't that be an atomic type instead of a bare int then?
> 

It's an atomic_ref.

>> ---
>> libstdc++-v3/include/std/barrier | 21 ++++++++++++---------
>> 1 file changed, 12 insertions(+), 9 deletions(-)
>> 
>> diff --git a/libstdc++-v3/include/std/barrier 
>> b/libstdc++-v3/include/std/barrier
>> index e09212dfcb9..ae058bd3dc3 100644
>> --- a/libstdc++-v3/include/std/barrier
>> +++ b/libstdc++-v3/include/std/barrier
>> @@ -70,7 +70,7 @@ It looks different from literature pseudocode for 
>> two main reasons:
>> 
>> */
>> 
>> -  enum class __barrier_phase_t : unsigned char { };
>> +  enum class __barrier_phase_t : int { };
> 
> brgds, H-P
Thiago Macieira March 1, 2021, 5:24 p.m. UTC | #3
On Sunday, 28 February 2021 07:05:47 PST Hans-Peter Nilsson wrote:
> On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
> > ints can be used in futexes. chars can't.
> 
> Shouldn't that be an atomic type instead of a bare int then?

There are a couple of atomic_refs:

      using __atomic_phase_ref_t = std::__atomic_ref<__barrier_phase_t>;
      using __atomic_phase_const_ref_t = std::__atomic_ref<const 
__barrier_phase_t>;

And _M_phase, despite being non-atomic, is never accessed without the 
atomic_ref, aside from the constructor. Both arrive() and wait() start off by 
creating the atomic_ref.

But I confess I don't understand this code sufficiently to say it is correct. 
I'm simply saying that waiting on unsigned chars will not use a futex, at 
least until https://lkml.org/lkml/2019/12/4/1373 is merged into the kernel.
Thomas Rodgers March 1, 2021, 5:38 p.m. UTC | #4
On 2021-03-01 09:24, Thiago Macieira via Libstdc++ wrote:

> On Sunday, 28 February 2021 07:05:47 PST Hans-Peter Nilsson wrote: On 
> Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote: ints can be 
> used in futexes. chars can't.
> Shouldn't that be an atomic type instead of a bare int then?

> There are a couple of atomic_refs:
> 
>      using __atomic_phase_ref_t = std::__atomic_ref<__barrier_phase_t>;
>      using __atomic_phase_const_ref_t = std::__atomic_ref<const
> __barrier_phase_t>;
> 
> And _M_phase, despite being non-atomic, is never accessed without the
> atomic_ref, aside from the constructor. Both arrive() and wait() start 
> off by
> creating the atomic_ref.

If it's non-atomic, then how is wait() supposed to wait on it, 
atomically?

> But I confess I don't understand this code sufficiently to say it is 
> correct.
> I'm simply saying that waiting on unsigned chars will not use a futex, 
> at
> least until https://lkml.org/lkml/2019/12/4/1373 is merged into the 
> kernel.

And I am not disagreeing with that. I am, however saying, that I know 
this particular implementation (well the upstream one it is based on) 
has been extensively tested by the original author (ogiroux) including 
time on Summit. If we are going to start changing his design decisions 
(beyond the largely cosmetic, not algorithmic, ones that I have made as 
per Jonathan's request), they should be motivated by more than a 'well 
we feel int's would be better here because Futex' justification.
Thomas Rodgers March 1, 2021, 5:40 p.m. UTC | #5
Botched replying to the list, sending again

On 2021-03-01 09:38, Thomas Rodgers wrote:

> On 2021-03-01 09:24, Thiago Macieira via Libstdc++ wrote:
> 
>> On Sunday, 28 February 2021 07:05:47 PST Hans-Peter Nilsson wrote: On 
>> Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote: ints can be 
>> used in futexes. chars can't.
>> Shouldn't that be an atomic type instead of a bare int then?
> 
>> There are a couple of atomic_refs:
>> 
>> using __atomic_phase_ref_t = std::__atomic_ref<__barrier_phase_t>;
>> using __atomic_phase_const_ref_t = std::__atomic_ref<const
>> __barrier_phase_t>;
>> 
>> And _M_phase, despite being non-atomic, is never accessed without the
>> atomic_ref, aside from the constructor. Both arrive() and wait() start 
>> off by
>> creating the atomic_ref.
> 
> If it's non-atomic, then how is wait() supposed to wait on it, 
> atomically?
> 
>> But I confess I don't understand this code sufficiently to say it is 
>> correct.
>> I'm simply saying that waiting on unsigned chars will not use a futex, 
>> at
>> least until https://lkml.org/lkml/2019/12/4/1373 is merged into the 
>> kernel.
> 
> And I am not disagreeing with that. I am, however saying, that I know 
> this particular implementation (well the upstream one it is based on) 
> has been extensively tested by the original author (ogiroux) including 
> time on Summit. If we are going to start changing his design decisions 
> (beyond the largely cosmetic, not algorithmic, ones that I have made as 
> per Jonathan's request), they should be motivated by more than a 'well 
> we feel int's would be better here because Futex' justification.
Thiago Macieira March 1, 2021, 6:06 p.m. UTC | #6
On Monday, 1 March 2021 09:38:58 PST Thomas Rodgers wrote:
> > And _M_phase, despite being non-atomic, is never accessed without the
> > atomic_ref, aside from the constructor. Both arrive() and wait() start
> > off by
> > creating the atomic_ref.
> 
> If it's non-atomic, then how is wait() supposed to wait on it,
> atomically?

Hey, it's your code :-)

It is using atomics to operate on the value. It's just that the type on the 
structure isn't an atomic by itself.

Why, I don't know. Olivier's original code did use atomics 
<https://github.com/ogiroux/atomic_wait/blob/master/include/barrier#L55-L56>:
    std::atomic<ptrdiff_t>             expected_adjustment;
    std::atomic<__phase_t>             phase;


> And I am not disagreeing with that. I am, however saying, that I know
> this particular implementation (well the upstream one it is based on)
> has been extensively tested by the original author (ogiroux) including
> time on Summit. If we are going to start changing his design decisions
> (beyond the largely cosmetic, not algorithmic, ones that I have made as
> per Jonathan's request), they should be motivated by more than a 'well
> we feel int's would be better here because Futex' justification.

That's a reasonable request.

I'd like to see the benchmark results of using a directly-futexable type vs 
using unsigned char. But even the timing results need to be weighed against 
the increased memory use for std::barrier, which means it's not a directly-
objective conclusion. And given we *may* get 8-bit futexes, it might be worth 
keeping them this way and just tell people with large contentions to upgrade.

That of course rests on the contended atomic_wait being out-of-line.
Ville Voutilainen March 1, 2021, 6:12 p.m. UTC | #7
On Mon, 1 Mar 2021 at 20:09, Thiago Macieira via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
>
> On Sunday, 28 February 2021 07:05:47 PST Hans-Peter Nilsson wrote:
> > On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
> > > ints can be used in futexes. chars can't.
> >
> > Shouldn't that be an atomic type instead of a bare int then?
>
> There are a couple of atomic_refs:
>
>       using __atomic_phase_ref_t = std::__atomic_ref<__barrier_phase_t>;
>       using __atomic_phase_const_ref_t = std::__atomic_ref<const
> __barrier_phase_t>;
>
> And _M_phase, despite being non-atomic, is never accessed without the
> atomic_ref, aside from the constructor. Both arrive() and wait() start off by
> creating the atomic_ref.
>
> But I confess I don't understand this code sufficiently to say it is correct.
> I'm simply saying that waiting on unsigned chars will not use a futex, at
> least until https://lkml.org/lkml/2019/12/4/1373 is merged into the kernel.

I do have a question about the intent/concern here, regardless of what
your patch technically
does. The ABI break _is_ your concern, and the "heisenbugs" you were
worried about would
in fact be caused by the ABI break? So if you embed these things in
your QAtomicThing, the ABI
break may mess it up(*)? Is that a correct understanding of the concern here?

(*) And inlining might not help because users might compile a
different std::atomic_thing in their
program than what your DSO has been compiled with, and you may end up
using mixtures.
Thomas Rodgers March 1, 2021, 7:08 p.m. UTC | #8
On 2021-03-01 10:06, Thiago Macieira wrote:

> On Monday, 1 March 2021 09:38:58 PST Thomas Rodgers wrote: And 
> _M_phase, despite being non-atomic, is never accessed without the
> atomic_ref, aside from the constructor. Both arrive() and wait() start
> off by
> creating the atomic_ref.
> If it's non-atomic, then how is wait() supposed to wait on it,
> atomically?

> Hey, it's your code :-)
> 
> It is using atomics to operate on the value. It's just that the type on 
> the
> structure isn't an atomic by itself.

> Why, I don't know. Olivier's original code did use atomics
> <https://github.com/ogiroux/atomic_wait/blob/master/include/barrier#L55-L56>:
>    std::atomic<ptrdiff_t>             expected_adjustment;
>    std::atomic<__phase_t>             phase;

It is atomic, in that it is always an atomic_ref. This change was made 
at
Jonathan's request.

> And I am not disagreeing with that. I am, however saying, that I know
> this particular implementation (well the upstream one it is based on)
> has been extensively tested by the original author (ogiroux) including
> time on Summit. If we are going to start changing his design decisions
> (beyond the largely cosmetic, not algorithmic, ones that I have made as
> per Jonathan's request), they should be motivated by more than a 'well
> we feel int's would be better here because Futex' justification.

> That's a reasonable request.
> 
> I'd like to see the benchmark results of using a directly-futexable 
> type vs
> using unsigned char. But even the timing results need to be weighed 
> against
> the increased memory use for std::barrier, which means it's not a 
> directly-
> objective conclusion. And given we *may* get 8-bit futexes, it might be 
> worth
> keeping them this way and just tell people with large contentions to 
> upgrade.

We may also want to introduce a central barrier type for memory 
constrained environments. I specifically
removed that from this patch because it -

1) wasn't clear how we'd go about making that decision
2) this support in GCC11 is experimental

> That of course rests on the contended atomic_wait being out-of-line.
Thiago Macieira March 1, 2021, 7:44 p.m. UTC | #9
On Monday, 1 March 2021 10:12:35 PST Ville Voutilainen wrote:
> I do have a question about the intent/concern here, regardless of what
> your patch technically
> does. The ABI break _is_ your concern, and the "heisenbugs" you were
> worried about would
> in fact be caused by the ABI break? So if you embed these things in
> your QAtomicThing, the ABI
> break may mess it up(*)? Is that a correct understanding of the concern
> here?

Let me clarify. I am stating that the current implementation is inefficient 
because Linux currently only implements 32-bit futexes. With that in mind, I 
am arguing we need to change the implementation in a way that will break ABI 
in a new release.

The concern is that such a change, despite being in experimental code for 
which ABI stability has never been promised, is still troublesome. The problem 
there is that even people who have read the documentation and waited until 
2024 to write code using the feature may still be affected. This isn't about 
Qt because I have no plans to use wait and notify in inline API.

But you can see someone doing:

#if __cplusplus >= 202002L && __has_include(<latch>)
#  include <latch>
#else
#  error "Please upgrade your compiler & standard library"
#endif

and using <latch> in their inline code. And as you say, if they then mix DSOs 
compiled with different versions of GCC, one of them might be the old, 
experimental and binary-incompatible code. Remember that before April 2024, 
the most recent Ubuntu LTS will be 22.04, which will be released before GCC 
12, which means it will contain GCC 11.

So, wholly aside from how we fix the inefficiencies, we must decide how to 
deal with the ABI break. We can:

a) not have an ABI break in the first place, by having the above code not 
   compile with GCC until we promise ABI compatibility
b) cause a non-silent ABI break
c) accept the silent ABI break

I advocate for (a) and vehemently disagree with (c). Meanwhile, (b) can be 
something like (untested!):

 _GLIBCXX_BEGIN_NAMESPACE_VERSION
   namespace __detail
   {
+    struct __waiters;
+    inline namespace __experimental
+    {
+      // from atomic_wait.cc
+      __waiters &__get_waiter(const void *);
+    }
     using __platform_wait_t = int;
 
     constexpr auto __atomic_spin_count_1 = 16;
@@ -187,11 +193,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       static __waiters&
       _S_for(const void* __t)
       {
-       const unsigned char __mask = 0xf;
-       static __waiters __w[__mask + 1];
-
-       auto __key = _Hash_impl::hash(__t) & __mask;
-       return __w[__key];
+       return __get_waiter(__t);
       }
     };

That way, any DSO compiled using GCC 11 will fail to load when using GCC 12's 
libstdc++. And probably put "std::__detail::__experimental" in the 
GLIBCXX_EXPERIMENTAL ELF version so it's even clearer that it's experimental, 
thus helping Linux distributions and other binary artefact distributors 
realise they've made a mistake.

I still don't like (b) because it pushes the problem to the wrong people: the 
packagers. But it's far better than (c).
Ville Voutilainen March 1, 2021, 8:35 p.m. UTC | #10
On Mon, 1 Mar 2021 at 21:44, Thiago Macieira <thiago.macieira@intel.com> wrote:
> But you can see someone doing:
>
> #if __cplusplus >= 202002L && __has_include(<latch>)
> #  include <latch>
> #else
> #  error "Please upgrade your compiler & standard library"
> #endif
>
> and using <latch> in their inline code. And as you say, if they then mix DSOs
> compiled with different versions of GCC, one of them might be the old,
> experimental and binary-incompatible code. Remember that before April 2024,
> the most recent Ubuntu LTS will be 22.04, which will be released before GCC
> 12, which means it will contain GCC 11.
>
> So, wholly aside from how we fix the inefficiencies, we must decide how to
> deal with the ABI break. We can:
>
> a) not have an ABI break in the first place, by having the above code not
>    compile with GCC until we promise ABI compatibility
> b) cause a non-silent ABI break
> c) accept the silent ABI break
>
> I advocate for (a) and vehemently disagree with (c). Meanwhile, (b) can be
> something like (untested!):

I can't make the above code work, in any reasonable manner, because
it's doing feature detection incorrectly. :)
What I can imagine doing, however, is this:

1) a published IS always bumps feature-macro values (this won't help
now, it's a future fix, we don't currently do this in WG21)
2) an implementation uses the pre-IS draft values before it deems itself stable
3) users who want stability should require the feature-testing macro
to have at least the IS value
4) adventurous users can either use the macro without checking its
value, or use at least the value that gives them
whatever shiny toy they're interested in

With that, we can keep allowing adventurous users to opt in early, and
you have a portable way to detect that
features are stable, for an implementation-defined definition of "stable".
Thiago Macieira March 1, 2021, 9:54 p.m. UTC | #11
On Monday, 1 March 2021 12:35:05 PST Ville Voutilainen wrote:
> I can't make the above code work, in any reasonable manner, because
> it's doing feature detection incorrectly. :)
> What I can imagine doing, however, is this:
> 
> 1) a published IS always bumps feature-macro values (this won't help
> now, it's a future fix, we don't currently do this in WG21)

This is mostly already the case. I haven't seen any need to bump the values.

> 2) an implementation uses the pre-IS draft values before it deems itself
> stable 

Before the IS is stable? From what I've seen so far, the macros are always 
around the month of the latest draft of a given feature. So if an 
implementation implemented an earlier draft, the macro version would increase 
in a new draft coming before the IS.

I think we just need to be careful of updates done after the latest draft is 
adopted, usually by NBs. Those have to bump the macro too. I think you're in 
good speaking terms with the Finland NB :-)

> 3) users who want stability should require the feature-testing macro
> to have at least the IS value

True. That's not the stability I was talking about, but it could be done.

> 4) adventurous users can either use the macro without checking its
> value, or use at least the value that gives them
> whatever shiny toy they're interested in

Fair enough.

Please note I am not talking about the stability of the feature as described 
in the standard or a proposal or draft. I am talking about the stability of 
the implementation. C++20 is out and the atomic-wait feature is stable, as 
defined by the standard. What isn't stable is GCC's implementation of it.

But your suggestion does work. We don't need to apply them to all macros, only 
those that are new in a given version, like __cpp_lib_atomic_wait or 
__cpp_lib_latch in this case. Alternatively, implementations can set the macro 
to a given value below the standard-required one (incl. 1) to indicate that 
they haven't achieved stability.

That would make my check:

#if __cplusplus >= 202002L && __has_include(<latch>)
#  include <latch>
#endif
#if __cpp_lib_latch < 201907L
#  error "Please upgrade your Standard Library"
#endif
Ville Voutilainen March 1, 2021, 10:04 p.m. UTC | #12
On Mon, 1 Mar 2021 at 23:54, Thiago Macieira <thiago.macieira@intel.com> wrote:
> But your suggestion does work. We don't need to apply them to all macros, only
> those that are new in a given version, like __cpp_lib_atomic_wait or
> __cpp_lib_latch in this case. Alternatively, implementations can set the macro
> to a given value below the standard-required one (incl. 1) to indicate that
> they haven't achieved stability.
>
> That would make my check:
>
> #if __cplusplus >= 202002L && __has_include(<latch>)
> #  include <latch>
> #endif
> #if __cpp_lib_latch < 201907L
> #  error "Please upgrade your Standard Library"
> #endif

Well, this would be different. What I'm suggesting is not quite that;
for any *new* facility, we'd make sure
that its draft macro and the final IS macro are different, but the
minimum value is the first draft version,
not anything below it. I don't care too much, that approach and yours
would work the same way. Things that already
had an IS value for a macro and haven't changed since don't need to be
changed. And we don't
need to bump all values of existing facilities either, just for those
that got changes, so some existing macros
would be considered lost causes. Like the ones we're talking about,
because the cats are already out of the
bag.

I'll write a paper. That won't help you now, but it gives us tools in
the future.
Thiago Macieira March 1, 2021, 10:21 p.m. UTC | #13
On Monday, 1 March 2021 14:04:34 PST Ville Voutilainen wrote:
> Well, this would be different. What I'm suggesting is not quite that;
> for any *new* facility, we'd make sure
> that its draft macro and the final IS macro are different, but the
> minimum value is the first draft version,
> not anything below it. I don't care too much, that approach and yours
> would work the same way. Things that already
> had an IS value for a macro and haven't changed since don't need to be
> changed. And we don't
> need to bump all values of existing facilities either, just for those
> that got changes, so some existing macros
> would be considered lost causes. Like the ones we're talking about,
> because the cats are already out of the
> bag.

But the code I posted, if people are careful to use write like I did, would 
allow us to have the experimental "we're not sure this is right" 
implementation of atomic waits, latches, barriers and semaphores right now.

It would simply require that we decrement the macros by 1 in the libstdc++ 
headers.
Ville Voutilainen March 1, 2021, 10:31 p.m. UTC | #14
On Tue, 2 Mar 2021 at 00:21, Thiago Macieira <thiago.macieira@intel.com> wrote:

> But the code I posted, if people are careful to use write like I did, would
> allow us to have the experimental "we're not sure this is right"
> implementation of atomic waits, latches, barriers and semaphores right now.

The code assumes that as soon as __cplusplus bumps and a header is
present, things
are stable. I don't think that's a safe assumption to make.
Furthermore, the second  #if
tries to check a feature-testing macro without including either the
corresponding header
or <version>. That doesn't work. You need to either include <version>
and check a macro,
or check that <latch> exists, then include <latch> and then check the macro.

But other than that, sure, as QoI, vendors could already provide the
standard macros with
numbers that are lower than the standard ever specified. Going
forward, if existing facilities
are changed, this stops working because now you'd have to track the
WPs to see which
values are "vendor-bogus". I find it better to just change the macros
whose facilities changed
during a standard cycle, and in those cases bump the IS macro, that'll
then work going forward.
In the meanwhile, when vendors can, they could use the technique you
describe, but that's
barely worth proposing as an SG10 guideline because they would've
needed to do it already, but didn't. :P
Thiago Macieira March 1, 2021, 10:40 p.m. UTC | #15
On Monday, 1 March 2021 14:31:09 PST Ville Voutilainen wrote:
> On Tue, 2 Mar 2021 at 00:21, Thiago Macieira <thiago.macieira@intel.com> 
wrote:
> > But the code I posted, if people are careful to use write like I did,
> > would
> > allow us to have the experimental "we're not sure this is right"
> > implementation of atomic waits, latches, barriers and semaphores right
> > now.
> 
> The code assumes that as soon as __cplusplus bumps and a header is
> present, things
> are stable.

Not exactly. Re-posting the code for reference:

#if __cplusplus >= 202002L && __has_include(<latch>)
#  include <latch>
#endif
#if __cpp_lib_latch < 201907L
#  error "Please upgrade your Standard Library"
#endif

The first section simply assumes that <latch> can be #included. The 
__cplusplus check is necessary because MSVC STL's headers are present but 
can't be #include'd otherwise (they cause #error).

It's the second check that is authoritative.

> I don't think that's a safe assumption to make.
> Furthermore, the second  #if
> tries to check a feature-testing macro without including either the
> corresponding header
> or <version>. That doesn't work. You need to either include <version>
> and check a macro,
> or check that <latch> exists, then include <latch> and then check the macro.

<version> would work, but why can't you check the macro anyway? It may trigger 
a warning with GCC's -Wundef, but it's just a warning. The preprocessor is 
defined to replace any undefined identifier with 0. If you want to avoid the 
warning, you could write:

#if defined(__cpp_lib_latch) && __cpp_lib_latch < 201907L

Is there anything I'm not seeing?

> But other than that, sure, as QoI, vendors could already provide the
> standard macros with
> numbers that are lower than the standard ever specified. Going
> forward, if existing facilities
> are changed, this stops working because now you'd have to track the
> WPs to see which
> values are "vendor-bogus". I find it better to just change the macros
> whose facilities changed
> during a standard cycle, and in those cases bump the IS macro, that'll
> then work going forward.
> In the meanwhile, when vendors can, they could use the technique you
> describe, but that's
> barely worth proposing as an SG10 guideline because they would've
> needed to do it already, but didn't. :P

I am not opposed to that. Your solution is better.

But this solution is less work on the standard and still works, even if it 
creates a little more work on the users of said features. It's not 
unsurmountable, since we often need to check which C++ standard edition it 
came with anyway. So it doesn't matter what value it assumes: we'll be 
consulting a table anyway.
diff mbox series

Patch

diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
index e09212dfcb9..ae058bd3dc3 100644
--- a/libstdc++-v3/include/std/barrier
+++ b/libstdc++-v3/include/std/barrier
@@ -70,7 +70,7 @@  It looks different from literature pseudocode for two main reasons:
 
 */
 
-  enum class __barrier_phase_t : unsigned char { };
+  enum class __barrier_phase_t : int { };
 
   template<typename _CompletionF>
     class __tree_barrier
@@ -93,20 +93,24 @@  It looks different from literature pseudocode for two main reasons:
 
       alignas(__phase_alignment) __barrier_phase_t  _M_phase;
 
+      static __barrier_phase_t
+      _S_add_to_phase(__barrier_phase_t __phase, unsigned char __n)
+      {
+        __n += static_cast<unsigned char>(__phase);
+        return static_cast<__barrier_phase_t>(__n);
+      }
+
       bool
       _M_arrive(__barrier_phase_t __old_phase)
       {
-	const auto __old_phase_val = static_cast<unsigned char>(__old_phase);
-	const auto __half_step =
-			   static_cast<__barrier_phase_t>(__old_phase_val + 1);
-	const auto __full_step =
-			   static_cast<__barrier_phase_t>(__old_phase_val + 2);
-
 	size_t __current_expected = _M_expected;
 	std::hash<std::thread::id> __hasher;
 	size_t __current = __hasher(std::this_thread::get_id())
 					  % ((_M_expected + 1) >> 1);
 
+	const auto __half_step = _S_add_to_phase(__old_phase, 1);
+	const auto __full_step = _S_add_to_phase(__old_phase, 2);
+
 	for (int __round = 0; ; ++__round)
 	  {
 	    if (__current_expected <= 1)
@@ -165,7 +169,6 @@  It looks different from literature pseudocode for two main reasons:
       {
 	__atomic_phase_ref_t __phase(_M_phase);
 	const auto __old_phase = __phase.load(memory_order_relaxed);
-	const auto __cur = static_cast<unsigned char>(__old_phase);
 	for(; __update; --__update)
 	  {
 	    if(_M_arrive(__old_phase))
@@ -173,7 +176,7 @@  It looks different from literature pseudocode for two main reasons:
 		_M_completion();
 		_M_expected += _M_expected_adjustment.load(memory_order_relaxed);
 		_M_expected_adjustment.store(0, memory_order_relaxed);
-		auto __new_phase = static_cast<__barrier_phase_t>(__cur + 2);
+		auto __new_phase = _S_add_to_phase(__old_phase, 2);
 		__phase.store(__new_phase, memory_order_release);
 		__phase.notify_all();
 	      }