[v2,00/10,RISC-V] Atomics improvements [PR100265/PR100266]

Message ID 20210505193651.2075405-1-cmuellner@gcc.gnu.org
Series: Atomics improvements [PR100265/PR100266]

Message

Christoph Müllner May 5, 2021, 7:36 p.m. UTC
This series provides a cleanup of the current atomics implementation
of RISC-V:

* PR100265: Use proper fences for atomic load/store
* PR100266: Provide programmatic implementation of CAS

As both are closely related, I merged the patches into one series.

The first patch could be squashed into the following patches,
but I found it easier to understand the changes with it in place.

The series has been tested as follows:
* Building and testing a multilib RV32/64 toolchain
  (bootstrapped with riscv-gnu-toolchain repo)
* Manual review of generated sequences for GCC's atomic builtins API

The programmatic re-implementation of CAS benefits from a REE improvement
(see PR100264):
  https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
If this patch is not in place, then an additional extension instruction
is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
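
For illustration, a hand-written sketch of the affected case (this is
not a test case from the series; the exact generated sequence depends
on the patches above):

  #include <stdbool.h>
  #include <stdint.h>

  /* On RV64 this expands to an LR.W/SC.W loop.  Without the REE patch
     above, the generated loop contains an additional extension
     instruction after the SC.W (presumably redundant, as LR.W already
     sign-extends its result on RV64); with it, REE removes that
     instruction.  */
  bool
  cas_u32 (uint32_t *p, uint32_t expected, uint32_t desired)
  {
    return __atomic_compare_exchange_n (p, &expected, desired, false,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  }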

Further, the new CAS code requires cbranch INSN helpers to be present:
  https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html

Changes for v2:
* Guard LL/SC sequence by compiler barriers ("blockage")
  (suggested by Andrew Waterman)
* Changed commit message for AMOSWAP->STORE change
  (suggested by Andrew Waterman)
* Extracted cbranch4 patch from patchset (suggested by Kito Cheng)
* Introduce predicate riscv_sync_memory_operand (suggested by Jim Wilson)
* Fix small code style issue

Christoph Muellner (10):
  RISC-V: Simplify memory model code [PR 100265]
  RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]
  RISC-V: Eliminate %F specifier from riscv_print_operand() [PR 100265]
  RISC-V: Use STORE instead of AMOSWAP for atomic stores [PR 100265]
  RISC-V: Emit fences according to chosen memory model [PR 100265]
  RISC-V: Implement atomic_{load,store} [PR 100265]
  RISC-V: Model INSNs for LR and SC [PR 100266]
  RISC-V: Add s.ext-consuming INSNs for LR and SC [PR 100266]
  RISC-V: Provide programmatic implementation of CAS [PR 100266]
  RISC-V: Introduce predicate "riscv_sync_memory_operand" [PR 100266]

 gcc/config/riscv/riscv-protos.h |   1 +
 gcc/config/riscv/riscv.c        | 136 +++++++++++++-------
 gcc/config/riscv/sync.md        | 216 +++++++++++++++++++++-----------
 3 files changed, 235 insertions(+), 118 deletions(-)

Comments

Vineet Gupta Oct. 11, 2022, 7:06 p.m. UTC | #1
Hi Christoph, Kito,

On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
> This series provides a cleanup of the current atomics implementation
> of RISC-V:
>
> * PR100265: Use proper fences for atomic load/store
> * PR100266: Provide programmatic implementation of CAS
>
> As both are very related, I merged the patches into one series.
>
> The first patch could be squashed into the following patches,
> but I found it easier to understand the chances with it in place.
>
> The series has been tested as follows:
> * Building and testing a multilib RV32/64 toolchain
>    (bootstrapped with riscv-gnu-toolchain repo)
> * Manual review of generated sequences for GCC's atomic builtins API
>
> The programmatic re-implementation of CAS benefits from a REE improvement
> (see PR100264):
>    https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
> If this patch is not in place, then an additional extension instruction
> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
>
> Further, the new CAS code requires cbranch INSN helpers to be present:
>    https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html

I was wondering if this patchset is blocked on some technical grounds.

Thx,
-Vineet

> Changes for v2:
> * Guard LL/SC sequence by compiler barriers ("blockage")
>    (suggested by Andrew Waterman)
> * Changed commit message for AMOSWAP->STORE change
>    (suggested by Andrew Waterman)
> * Extracted cbranch4 patch from patchset (suggested by Kito Cheng)
> * Introduce predicate riscv_sync_memory_operand (suggested by Jim Wilson)
> * Fix small code style issue
>
> Christoph Muellner (10):
>    RISC-V: Simplify memory model code [PR 100265]
>    RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]
>    RISC-V: Eliminate %F specifier from riscv_print_operand() [PR 100265]
>    RISC-V: Use STORE instead of AMOSWAP for atomic stores [PR 100265]
>    RISC-V: Emit fences according to chosen memory model [PR 100265]
>    RISC-V: Implement atomic_{load,store} [PR 100265]
>    RISC-V: Model INSNs for LR and SC [PR 100266]
>    RISC-V: Add s.ext-consuming INSNs for LR and SC [PR 100266]
>    RISC-V: Provide programmatic implementation of CAS [PR 100266]
>    RISC-V: Introduce predicate "riscv_sync_memory_operand" [PR 100266]
>
>   gcc/config/riscv/riscv-protos.h |   1 +
>   gcc/config/riscv/riscv.c        | 136 +++++++++++++-------
>   gcc/config/riscv/sync.md        | 216 +++++++++++++++++++++-----------
>   3 files changed, 235 insertions(+), 118 deletions(-)
>
Palmer Dabbelt Oct. 11, 2022, 7:31 p.m. UTC | #2
On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
> Hi Christoph, Kito,
>
> On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
>> This series provides a cleanup of the current atomics implementation
>> of RISC-V:
>>
>> * PR100265: Use proper fences for atomic load/store
>> * PR100266: Provide programmatic implementation of CAS
>>
>> As both are very related, I merged the patches into one series.
>>
>> The first patch could be squashed into the following patches,
>> but I found it easier to understand the chances with it in place.
>>
>> The series has been tested as follows:
>> * Building and testing a multilib RV32/64 toolchain
>>    (bootstrapped with riscv-gnu-toolchain repo)
>> * Manual review of generated sequences for GCC's atomic builtins API
>>
>> The programmatic re-implementation of CAS benefits from a REE improvement
>> (see PR100264):
>>    https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
>> If this patch is not in place, then an additional extension instruction
>> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
>>
>> Further, the new CAS code requires cbranch INSN helpers to be present:
>>    https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html
>
> I was wondering is this patchset is blocked on some technical grounds.

There's a v3 (though I can't find all of it, so not quite sure what 
happened), but IIUC that still has the same fundamental problems that 
all these have had: changing over to the new fence model may be an ABI 
break and the split CAS implementation doesn't ensure eventual success 
(see Jim's comments).  Not sure if there's other comments floating 
around, though, that's just what I remember.

+Andrea, in case he has time to look at the memory model / ABI issues.  

We'd still need to sort out the CAS issues, though, and it's not 
abundantly clear it's worth the work: we're essentially constrained to 
just emitting those fixed CAS sequences due to the eventual success 
rules, so it's not clear what the benefit of splitting those up is.  
With WRS there are some routines we might want to generate code for 
(cond_read_acquire() in Linux, for example) but we'd really need to dig 
into those to see if it's even sane/fast.
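
To make "fixed CAS sequence" concrete, here is a hand-written sketch
(not GCC output, and with the .aq/.rl ordering annotations left out)
of the constrained LR/SC loop shape the eventual-success rules are
written around; splitting it into separately schedulable pieces is
what risks losing the forward-progress guarantee:

  #include <stdint.h>

  /* Returns the old value; the caller compares it against 'expected'
     to decide whether the CAS succeeded.  Sketch only.  */
  uint32_t
  cas_w_sketch (uint32_t *p, uint32_t expected, uint32_t desired)
  {
    uint32_t old, fail;
    __asm__ __volatile__ (
      "1:\n\t"
      "lr.w   %0, (%2)\n\t"      /* load-reserve the current value       */
      "bne    %0, %3, 2f\n\t"    /* bail out if it differs from expected */
      "sc.w   %1, %4, (%2)\n\t"  /* try to store the new value           */
      "bnez   %1, 1b\n\t"        /* reservation lost: retry              */
      "2:"
      : "=&r" (old), "=&r" (fail)
      : "r" (p), "r" (expected), "r" (desired)
      : "memory");
    return old;
  }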

There's another patch set to fix the lack of inline atomic routines 
without breaking stuff, there were some minor comments from Kito and 
IIRC I had some test failures that I needed to chase down as well.  
That's a much safer fix in the short term, we'll need to deal with this 
eventually but at least we can stop the libatomic issues for the distro 
folks.

>
> Thx,
> -Vineet
>
>> Changes for v2:
>> * Guard LL/SC sequence by compiler barriers ("blockage")
>>    (suggested by Andrew Waterman)
>> * Changed commit message for AMOSWAP->STORE change
>>    (suggested by Andrew Waterman)
>> * Extracted cbranch4 patch from patchset (suggested by Kito Cheng)
>> * Introduce predicate riscv_sync_memory_operand (suggested by Jim Wilson)
>> * Fix small code style issue
>>
>> Christoph Muellner (10):
>>    RISC-V: Simplify memory model code [PR 100265]
>>    RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]
>>    RISC-V: Eliminate %F specifier from riscv_print_operand() [PR 100265]
>>    RISC-V: Use STORE instead of AMOSWAP for atomic stores [PR 100265]
>>    RISC-V: Emit fences according to chosen memory model [PR 100265]
>>    RISC-V: Implement atomic_{load,store} [PR 100265]
>>    RISC-V: Model INSNs for LR and SC [PR 100266]
>>    RISC-V: Add s.ext-consuming INSNs for LR and SC [PR 100266]
>>    RISC-V: Provide programmatic implementation of CAS [PR 100266]
>>    RISC-V: Introduce predicate "riscv_sync_memory_operand" [PR 100266]
>>
>>   gcc/config/riscv/riscv-protos.h |   1 +
>>   gcc/config/riscv/riscv.c        | 136 +++++++++++++-------
>>   gcc/config/riscv/sync.md        | 216 +++++++++++++++++++++-----------
>>   3 files changed, 235 insertions(+), 118 deletions(-)
>>
Christoph Müllner Oct. 11, 2022, 8:46 p.m. UTC | #3
On Tue, Oct 11, 2022 at 9:31 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:

> On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
> > Hi Christoph, Kito,
> >
> > On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
> >> This series provides a cleanup of the current atomics implementation
> >> of RISC-V:
> >>
> >> * PR100265: Use proper fences for atomic load/store
> >> * PR100266: Provide programmatic implementation of CAS
> >>
> >> As both are very related, I merged the patches into one series.
> >>
> >> The first patch could be squashed into the following patches,
> >> but I found it easier to understand the chances with it in place.
> >>
> >> The series has been tested as follows:
> >> * Building and testing a multilib RV32/64 toolchain
> >>    (bootstrapped with riscv-gnu-toolchain repo)
> >> * Manual review of generated sequences for GCC's atomic builtins API
> >>
> >> The programmatic re-implementation of CAS benefits from a REE
> improvement
> >> (see PR100264):
> >>    https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
> >> If this patch is not in place, then an additional extension instruction
> >> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
> >>
> >> Further, the new CAS code requires cbranch INSN helpers to be present:
> >>    https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html
> >
> > I was wondering is this patchset is blocked on some technical grounds.
>
> There's a v3 (though I can't find all of it, so not quite sure what
> happened), but IIUC that still has the same fundamental problems that
> all these have had: changing over to the new fence model may by an ABI
> break and the split CAS implementation doesn't ensure eventual success
> (see Jim's comments).  Not sure if there's other comments floating
> around, though, that's just what I remember.
>

v3 was sent on May 27, 2022, when I rebased this on an internal tree:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
I dropped the CAS patch in v3 (issue: stack spilling under extreme register
pressure instead of erroring out) as I thought that this was the blocker
for the series.
I just learned a few weeks ago, when I asked Palmer at the GNU Cauldron
about this series, that the ABI break is the blocker.

My initial understanding was that fixing something broken cannot be an ABI
break.
And that the mismatch of the implementation in 2021 and the recommended
mappings in the ratified specification from 2019 is something that is
broken. I still don't know the background here, but I guess this assumption
is incorrect from a historical point of view.

However, I'm sure that I am not the only one who assumes the mappings in
the specification are implemented in compilers and tools. Therefore I
still consider the implementation of the RISC-V atomics in GCC as broken
(at least w.r.t. user expectation from people that lack the historical
background and just read the RISC-V specification).
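
To make this concrete with a hand-written example (illustrative only,
not part of the series): the porting appendix maps a release store to
a fence followed by a plain store, while the implementation this
series replaces used AMOSWAP-based atomic stores -- which is what the
"Use STORE instead of AMOSWAP" patch addresses:

  void
  store_release (int *p, int v)
  {
    /* Porting-appendix mapping (roughly):  fence rw,w
                                            sw    v, 0(p)
       Pre-series GCC: an AMOSWAP-based sequence (the exact fences
       depend on the memory model).  */
    __atomic_store_n (p, v, __ATOMIC_RELEASE);
  }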



>
> +Andrea, in case he has time to look at the memory model / ABI issues.
>
> We'd still need to sort out the CAS issues, though, and it's not
> abundantly clear it's worth the work: we're essentailly constrained to
> just emitting those fixed CAS sequences due to the eventual success
> rules, so it's not clear what the benefit of splitting those up is.
> With WRS there are some routines we might want to generate code for
> (cond_read_acquire() in Linux, for example) but we'd really need to dig
> into those to see if it's even sane/fast.
>
> There's another patch set to fix the lack of inline atomic routines
> without breaking stuff, there were some minor comments from Kito and
> IIRC I had some test failures that I needed to chase down as well.
> That's a much safer fix in the short term, we'll need to deal with this
> eventually but at least we can stop the libatomic issues for the distro
> folks.
>

I expect that the pressure for a proper fix upstream (instead of a backward
compatible compromise) will increase over time (once people start building
big iron based on RISC-V and start hunting performance bottlenecks in
multithreaded workloads to be competitive).
What could be done to get some relief is to enable the new atomics ABI by a
command line switch and promote its use. And at one point in the future (if
there are enough fixes to justify a break) the new ABI can be enabled by
default with a new flag to enable the old ABI.


>
> >
> > Thx,
> > -Vineet
> >
> >> Changes for v2:
> >> * Guard LL/SC sequence by compiler barriers ("blockage")
> >>    (suggested by Andrew Waterman)
> >> * Changed commit message for AMOSWAP->STORE change
> >>    (suggested by Andrew Waterman)
> >> * Extracted cbranch4 patch from patchset (suggested by Kito Cheng)
> >> * Introduce predicate riscv_sync_memory_operand (suggested by Jim
> Wilson)
> >> * Fix small code style issue
> >>
> >> Christoph Muellner (10):
> >>    RISC-V: Simplify memory model code [PR 100265]
> >>    RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]
> >>    RISC-V: Eliminate %F specifier from riscv_print_operand() [PR 100265]
> >>    RISC-V: Use STORE instead of AMOSWAP for atomic stores [PR 100265]
> >>    RISC-V: Emit fences according to chosen memory model [PR 100265]
> >>    RISC-V: Implement atomic_{load,store} [PR 100265]
> >>    RISC-V: Model INSNs for LR and SC [PR 100266]
> >>    RISC-V: Add s.ext-consuming INSNs for LR and SC [PR 100266]
> >>    RISC-V: Provide programmatic implementation of CAS [PR 100266]
> >>    RISC-V: Introduce predicate "riscv_sync_memory_operand" [PR 100266]
> >>
> >>   gcc/config/riscv/riscv-protos.h |   1 +
> >>   gcc/config/riscv/riscv.c        | 136 +++++++++++++-------
> >>   gcc/config/riscv/sync.md        | 216 +++++++++++++++++++++-----------
> >>   3 files changed, 235 insertions(+), 118 deletions(-)
> >>
>
Jeff Law Oct. 11, 2022, 11:14 p.m. UTC | #4
On 10/11/22 13:31, Palmer Dabbelt wrote:
> On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
>> Hi Christoph, Kito,
>>
>> On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
>>> This series provides a cleanup of the current atomics implementation
>>> of RISC-V:
>>>
>>> * PR100265: Use proper fences for atomic load/store
>>> * PR100266: Provide programmatic implementation of CAS
>>>
>>> As both are very related, I merged the patches into one series.
>>>
>>> The first patch could be squashed into the following patches,
>>> but I found it easier to understand the chances with it in place.
>>>
>>> The series has been tested as follows:
>>> * Building and testing a multilib RV32/64 toolchain
>>>    (bootstrapped with riscv-gnu-toolchain repo)
>>> * Manual review of generated sequences for GCC's atomic builtins API
>>>
>>> The programmatic re-implementation of CAS benefits from a REE 
>>> improvement
>>> (see PR100264):
>>> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
>>> If this patch is not in place, then an additional extension instruction
>>> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
>>>
>>> Further, the new CAS code requires cbranch INSN helpers to be present:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html
>>
>> I was wondering is this patchset is blocked on some technical grounds.
>
> There's a v3 (though I can't find all of it, so not quite sure what 
> happened), but IIUC that still has the same fundamental problems that 
> all these have had: changing over to the new fence model may by an ABI 
> break and the split CAS implementation doesn't ensure eventual success 
> (see Jim's comments).  Not sure if there's other comments floating 
> around, though, that's just what I remember.

Do we have a pointer to the ABI discussion.  I've been meaning to 
familiarize myself with the issues in this space and that seems like a 
good place to start given its blocking progress on the atomics.


jeff
Vineet Gupta Oct. 11, 2022, 11:31 p.m. UTC | #5
On 10/11/22 13:46, Christoph Müllner wrote:
> On Tue, Oct 11, 2022 at 9:31 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
>     On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
>     > Hi Christoph, Kito,
>     >
>     > On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
>     >> This series provides a cleanup of the current atomics
>     implementation
>     >> of RISC-V:
>     >>
>     >> * PR100265: Use proper fences for atomic load/store
>     >> * PR100266: Provide programmatic implementation of CAS
>     >>
>     >> As both are very related, I merged the patches into one series.
>     >>
>     >> The first patch could be squashed into the following patches,
>     >> but I found it easier to understand the chances with it in place.
>     >>
>     >> The series has been tested as follows:
>     >> * Building and testing a multilib RV32/64 toolchain
>     >>    (bootstrapped with riscv-gnu-toolchain repo)
>     >> * Manual review of generated sequences for GCC's atomic
>     builtins API
>     >>
>     >> The programmatic re-implementation of CAS benefits from a REE
>     improvement
>     >> (see PR100264):
>     >> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
>     >> If this patch is not in place, then an additional extension
>     instruction
>     >> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
>     >>
>     >> Further, the new CAS code requires cbranch INSN helpers to be
>     present:
>     >> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html
>     >
>     > I was wondering is this patchset is blocked on some technical
>     grounds.
>
>     There's a v3 (though I can't find all of it, so not quite sure what
>     happened), but IIUC that still has the same fundamental problems that
>     all these have had: changing over to the new fence model may by an
>     ABI
>     break and the split CAS implementation doesn't ensure eventual
>     success
>     (see Jim's comments).  Not sure if there's other comments floating
>     around, though, that's just what I remember.
>
>
> v3 was sent on May 27, 2022, when I rebased this on an internal tree:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
> I dropped the CAS patch in v3 (issue: stack spilling under extreme 
> register pressure instead of erroring out) as I thought that this was 
> the blocker for the series.
> I just learned a few weeks ago, when I asked Palmer at the GNU 
> Cauldron about this series, that the ABI break is the blocker.

Yeah I was confused about the ABI aspect as I didn't see any mention of 
that in the public reviews of v1 and v2.

> My initial understanding was that fixing something broken cannot be an 
> ABI break.
> And that the mismatch of the implementation in 2021 and the 
> recommended mappings in the ratified specification from 2019 is 
> something that is broken. I still don't know the background here, but 
> I guess this assumption is incorrect from a historical point of view.
>
> However, I'm sure that I am not the only one that assumes the mappings 
> in the specification to be implemented in compilers and tools. 
> Therefore I still consider the implementation of the RISC-V atomics in 
> GCC as broken (at least w.r.t. user expectation from people that lack 
> the historical background and just read the RISC-V specification).
>
>
>     +Andrea, in case he has time to look at the memory model / ABI
>     issues.
>
>     We'd still need to sort out the CAS issues, though, and it's not
>     abundantly clear it's worth the work: we're essentailly
>     constrained to
>     just emitting those fixed CAS sequences due to the eventual success
>     rules, so it's not clear what the benefit of splitting those up is.
>     With WRS there are some routines we might want to generate code for
>     (cond_read_acquire() in Linux, for example) but we'd really need
>     to dig
>     into those to see if it's even sane/fast.
>
>     There's another patch set to fix the lack of inline atomic routines
>     without breaking stuff, there were some minor comments from Kito and
>     IIRC I had some test failures that I needed to chase down as well.
>     That's a much safer fix in the short term, we'll need to deal with
>     this
>     eventually but at least we can stop the libatomic issues for the
>     distro
>     folks.
>
>
> I expect that the pressure for a proper fix upstream (instead of a 
> backward compatible compromise) will increase over time (once people 
> start building big iron based on RISC-V and start hunting performance 
> bottlenecks in multithreaded workloads to be competitive).
> What could be done to get some relief is to enable the new atomics ABI 
> by a command line switch and promote its use. And at one point in the 
> future (if there are enough fixes to justify a break) the new ABI can 
> be enabled by default with a new flag to enable the old ABI.

Indeed we are stuck with inefficiencies with the status quo. The new ABI 
option sounds like a reasonable plan going fwd.

Also my understanding is that while the considerations are ABI centric, the 
option to facilitate this need not be tied to canonical -mabi=ilp32, lp64d 
etc. It might just be a toggle as -matomic=legacy,2019 etc (this is not 
suggestive just indicative). Otherwise there's another level of blowup 
in multilib testing etc.

-Vineet
Palmer Dabbelt Oct. 12, 2022, 12:15 a.m. UTC | #6
On Tue, 11 Oct 2022 16:31:25 PDT (-0700), Vineet Gupta wrote:
>
>
> On 10/11/22 13:46, Christoph Müllner wrote:
>> On Tue, Oct 11, 2022 at 9:31 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>>
>>     On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
>>     > Hi Christoph, Kito,
>>     >
>>     > On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
>>     >> This series provides a cleanup of the current atomics
>>     implementation
>>     >> of RISC-V:
>>     >>
>>     >> * PR100265: Use proper fences for atomic load/store
>>     >> * PR100266: Provide programmatic implementation of CAS
>>     >>
>>     >> As both are very related, I merged the patches into one series.
>>     >>
>>     >> The first patch could be squashed into the following patches,
>>     >> but I found it easier to understand the chances with it in place.
>>     >>
>>     >> The series has been tested as follows:
>>     >> * Building and testing a multilib RV32/64 toolchain
>>     >>    (bootstrapped with riscv-gnu-toolchain repo)
>>     >> * Manual review of generated sequences for GCC's atomic
>>     builtins API
>>     >>
>>     >> The programmatic re-implementation of CAS benefits from a REE
>>     improvement
>>     >> (see PR100264):
>>     >> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
>>     >> If this patch is not in place, then an additional extension
>>     instruction
>>     >> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
>>     >>
>>     >> Further, the new CAS code requires cbranch INSN helpers to be
>>     present:
>>     >> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html
>>     >
>>     > I was wondering is this patchset is blocked on some technical
>>     grounds.
>>
>>     There's a v3 (though I can't find all of it, so not quite sure what
>>     happened), but IIUC that still has the same fundamental problems that
>>     all these have had: changing over to the new fence model may by an
>>     ABI
>>     break and the split CAS implementation doesn't ensure eventual
>>     success
>>     (see Jim's comments).  Not sure if there's other comments floating
>>     around, though, that's just what I remember.
>>
>>
>> v3 was sent on May 27, 2022, when I rebased this on an internal tree:
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
>> I dropped the CAS patch in v3 (issue: stack spilling under extreme 
>> register pressure instead of erroring out) as I thought that this was 
>> the blocker for the series.
>> I just learned a few weeks ago, when I asked Palmer at the GNU 
>> Cauldron about this series, that the ABI break is the blocker.
>
> Yeah I was confused about the ABI aspect as I didn't see any mention of 
> that in the public reviews of v1 and v2.

Sorry, I thought we'd talked about it somewhere but it must have just 
been in meetings and such.  Patrick was writing a similar patch set 
around the same time so it probably just got tied up in that, we ended 
up reducing it to just the strong CAS inline stuff because we couldn't 
sort out the correctness of the rest of it.

>> My initial understanding was that fixing something broken cannot be an 
>> ABI break.
>> And that the mismatch of the implementation in 2021 and the 
>> recommended mappings in the ratified specification from 2019 is 
>> something that is broken. I still don't know the background here, but 
>> I guess this assumption is incorrect from a historical point of view.

We agreed that we wouldn't break binaries back when we submitted the 
port.  The ISA has changed many times since then, including adding the 
recommended mappings, but those binaries exist and we can't just 
silently break things for users.

>> However, I'm sure that I am not the only one that assumes the mappings 
>> in the specification to be implemented in compilers and tools. 
>> Therefore I still consider the implementation of the RISC-V atomics in 
>> GCC as broken (at least w.r.t. user expectation from people that lack 
>> the historical background and just read the RISC-V specification).

You can't just read one of those RISC-V PDFs and assume that 
implementations that match those words will function correctly.  Those 
words regularly change in ways where reasonable readers would end up 
with incompatible implementations due to those differences.  That's why 
we're so explicit about versions and such these days, we're just getting 
burned by these old mappings because they're from back when we thought 
the RISC-V definition of compatibility was going to match the more 
common one and we didn't build in fallbacks.

>>     +Andrea, in case he has time to look at the memory model / ABI
>>     issues.
>>
>>     We'd still need to sort out the CAS issues, though, and it's not
>>     abundantly clear it's worth the work: we're essentailly
>>     constrained to
>>     just emitting those fixed CAS sequences due to the eventual success
>>     rules, so it's not clear what the benefit of splitting those up is.
>>     With WRS there are some routines we might want to generate code for
>>     (cond_read_acquire() in Linux, for example) but we'd really need
>>     to dig
>>     into those to see if it's even sane/fast.
>>
>>     There's another patch set to fix the lack of inline atomic routines
>>     without breaking stuff, there were some minor comments from Kito and
>>     IIRC I had some test failures that I needed to chase down as well.
>>     That's a much safer fix in the short term, we'll need to deal with
>>     this
>>     eventually but at least we can stop the libatomic issues for the
>>     distro
>>     folks.
>>
>>
>> I expect that the pressure for a proper fix upstream (instead of a 
>> backward compatible compromise) will increase over time (once people 
>> start building big iron based on RISC-V and start hunting performance 
>> bottlenecks in multithreaded workloads to be competitive).
>> What could be done to get some relief is to enable the new atomics ABI 
>> by a command line switch and promote its use. And at one point in the 
>> future (if there are enough fixes to justify a break) the new ABI can 
>> be enabled by default with a new flag to enable the old ABI.
>
> Indeed we are stuck with inefficiencies with status quo. The new abi 
> option sounds like a reasonable plan going fwd.

I don't think we're just stuck with the status quo, we really just need 
to go through the mappings and figure out which can be made both fast 
and ABI-compatible.  Then we can fix those and see where we stand, maybe 
it's good enough or maybe we need to introduce some sort of 
compatibility break to make things faster (and/or compatible with LLVM, 
where I suspect we're broken right now).

If we do need a break then I think it's probably possible to do it in 
phases, where we have a middle-ground compatibility mode that works for 
both the old and new mappings so distros can gradually move over as they 
rebuild packages.

Issues like the libstdc++ shared_ptr/mutex fallback don't map well to 
that, though.  There's also some stuff like the IO fence bits that we 
can probably just add an argument for now, those were likely just a bad 
idea at the time and should be safe to turn off for the vast majority of 
users (though those are more of an API break).

> Also my understand is that while the considerations are ABI centric, the 
> option to faciliate this need not be tied to canonical -mabi=lp32, lp64d 
> etc. It might just be a toggle as -matomic=legacy,2019 etc (this is not 
> suggestive just indicative). Otherwise there's another level of blowup 
> in multilib testing etc.

The psABI doesn't mention memory ordering at all.  IIUC that's a pretty 
standard hole in psABI documents, but it means we're in a grey area 
here.

+Jeff, who was offering to help when the threads got crossed.  I'd 
punted on a lot of this in the hope Andrea could help out, as I'm not 
really a memory model guy and this is pretty far down the rabbit hole.  
Happy to have the help if you're offering, though, as what's there is 
likely a pretty big performance issue for anyone with a reasonable 
memory system.

> -Vineet
Christoph Müllner Oct. 12, 2022, 8:03 a.m. UTC | #7
On Wed, Oct 12, 2022 at 2:15 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:

> On Tue, 11 Oct 2022 16:31:25 PDT (-0700), Vineet Gupta wrote:
> >
> >
> > On 10/11/22 13:46, Christoph Müllner wrote:
> >> On Tue, Oct 11, 2022 at 9:31 PM Palmer Dabbelt <palmer@dabbelt.com>
> wrote:
> >>
> >>     On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
> >>     > Hi Christoph, Kito,
> >>     >
> >>     > On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
> >>     >> This series provides a cleanup of the current atomics
> >>     implementation
> >>     >> of RISC-V:
> >>     >>
> >>     >> * PR100265: Use proper fences for atomic load/store
> >>     >> * PR100266: Provide programmatic implementation of CAS
> >>     >>
> >>     >> As both are very related, I merged the patches into one series.
> >>     >>
> >>     >> The first patch could be squashed into the following patches,
> >>     >> but I found it easier to understand the chances with it in place.
> >>     >>
> >>     >> The series has been tested as follows:
> >>     >> * Building and testing a multilib RV32/64 toolchain
> >>     >>    (bootstrapped with riscv-gnu-toolchain repo)
> >>     >> * Manual review of generated sequences for GCC's atomic
> >>     builtins API
> >>     >>
> >>     >> The programmatic re-implementation of CAS benefits from a REE
> >>     improvement
> >>     >> (see PR100264):
> >>     >> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
> >>     >> If this patch is not in place, then an additional extension
> >>     instruction
> >>     >> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
> >>     >>
> >>     >> Further, the new CAS code requires cbranch INSN helpers to be
> >>     present:
> >>     >> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html
> >>     >
> >>     > I was wondering is this patchset is blocked on some technical
> >>     grounds.
> >>
> >>     There's a v3 (though I can't find all of it, so not quite sure what
> >>     happened), but IIUC that still has the same fundamental problems
> that
> >>     all these have had: changing over to the new fence model may by an
> >>     ABI
> >>     break and the split CAS implementation doesn't ensure eventual
> >>     success
> >>     (see Jim's comments).  Not sure if there's other comments floating
> >>     around, though, that's just what I remember.
> >>
> >>
> >> v3 was sent on May 27, 2022, when I rebased this on an internal tree:
> >> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
> >> I dropped the CAS patch in v3 (issue: stack spilling under extreme
> >> register pressure instead of erroring out) as I thought that this was
> >> the blocker for the series.
> >> I just learned a few weeks ago, when I asked Palmer at the GNU
> >> Cauldron about this series, that the ABI break is the blocker.
> >
> > Yeah I was confused about the ABI aspect as I didn't see any mention of
> > that in the public reviews of v1 and v2.
>
> Sorry, I thought we'd talked about it somewhere but it must have just
> been in meetings and such.  Patrick was writing a similar patch set
> around the same time so it probably just got tied up in that, we ended
> up reducing it to just the strong CAS inline stuff because we couldn't
> sort out the correctness of the rest of it.
>
> >> My initial understanding was that fixing something broken cannot be an
> >> ABI break.
> >> And that the mismatch of the implementation in 2021 and the
> >> recommended mappings in the ratified specification from 2019 is
> >> something that is broken. I still don't know the background here, but
> >> I guess this assumption is incorrect from a historical point of view.
>
> We agreed that we wouldn't break binaries back when we submitted the
> port.  The ISA has changed many times since then, including adding the
> recommended mappings, but those binaries exist and we can't just
> silently break things for users.
>
> >> However, I'm sure that I am not the only one that assumes the mappings
> >> in the specification to be implemented in compilers and tools.
> >> Therefore I still consider the implementation of the RISC-V atomics in
> >> GCC as broken (at least w.r.t. user expectation from people that lack
> >> the historical background and just read the RISC-V specification).
>
> You can't just read one of those RISC-V PDFs and assume that
> implementations that match those words will function correctly.  Those
> words regularly change in ways where reasonable readers would end up
> with incompatible implementations due to those differences.  That's why
> we're so explicit about versions and such these days, we're just getting
> burned by these old mappings because they're from back when we though
> the RISC-V definition of compatibility was going to match the more
> common one and we didn't build in fallbacks.
>

Indeed, read-and-assume might not be the best idea.
But read-and-ignore (in the hope everyone else does as well) won't help
either.

I think it is reasonable to expect the detailed mapping table from the
specification to be implemented.
Why would it have been written, and why would one ignore it?

But let's look at the current situation and the proposed solutions so far
(below).


>
> >>     +Andrea, in case he has time to look at the memory model / ABI
> >>     issues.
> >>
> >>     We'd still need to sort out the CAS issues, though, and it's not
> >>     abundantly clear it's worth the work: we're essentailly
> >>     constrained to
> >>     just emitting those fixed CAS sequences due to the eventual success
> >>     rules, so it's not clear what the benefit of splitting those up is.
> >>     With WRS there are some routines we might want to generate code for
> >>     (cond_read_acquire() in Linux, for example) but we'd really need
> >>     to dig
> >>     into those to see if it's even sane/fast.
> >>
> >>     There's another patch set to fix the lack of inline atomic routines
> >>     without breaking stuff, there were some minor comments from Kito and
> >>     IIRC I had some test failures that I needed to chase down as well.
> >>     That's a much safer fix in the short term, we'll need to deal with
> >>     this
> >>     eventually but at least we can stop the libatomic issues for the
> >>     distro
> >>     folks.
> >>
> >>
> >> I expect that the pressure for a proper fix upstream (instead of a
> >> backward compatible compromise) will increase over time (once people
> >> start building big iron based on RISC-V and start hunting performance
> >> bottlenecks in multithreaded workloads to be competitive).
> >> What could be done to get some relief is to enable the new atomics ABI
> >> by a command line switch and promote its use. And at one point in the
> >> future (if there are enough fixes to justify a break) the new ABI can
> >> be enabled by default with a new flag to enable the old ABI.
> >
> > Indeed we are stuck with inefficiencies with status quo. The new abi
> > option sounds like a reasonable plan going fwd.
>
> I don't think we're just stuck with the status quo, we really just need
> to go through the mappings and figure out which can be made both fast
> and ABI-compatible.  Then we can fix those and see where we stand, maybe
> it's good enough or maybe we need to introduce some sort of
> compatibility break to make things faster (and/or compatible with LLVM,
> where I suspect we're broken right now).
>

So we have the following atomics ABIs:
 I) GCC implementation
 II) LLVM implementation
 III) Specified ABI in the "Code Porting and Mapping Guidelines" appendix
of the RISC-V specification

And there are two proposed solutions:
 a) Finding a new ABI that is compatible with I) and II) is of course a
solution, but we don't know if and when such a solution exists.
 b) Introducing III) causes a break and therefore needs special care
(e.g. letting the user decide via a command line flag or providing a
compatibility mode).

I don't see that a) and b) contradict each other.
Why not go for both:
 -) Continue to work on a backward compatible solution
 -) Enable the "new" ABI from the specification appendix via command
line flag
 -) Reevaluate the situation in 12 months to decide the next steps



>
> If we do need a break then I think it's probably possible to do it in
> phases, where we have a middle-ground compatibility mode that works for
> both the old and new mappings so distros can gradually move over as they
> rebuild packages.
>
> Issues like the libstdc++ shared_ptr/mutex fallback don't map well to
> that, though.  There's also some stuff like the IO fence bits that we
> can probably just add an argument for now, those were likely just a bad
> idea at the time and should be safe to turn off for the vast majority of
> users (though those are more of an API break).
>
> > Also my understand is that while the considerations are ABI centric, the
> > option to faciliate this need not be tied to canonical -mabi=lp32, lp64d
> > etc. It might just be a toggle as -matomic=legacy,2019 etc (this is not
> > suggestive just indicative). Otherwise there's another level of blowup
> > in multilib testing etc.
>
> The psABI doesn't mention memory ordering at all.  IIUC that's a pretty
> standard hole in psABI documents, but it means we're in a grey area
> here.
>
> +Jeff, who was offering to help when the threads got crossed.  I'd
> punted on a lot of this in the hope Andrea could help out, as I'm not
> really a memory model guy and this is pretty far down the rabbit hole.
> Happy to have the help if you're offering, though, as what's there is
> likely a pretty big performance issue for anyone with a reasonable
> memory system.
>
> > -Vineet
>
Andrea Parri Oct. 12, 2022, 5:16 p.m. UTC | #8
> > >     +Andrea, in case he has time to look at the memory model / ABI
> > >     issues.

> +Jeff, who was offering to help when the threads got crossed.  I'd punted on
> a lot of this in the hope Andrea could help out, as I'm not really a memory
> model guy and this is pretty far down the rabbit hole.  Happy to have the
> help if you're offering, though, as what's there is likely a pretty big
> performance issue for anyone with a reasonable memory system.

Thanks for linking me to the discussion and the remarks, Palmer.  I'm
happy to help (and to synchronize with Jeff/the community) as much as
possible, building a better understanding of the 'issues' at stake.

  Andrea
Jeff Law Oct. 13, 2022, 10:39 p.m. UTC | #9
On 10/11/22 17:31, Vineet Gupta wrote:
>
>>
>> I expect that the pressure for a proper fix upstream (instead of a 
>> backward compatible compromise) will increase over time (once people 
>> start building big iron based on RISC-V and start hunting performance 
>> bottlenecks in multithreaded workloads to be competitive).
>> What could be done to get some relief is to enable the new atomics 
>> ABI by a command line switch and promote its use. And at one point in 
>> the future (if there are enough fixes to justify a break) the new ABI 
>> can be enabled by default with a new flag to enable the old ABI.
>
> Indeed we are stuck with inefficiencies with status quo. The new abi 
> option sounds like a reasonable plan going fwd.
>
> Also my understand is that while the considerations are ABI centric, 
> the option to faciliate this need not be tied to canonical -mabi=lp32, 
> lp64d etc. It might just be a toggle as -matomic=legacy,2019 etc (this 
> is not suggestive just indicative). Otherwise there's another level of 
> blowup in multilib testing etc.

If I understand the history here, we're essentially catering to code 
that is potentially relying on behavior that was never really 
guaranteed.   That's not really ABI -- it's more depending on specifics 
of an implementation or undefined/underdefined behavior.    Holding back 
progress for that case seems short-sighted, particularly given how early 
I hope we are in the RISC-V journey.


But I'm also sympathetic to the desire not to break existing code.  
Could we keep the old behavior under a flag and fix the default behavior 
here, presumably with a bit in the ELF header indicating code that wants 
the old behavior?


Jeff
Jeff Law Oct. 13, 2022, 11:04 p.m. UTC | #10
On 10/11/22 18:15, Palmer Dabbelt wrote:
>
> Sorry, I thought we'd talked about it somewhere but it must have just 
> been in meetings and such.  Patrick was writing a similar patch set 
> around the same time so it probably just got tied up in that, we ended 
> up reducing it to just the strong CAS inline stuff because we couldn't 
> sort out the correctness of the rest of it.

Now that you mention it, I vaguely recall a discussion about inline 
atomics vs libatomic and the discussion on this issue might have been 
there rather than in Christoph's patchset.


>
>>> My initial understanding was that fixing something broken cannot be 
>>> an ABI break.
>>> And that the mismatch of the implementation in 2021 and the 
>>> recommended mappings in the ratified specification from 2019 is 
>>> something that is broken. I still don't know the background here, 
>>> but I guess this assumption is incorrect from a historical point of 
>>> view.
>
> We agreed that we wouldn't break binaries back when we submitted the 
> port.  The ISA has changed many times since then, including adding the 
> recommended mappings, but those binaries exist and we can't just 
> silently break things for users.

That may be too strong of a policy -- just because something worked in 
the past doesn't mean it must always work.  It really depends on the 
contracts specified by the ABI, processor reference documentation, etc 
and whether or not the code relies on something that is outside those 
contracts.  If it does rely on behavior outside the contracts, then we 
shouldn't be constrained by such an agreement.

Another way to think about this problem is do we want more code making 
incorrect assumptions about the behavior of atomics getting out in the 
wild?  My take would be that we nip this in the bud, get it right now in 
the default configuration, but leave enough bits in place that existing 
code continues to work.

>
>>> However, I'm sure that I am not the only one that assumes the 
>>> mappings in the specification to be implemented in compilers and 
>>> tools. Therefore I still consider the implementation of the RISC-V 
>>> atomics in GCC as broken (at least w.r.t. user expectation from 
>>> people that lack the historical background and just read the RISC-V 
>>> specification).
>
> You can't just read one of those RISC-V PDFs and assume that 
> implementations that match those words will function correctly. Those 
> words regularly change in ways where reasonable readers would end up 
> with incompatible implementations due to those differences.  That's 
> why we're so explicit about versions and such these days, we're just 
> getting burned by these old mappings because they're from back when we 
> though the RISC-V definition of compatibility was going to match the 
> more common one and we didn't build in fallbacks.

Fair point, but in my mind that argues that the platform must mature 
further so that the contracts can be relied upon.  That obviously needs 
to get fixed and until it does any agreements or guarantees about 
behavior of existing code can't be reasonably made.  If we're going to 
be taken seriously, then those fundamentals have to be rock solid.


>
> I don't think we're just stuck with the status quo, we really just 
> need to go through the mappings and figure out which can be made both 
> fast and ABI-compatible.  Then we can fix those and see where we 
> stand, maybe it's good enough or maybe we need to introduce some sort 
> of compatibility break to make things faster (and/or compatible with 
> LLVM, where I suspect we're broken right now).

Certainly seems like a good first step.  What we can fix without 
breaking things we do while we sort out the tougher problems.


>
> If we do need a break then I think it's probably possible to do it in 
> phases, where we have a middle-ground compatibility mode that works 
> for both the old and new mappings so distros can gradually move over 
> as they rebuild packages.

As someone that lived in the distro space for a long time, I would argue 
that now is the time to fix this stuff -- before there is a large uptake 
in distro consumption.


>
> +Jeff, who was offering to help when the threads got crossed.  I'd 
> punted on a lot of this in the hope Andrea could help out, as I'm not 
> really a memory model guy and this is pretty far down the rabbit 
> hole.  Happy to have the help if you're offering, though, as what's 
> there is likely a pretty big performance issue for anyone with a 
> reasonable memory system.

Hmm, there's a case I'm pondering whether I can discuss or not.  Probably not 
since I can't recall it ever being discussed in public.  So I'll just 
say this space can be critically important for performance and the 
longer we wait, the tougher it gets to fix without causing significant 
disruption.

Jeff
Jeff Law Oct. 13, 2022, 11:11 p.m. UTC | #11
On 10/12/22 02:03, Christoph Müllner wrote:
>
>
> So we have the following atomics ABIs:
>  I) GCC implementation
>  II) LLVM implementation
>  III) Specified ABI in the "Code Porting and Mapping Guidelines" 
> appendix of the RISC-V specification

And presumably we don't have any way to distinguish between I and II at 
the DSO or object level.  That implies that if we're going to get to 
III, then we have to mark new code.  We obviously can't mark 
pre-existing bits (and I may have implied we should do that in my 
earlier message, my goof).



>
> And there are two proposed solutions:
>  a) Finding a new ABI that is compatible with I) and II) is of course 
> a solution, but we don't know if and when such a solution exists.
>  b) Going to introduce III) causes a break and therefore needs special 
> care (e.g. let the user decide via command line flag or provide a 
> compatibility mode).
>
> I don't see that a) and b) contradict each other.
> Why not going for both:
>  -) Continue to work on a backward compatible solution
>  -) Enable the "new" ABI from the specification appendix via command 
> line flag
>  -) Reevaluate the situation in 12 months to decide the next steps
I would lean towards making the new, more correct, behavior the default 
and having the old behavior enabled by a command line flag. But 
otherwise what you're suggesting seems reasonable.


Jeff
Palmer Dabbelt Oct. 13, 2022, 11:14 p.m. UTC | #12
On Thu, 13 Oct 2022 15:39:39 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
>
> On 10/11/22 17:31, Vineet Gupta wrote:
>>
>>>
>>> I expect that the pressure for a proper fix upstream (instead of a
>>> backward compatible compromise) will increase over time (once people
>>> start building big iron based on RISC-V and start hunting performance
>>> bottlenecks in multithreaded workloads to be competitive).
>>> What could be done to get some relief is to enable the new atomics
>>> ABI by a command line switch and promote its use. And at one point in
>>> the future (if there are enough fixes to justify a break) the new ABI
>>> can be enabled by default with a new flag to enable the old ABI.
>>
>> Indeed we are stuck with inefficiencies with status quo. The new abi
>> option sounds like a reasonable plan going fwd.
>>
>> Also my understand is that while the considerations are ABI centric,
>> the option to faciliate this need not be tied to canonical -mabi=lp32,
>> lp64d etc. It might just be a toggle as -matomic=legacy,2019 etc (this
>> is not suggestive just indicative). Otherwise there's another level of
>> blowup in multilib testing etc.
>
> If I understand the history here, we're essentially catering to code
> that is potentially relying on behavior that was never really
> guaranteed.   That's not really ABI -- it's more depending on specifics
> of an implementation or undefined/underdefined behavior.    Holding back
> progress for that case seems short-sighted, particularly given how early
> I hope we are in the RISC-V journey.
>
>
> But I'm also sympathetic to the desire not to break existing code. 

I suppose ABI means a ton of things, but in this case that's really what 
I was trying to get at: just changing to the mappings suggested in the 
ISA manual risks producing binaries that don't work when mixed with old 
binaries.  My rationale for calling that ABI was that there's a defacto 
memory model ABI defined as "anything that works with the old binaries", 
but ABI means so many things maybe we just shouldn't say it at all here?

I'm going to call it "old binary compatibility" here, rather than "ABI 
compatibility", just so it's a different term.

> Could we keep the old behavior under a flag and fix the default behavior
> here, presumably with a bit in the ELF header indicating code that wants
> the old behavior?

The thread got forked a bit, but that's essentially what I was trying to 
suggest.  I talked with Andrea a bit and here's how I'd describe it:

We have a mapping from the C{,++}11 memory model to RVWMO that's 
currently emitted by GCC, and there are years of binaries with that 
mapping.  As far as we know that mapping is correct, but I don't think 
anyone's gone through and formally analyzed it.  Let's call those the 
"GCC mapping".

There's also a mapping listed in the ISA manual.  That's not the same as 
the GCC mapping, so let's call it the "ISA mapping".  We need to 
double-check the specifics, but IIUC this ISA mapping is broadly 
followed by LLVM.  It's also very likely to be correct, as it's been 
analyzed by lots of formal memory model people as part of the RVWMO 
standardization process.

As far as I know, everyone likes the ISA mapping better than the GCC 
mapping.  It's hard to describe why concretely because there's no 
hardware that implements RVWMO sufficiently aggressively that we can 
talk performance, but at least matching what's in the ISA manual is the 
way to go.  Also, just kind of a personal opinion, the GCC mapping does 
some weird ugly stuff.
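
Just to make the two mappings concrete with a trivial hand-written
example (illustrative, not a claim about exactly what either compiler
emits today): the ISA mapping expands an acquire load as a plain load
followed by a trailing fence, and checking how that interacts with
whatever the GCC mapping produces for the same builtin is part of the
analysis below:

  int
  load_acquire (int *p)
  {
    /* ISA-manual mapping (roughly):  lw    a0, 0(p)
                                      fence r, rw
       The GCC mapping may emit something different here, and whether
       the two can safely be mixed across binaries is the open
       question.  */
    return __atomic_load_n (p, __ATOMIC_ACQUIRE);
  }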

So the question is really: how do we get rid of the GCC mapping while 
causing as little headache as possible?

My proposal is as follows:

* Let's properly analyze the GCC mapping, just on its own.  Maybe it's 
  broken, if it is then I think we've got a pretty decent argument to 
  just ignore old binary compatibility -- if it was always broken then 
  we can't break it, so who cares.
* Assuming the GCC mapping is internally consistent, let's analyze 
  arbitrary mixes of the GCC and ISA mappings.  It's not generally true 
  that two correct mappings can be freely mixed, but maybe we've got 
  some cases that work fine.  Maybe we even get lucky and everything's 
  compatible, who knows (though I'm worried about the same .aq vs fence 
  stuff we've had in LKMM a few times).
* Assuming there's any issue mixing the GCC and ISA mappings, let's add 
  a flag to GCC.  Something like -mrvwmo-compat={legacy,both,isa}, where:
    - =legacy does what we have now.  We can eventually deprecate this, 
      and assuming =both is fast enough maybe we don't even need it.
    - =both implements a mapping that's compatible with both the GCC and 
      ISA mappings.  This might be slow or something, but it'll be 
      correct with both.  We can set this to the default now, as it's 
      safe.
    - =isa implements the ISA mappings.

Then we can go stick some marker in the ELF saying "this binary is 
compatible with the ISA mappings" to triage binaries in systems.  That 
way distros can decide when they want to move to -mrvwmo-compat=isa by 
default, maybe when they naturally do a world rebuild.  Eventually we 
can make that the default in GCC upstream and later deprecate the 
compatibility modes.

That's going to take a bit of work on the formal memory model side of 
things, but I think we've at least got enough of Andrea's bandwidth to 
make it happen in a reasonable timeframe.  I don't think there's a ton 
of implementation work to do once we sort out the desired mappings, and 
a chunk of that can probably start in parallel as we'll likely need 
stuff like the ELF tagging eventually.

Landing this before stage4 might be tricky, though.  I tell myself we're 
not going to take late backend features every release... ;)
Vineet Gupta Oct. 14, 2022, 12:14 a.m. UTC | #13
On 10/13/22 15:39, Jeff Law via Gcc-patches wrote:
> 
> On 10/11/22 17:31, Vineet Gupta wrote:
>>
>>>
>>> I expect that the pressure for a proper fix upstream (instead of a 
>>> backward compatible compromise) will increase over time (once people 
>>> start building big iron based on RISC-V and start hunting performance 
>>> bottlenecks in multithreaded workloads to be competitive).
>>> What could be done to get some relief is to enable the new atomics 
>>> ABI by a command line switch and promote its use. And at one point in 
>>> the future (if there are enough fixes to justify a break) the new ABI 
>>> can be enabled by default with a new flag to enable the old ABI.
>>
>> Indeed we are stuck with inefficiencies with status quo. The new abi 
>> option sounds like a reasonable plan going fwd.
>>
>> Also my understand is that while the considerations are ABI centric, 
>> the option to faciliate this need not be tied to canonical -mabi=lp32, 
>> lp64d etc. It might just be a toggle as -matomic=legacy,2019 etc (this 
>> is not suggestive just indicative). Otherwise there's another level of 
>> blowup in multilib testing etc.
> 
> If I understand the history here, we're essentially catering to code 
> that is potentially relying on behavior that was never really 
> guaranteed.   That's not really ABI -- it's more depending on specifics 
> of an implementation or undefined/underdefined behavior.    Holding back 
> progress for that case seems short-sighted, particularly given how early 
> I hope we are in the RISC-V journey.

Exactly. I keep hearing about the potential ABI breakage but no real 
discussion (publicly at least) which describes in detail what exactly 
that ABI / old-behavior breakage is with this patch series [1]. So 
perhaps we can start with reviewing the patches, calling out exactly 
what change causes the divergence and if that is acceptable or not. And 
while at it, perhaps also make updates to [2]


[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100265
Christoph Müllner Oct. 14, 2022, 11:03 a.m. UTC | #14
On Fri, Oct 14, 2022 at 1:15 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:

> On Thu, 13 Oct 2022 15:39:39 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> >
> > On 10/11/22 17:31, Vineet Gupta wrote:
> >>
> >>>
> >>> I expect that the pressure for a proper fix upstream (instead of a
> >>> backward compatible compromise) will increase over time (once people
> >>> start building big iron based on RISC-V and start hunting performance
> >>> bottlenecks in multithreaded workloads to be competitive).
> >>> What could be done to get some relief is to enable the new atomics
> >>> ABI by a command line switch and promote its use. And at one point in
> >>> the future (if there are enough fixes to justify a break) the new ABI
> >>> can be enabled by default with a new flag to enable the old ABI.
> >>
> >> Indeed we are stuck with inefficiencies with status quo. The new abi
> >> option sounds like a reasonable plan going fwd.
> >>
> >> Also my understanding is that while the considerations are ABI centric,
> >> the option to facilitate this need not be tied to canonical -mabi=lp32,
> >> lp64d etc. It might just be a toggle as -matomic=legacy,2019 etc (this
> >> is not suggestive just indicative). Otherwise there's another level of
> >> blowup in multilib testing etc.
> >
> > If I understand the history here, we're essentially catering to code
> > that is potentially relying on behavior that was never really
> > guaranteed.   That's not really ABI -- it's more depending on specifics
> > of an implementation or undefined/underdefined behavior.    Holding back
> > progress for that case seems short-sighted, particularly given how early
> > I hope we are in the RISC-V journey.
> >
> >
> > But I'm also sympathetic to the desire not to break existing code.
>
> I suppose ABI means a ton of things, but in this case that's really what
> I was trying to get at: just changing to the mappings suggested in the
> ISA manual risks producing binaries that don't work when mixed with old
> binaries.  My rationale for calling that ABI was that there's a de facto
> memory model ABI defined as "anything that works with the old binaries",
> but ABI means so many things maybe we just shouldn't say it at all here?
>
> I'm going to call it "old binary compatibility" here, rather than "ABI
> compatibility", just so it's a different term.
>
> > Could we keep the old behavior under a flag and fix the default behavior
> > here, presumably with a bit in the ELF header indicating code that wants
> > the old behavior?
>
> The thread got forked a bit, but that's essentially what I was trying to
> suggest.  I talked with Andrea a bit and here's how I'd describe it:
>
> We have a mapping from the C{,++}11 memory model to RVWMO that's
> currently emitted by GCC, and there are years of binaries with that
> mapping.  As far as we know that mapping is correct, but I don't think
> anyone's gone through and formally analyzed it.  Let's call those the
> "GCC mapping".
>
> There's also a mapping listed in the ISA manual.  That's not the same as
> the GCC mapping, so let's call it the "ISA mapping".  We need to
> double-check the specifics, but IIUC this ISA mapping is broadly
> followed by LLVM.  It's also very likely to be correct, as it's been
> analyzed by lots of formal memory model people as part of the RVWMO
> standardization process.
>
> As far as I know, everyone likes the ISA mapping better than the GCC
> mapping.  It's hard to describe why concretely because there's no
> hardware that implements RVWMO sufficiently aggressively that we can
> talk performance, but at least matching what's in the ISA manual is the
> way to go.  Also, just kind of a personal opinion, the GCC mapping does
> some weird ugly stuff.
>

My guess is people like the ISA mapping (more) because it has been
documented and reviewed.
And it is the product of a working group that worked out the
RVWMO specification.
This gives some confidence that we don't need to rework it massively
because of correctness issues in the future.

The GCC mapping has (as far as I understand) never been documented or
reviewed (especially not by the RISC-V arch review group).
It might be faster and more robust, but it also might be incomplete and
slower.
Finding this out is a research topic with an unpredictable outcome.
And more or less nobody wants to take this risk or has the time and the
required skills to work on the research.


>
> So the question is really: how do we get rid of the GCC mapping while
> causing as little headache as possible?
>
> My proposal is as follows:
>
> * Let's properly analyze the GCC mapping, just on its own.  Maybe it's
>   broken, if it is then I think we've got a pretty decent argument to
>   just ignore old binary compatibility -- if it was always broken then
>   we can't break it, so who cares.
> * Assuming the GCC mapping is internally consistent, let's analyze
>   arbitrary mixes of the GCC and ISA mappings.  It's not generally true
>   that two correct mappings can be freely mixed, but maybe we've got
>   some cases that work fine.  Maybe we even get lucky and everything's
>   compatible, who knows (though I'm worried about the same .aq vs fence
>   stuff we've had in LKMM a few times).
> * Assuming there's any issue mixing the GCC and ISA mappings, let's add
>   a flag to GCC.  Something like -mrvwmo-compat={legacy,both,isa}, where:
>     - =legacy does what we have now.  We can eventually deprecate this,
>       and assuming =both is fast enough maybe we don't even need it.
>     - =both implements a mapping that's compatible with both the GCC and
>       ISA mappings.  This might be slow or something, but it'll be
>       correct with both.  We can set this to the default now, as it's
>       safe.
>     - =isa implements the ISA mappings.
>

I think a patch for legacy/isa can be done in time for the GCC release
window.
I can rework my series accordingly.
However, I see some schedule risk for the "both" option, so it would
have to wait until such a compatible mapping exists.



>
> Then we can go stick some marker in the ELF saying "this binary is
> compatible with the ISA mappings" to triage binaries in systems.  That
> way distros can decide when they want to move to -mrvwmo-compat=isa by
> default, maybe when they naturally do a world rebuild.  Eventually we
> can make that the default in GCC upstream and later deprecate the
> compatibility modes.
>
> That's going to take a bit of work on the formal memory model side of
> things, but I think we've at least got enough of Andrea's bandwidth to
> make it happen in a reasonable timeframe.  I don't think there's a ton
> of implementation work to do once we sort out the desired mappings, and
> a chunk of that can probably start in parallel as we'll likely need
> stuff like the ELF tagging eventually.
>
> Landing this before stage4 might be tricky, though.  I tell myself we're
> not going to take late backend features every release... ;)
>
Jeff Law Oct. 14, 2022, 8:39 p.m. UTC | #15
On 10/14/22 05:03, Christoph Müllner wrote:
>
> My guess is people like the ISA mapping (more) because it has been 
> documented and reviewed.
> And it is the product of a working group that worked out the 
> RVWMO specification.
> This gives some confidence that we don't need to rework it massively 
> because of correctness issues in the future.

This stuff can be hard, and if someone with deep experience in memory 
models has reviewed the ISA mapping, then I'd prefer it over the GCC 
mapping.  The experts in the memory model space are simply more likely 
to get it right than a bunch of compiler junkies, no matter how smart 
we think we are :-)

Maybe I'm being too optimistic, but it's not hard for me to see a path 
where GCC and LLVM both implement the ISA mapping by default.  Anything 
else is just a path of long term pain.


Jeff
Palmer Dabbelt Oct. 14, 2022, 9:57 p.m. UTC | #16
On Fri, 14 Oct 2022 13:39:33 PDT (-0700), jeffreyalaw@gmail.com wrote:
>
> On 10/14/22 05:03, Christoph Müllner wrote:
>>
>> My guess is people like the ISA mapping (more) because it has been 
>> documented and reviewed.
>> And it is the product of a working group that worked out the 
>> RVWMO specification.
>> This gives some confidence that we don't need to rework it massively 
>> because of correctness issues in the future.
>
> This stuff can be hard, and if someone with deep experience in memory 
> models has reviewed the ISA mapping, then I'd prefer it over the GCC 
> mapping.  The experts in the memory model space are simply more likely 
> to get it right than a bunch of compiler junkies, no matter how smart 
> we think we are :-)

That hasn't really proven to be the case for the ISA-suggested Linux mappings.  
We've been through a bunch of rounds of review upstream and that's 
resulted in some differences.  Some of that is likely related to the ISA 
mappings for Linux being incomplete (they punt on the tricky bits like 
interactions with spinlocks, which filter all over the place), and Linux 
doesn't have the same old binary compatibility issues (aka the in-kernel 
ABI is not stable between releases) so mapping churn isn't nearly as 
scary over there.

Still not much of an argument in favor of the GCC mappings, though.  I'm 
pretty sure nobody with a formal memory model background has looked at 
those, which is pretty much the worst spot to be in.  That said, I don't 
think we can just say the ISA mappings are the way to go; they seem to 
suffer from some similar incompleteness issues (for example, there are no 
explicit mappings for std::atomic<T>::compare_exchange_{weak,strong}).  
So we'll still need to put in the work to make sure whatever mappings 
get implemented are correct.

[
As an aside, I think LLVM is doing the wrong thing for some of the more 
complex forms of compare_exchange_weak.  For example

    #include <atomic>
    
    bool f(std::atomic<long>& p, long& o, long n) {
        return p.compare_exchange_weak(o, n, std::memory_order_acq_rel, std::memory_order_release);
    }
    
   
    $ clang-15.0.0 -std=c++17 -O3
    f(std::atomic<long>&, long&, long):                   # @f(std::atomic<long>&, long&, long)
            ld      a4, 0(a1)
    .LBB0_3:                                # =>This Inner Loop Header: Depth=1
            lr.d.aq a3, (a0)
            bne     a3, a4, .LBB0_5
            sc.d.rl a5, a2, (a0)
            bnez    a5, .LBB0_3
    .LBB0_5:
            xor     a0, a3, a4
            seqz    a0, a0
            beq     a3, a4, .LBB0_2
            sd      a3, 0(a1)
    .LBB0_2:
            ret

doesn't look to me like it provides release ordering on failure, but I'm 
not really a memory model person so maybe I'm missing something here.  
The GCC mapping is pretty ugly, but I think we do happen to have correct 
behavior in this case:

    # gcc-12.2.0 -std=c++17 -O3
    f(std::atomic<long>&, long&, long):
            ld      a5,0(a1)
            fence iorw,ow;  1: lr.d.aq a4,0(a0); bne a4,a5,1f; sc.d.aq a3,a2,0(a0); bnez a3,1b; 1:
            sub     a5,a4,a5
            seqz    a0,a5
            beq     a5,zero,.L2
            sd      a4,0(a1)
    .L2:
            andi    a0,a0,1
            ret
]

> Maybe I'm being too optimistic, but it's not hard for me to see a path 
> where GCC and LLVM both implement the ISA mapping by default.  Anything 
> else is just a path of long term pain.

I'd bet that most people want that, but in practice building any real 
systems in RISC-V land requires some degree of implementation-defined 
behavior as the specifications don't cover everything (even ignoring the 
whole PDF vs specification word games).  That's not to say we should 
just ignore what's written down, just that there's more work to do even 
if we ignore compatibility with old binaries.

I think the question here is whether it's worth putting in the extra 
work to provide a path for systems with old binaries to gradually 
upgrade to the ISA mappings, or if we just toss out those old binaries.  
I think we really need to see how much of a headache that compatibility 
mode is going to be, and the only way to do that is to put in the time to 
analyze the GCC mappings.

That said, I don't really personally care that much about old binaries.  
Really my only argument here is that we broke binary compatibility once 
(right before we upstreamed the port), that made a handful of people 
mad, and I told them we'd never do it again.  I think we were all 
operating under the assumption that RISC-V would move an order of 
magnitude faster than it has, though, so maybe we're in a spot where 
nobody actually cares about extant binaries and we can just throw them 
all out.

If we're going to do that, though, there's a bunch of other cruft that 
we should probably toss along with the GCC mappings...

>
>
> Jeff
Palmer Dabbelt Oct. 15, 2022, 12:31 a.m. UTC | #17
On Fri, 14 Oct 2022 14:57:22 PDT (-0700), Palmer Dabbelt wrote:
> On Fri, 14 Oct 2022 13:39:33 PDT (-0700), jeffreyalaw@gmail.com wrote:
>>
>> On 10/14/22 05:03, Christoph Müllner wrote:
>>>
>>> My guess is people like the ISA mapping (more) because it has been 
>>> documented and reviewed.
>>> And it is the product of a working group that worked out the 
>>> RVWMO specification.
>>> This gives some confidence that we don't need to rework it massively 
>>> because of correctness issues in the future.
>>
>> This stuff can be hard, and if someone with deep experience in memory 
>> models has reviewed the ISA mapping, then I'd prefer it over the GCC 
>> mapping.  The experts in the memory model space are simply more likely 
>> to get it right than a bunch of compiler junkies, no matter how smart 
>> we think we are :-)
>
> That hasn't really proven to be the case for the ISA-suggested Linux mappings.  
> We've been through a bunch of rounds of review upstream and that's 
> resulted in some differences.  Some of that is likely related to the ISA 
> mappings for Linux being incomplete (they punt on the tricky bits like 
> interactions with spinlocks, which filter all over the place), and Linux 
> doesn't have the same old binary compatibility issues (aka the in-kernel 
> ABI is not stable between releases) so mapping churn isn't nearly as 
> scary over there.
>
> Still not much of an argument in favor of the GCC mappings, though.  I'm 
> pretty sure nobody with a formal memory model background has looked at 
> those, which is pretty much the worst spot to be in.  That said, I don't 
> think we can just say the ISA mappings are the way to go; they seem to 
> suffer from some similar incompleteness issues (for example, there are no 
> explicit mappings for std::atomic<T>::compare_exchange_{weak,strong}).  
> So we'll still need to put in the work to make sure whatever mappings 
> get implemented are correct.
>
> [
> As an aside, I think LLVM is doing the wrong thing for some of the more 
> complex forms of compare_exchange_weak.  For example
>
>     #include <atomic>
>     
>     bool f(std::atomic<long>& p, long& o, long n) {
>         return p.compare_exchange_weak(o, n, std::memory_order_acq_rel, std::memory_order_release);

Eric points out this is bogus code: the spec forbids release as the fail 
ordering (which makes sense).  Just randomly permuting these ordering 
arguments ends up with some generated code that seems off; for example, 
release/acq_rel produces lr/sc.rl (and no fences).
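
For reference, a variant of the quoted function with a failure ordering 
the standard does allow might look like the following (just a sketch; 
the f_fixed name is mine, nothing from the series):

    #include <atomic>

    // memory_order_acquire replaces the forbidden memory_order_release
    // on the failure path; acq_rel stays on the success path.
    bool f_fixed(std::atomic<long>& p, long& o, long n) {
        return p.compare_exchange_weak(o, n, std::memory_order_acq_rel,
                                       std::memory_order_acquire);
    }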

I don't think any of that is very relevant for the GCC discussion, 
though, as there's a bunch of other mappings specified by the ISA and we 
should just sort this one out.

>     }
>     
>    
>     $ clang-15.0.0 -std=c++17 -O3
>     f(std::atomic<long>&, long&, long):                   # @f(std::atomic<long>&, long&, long)
>             ld      a4, 0(a1)
>     .LBB0_3:                                # =>This Inner Loop Header: Depth=1
>             lr.d.aq a3, (a0)
>             bne     a3, a4, .LBB0_5
>             sc.d.rl a5, a2, (a0)
>             bnez    a5, .LBB0_3
>     .LBB0_5:
>             xor     a0, a3, a4
>             seqz    a0, a0
>             beq     a3, a4, .LBB0_2
>             sd      a3, 0(a1)
>     .LBB0_2:
>             ret
>
> doesn't look to me like it provides release ordering on failure, but I'm 
> not really a memory model person so maybe I'm missing something here.  
> The GCC mapping is pretty ugly, but I think we do happen to have correct 
> behavior in this case:
>
>     # gcc-12.2.0 -std=c++17 -O3
>     f(std::atomic<long>&, long&, long):
>             ld      a5,0(a1)
>             fence iorw,ow;  1: lr.d.aq a4,0(a0); bne a4,a5,1f; sc.d.aq a3,a2,0(a0); bnez a3,1b; 1:
>             sub     a5,a4,a5
>             seqz    a0,a5
>             beq     a5,zero,.L2
>             sd      a4,0(a1)
>     .L2:
>             andi    a0,a0,1
>             ret
> ]
>
>> Maybe I'm being too optimistic, but it's not hard for me to see a path 
>> where GCC and LLVM both implement the ISA mapping by default.  Anything 
>> else is just a path of long term pain.
>
> I'd bet that most people want that, but in practice building any real 
> systems in RISC-V land requires some degree of implementation-defined 
> behavior as the specifications don't cover everything (even ignoring the 
> whole PDF vs specification word games).  That's not to say we should 
> just ignore what's written down, just that there's more work to do even 
> if we ignore compatibility with old binaries.
>
> I think the question here is whether it's worth putting in the extra 
> work to provide a path for systems with old binaries to gradually 
> upgrade to the ISA mappings, or if we just toss out those old binaries.  
> I think we really need to see how much of a headache that compatibility 
> mode is going to be, and the only way to do that is to put in the time to 
> analyze the GCC mappings.
>
> That said, I don't really personally care that much about old binaries.  
> Really my only argument here is that we broke binary compatibility once 
> (right before we upstreamed the port), that made a handful of people 
> mad, and I told them we'd never do it again.  I think we were all 
> operating under the assumption that RISC-V would move an order of 
> magnitude faster than it has, though, so maybe we're in a spot where 
> nobody actually cares about extant binaries and we can just throw them 
> all out.
>
> If we're going to do that, though, there's a bunch of other cruft that 
> we should probably toss along with the GCC mappings...
>
>>
>>
>> Jeff
Andrea Parri Oct. 20, 2022, 7:01 p.m. UTC | #18
On Wed, Oct 12, 2022 at 07:16:20PM +0200, Andrea Parri wrote:
> > > >     +Andrea, in case he has time to look at the memory model / ABI
> > > >     issues.
> 
> > +Jeff, who was offering to help when the threads got crossed.  I'd punted on
> > a lot of this in the hope Andrea could help out, as I'm not really a memory
> > model guy and this is pretty far down the rabbit hole.  Happy to have the
> > help if you're offering, though, as what's there is likely a pretty big
> > performance issue for anyone with a reasonable memory system.
> 
> Thanks for linking me to the discussion and the remarks, Palmer.  I'm
> happy to help (and synchronized with Jeff/the community) as possible,
> building a better understanding of the 'issues' at stake.

Summarizing here some findings from looking at the currently-implemented
and the proposed [1] mappings:

  - Current mapping is missing synchronization, notably

	atomic_compare_exchange_weak_explicit(-, -, -,
					      memory_order_release,
					      memory_order_relaxed);

    is unable to provide the (required) release ordering guarantees; for
    reference, I've reported a litmus test illustrating it at the bottom
    of this email, cf. c-cmpxchg.

  - [1] addressed the "memory_order_release" problem/bug mentioned above
    (as well as other quirks of the current mapping I won't detail here),
    but it doesn't address other problems present in the current mapping;
    in particular, both mappings translate the following

	atomic_compare_exchange_weak_explicit(-, -, -,
					      memory_order_acquire,
					      memory_order_relaxed);

    to a sequence 

	lr.w
	bne
	sc.w.aq

    (without any other synchronization/fences), which contrasts with
    the Unprivileged Spec, Section 10.2 "Load-Reserve / Store-Conditional
    Instructions":

      "Software should not set the 'rl' bit on an LR instruction unless
      the 'aq' bit is also set, nor should software set the 'aq' bit on
      an SC instruction unless the 'rl' bit is also set.  LR.rl and SC.aq
      instructions are not guaranteed to provide any stronger ordering
      than those with both bits clear [...]"

Thanks,
  Andrea

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html


C c-cmpxchg

{}

P0(atomic_int *x, atomic_int *y, int *z)
{
	int r0;

	atomic_store_explicit(x, 1, memory_order_relaxed);
	r0 = atomic_compare_exchange_weak_explicit(y, z, 1, memory_order_release, memory_order_relaxed);
}

P1(atomic_int *x, atomic_int *y)
{
	int r1;
	int r2;

	r1 = atomic_load_explicit(y, memory_order_acquire);
	r2 = atomic_load_explicit(x, memory_order_relaxed);
}

exists (1:r1=1 /\ 1:r2=0)
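
For reference, a compilable C++ rendering of the two problematic CAS
calls discussed above is sketched below (the function names are
arbitrary); compiling it for RISC-V should reproduce the sequences
described above.

    #include <atomic>

    /* Release/relaxed weak CAS: per the first point above, the current
       mapping provides no release ordering for this operation (see the
       c-cmpxchg litmus test).  */
    bool cas_release_relaxed(std::atomic<int> *y, int *expected)
    {
        return std::atomic_compare_exchange_weak_explicit(
            y, expected, 1,
            std::memory_order_release, std::memory_order_relaxed);
    }

    /* Acquire/relaxed weak CAS: per the second point above, both mappings
       emit lr.w; bne; sc.w.aq, i.e. an SC with .aq but without .rl.  */
    bool cas_acquire_relaxed(std::atomic<int> *y, int *expected)
    {
        return std::atomic_compare_exchange_weak_explicit(
            y, expected, 1,
            std::memory_order_acquire, std::memory_order_relaxed);
    }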
Jeff Law Oct. 29, 2022, 5:02 a.m. UTC | #19
On 10/20/22 13:01, Andrea Parri wrote:
> On Wed, Oct 12, 2022 at 07:16:20PM +0200, Andrea Parri wrote:
>>>>>      +Andrea, in case he has time to look at the memory model / ABI
>>>>>      issues.
>>> +Jeff, who was offering to help when the threads got crossed.  I'd punted on
>>> a lot of this in the hope Andrea could help out, as I'm not really a memory
>>> model guy and this is pretty far down the rabbit hole.  Happy to have the
>>> help if you're offering, though, as what's there is likely a pretty big
>>> performance issue for anyone with a reasonable memory system.
>> Thanks for linking me to the discussion and the remarks, Palmer.  I'm
>> happy to help (and synchronized with Jeff/the community) as possible,
>> building a better understanding of the 'issues' at stake.
> Summarizing here some findings from looking at the currently-implemented
> and the proposed [1] mappings:
>
>    - Current mapping is missing synchronization, notably
>
> 	atomic_compare_exchange_weak_explicit(-, -, -,
> 					      memory_order_release,
> 					      memory_order_relaxed);
>
>      is unable to provide the (required) release ordering guarantees; for
>      reference, I've reported a litmus test illustrating it at the bottom
>      of this email, cf. c-cmpxchg.
>
>    - [1] addressed the "memory_order_release" problem/bug mentioned above
>      (as well as other quirks of the current mapping I won't detail here),
>      but it doesn't address other problems present in the current mapping;
>      in particular, both mappings translate the following
>
> 	atomic_compare_exchange_weak_explicit(-, -, -,
> 					      memory_order_acquire,
> 					      memory_order_relaxed);
>
>      to a sequence
>
> 	lr.w
> 	bne
> 	sc.w.aq
>
>      (without any other synchronization/fences), which contrasts with
>      the Unprivileged Spec, Section 10.2 "Load-Reserve / Store-Conditional
>      Instructions":
>
>        "Software should not set the 'rl' bit on an LR instruction unless
>        the 'aq' bit is also set, nor should software set the 'aq' bit on
>        an SC instruction unless the 'rl' bit is also set.  LR.rl and SC.aq
>        instructions are not guaranteed to provide any stronger ordering
>        than those with both bits clear [...]"

So it sounds like Christoph's patch is an improvement, but it isn't 
complete.  Given the pain in this space, I'd be hesitant to put in an 
incomplete fix, even if it moves things in the right direction, as it 
creates another compatibility headache if we don't get the complete 
solution in place for gcc-13.


Christoph, thoughts on the case Andrea pointed out?


Jeff