Message ID: 20210505193651.2075405-1-cmuellner@gcc.gnu.org
Series: Atomics improvements [PR100265/PR100266]
Hi Christoph, Kito,

On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
> This series provides a cleanup of the current atomics implementation
> of RISC-V:
>
> * PR100265: Use proper fences for atomic load/store
> * PR100266: Provide programmatic implementation of CAS
>
> As both are very related, I merged the patches into one series.
>
> The first patch could be squashed into the following patches,
> but I found it easier to understand the changes with it in place.
>
> The series has been tested as follows:
> * Building and testing a multilib RV32/64 toolchain
>   (bootstrapped with riscv-gnu-toolchain repo)
> * Manual review of generated sequences for GCC's atomic builtins API
>
> The programmatic re-implementation of CAS benefits from a REE improvement
> (see PR100264):
> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
> If this patch is not in place, then an additional extension instruction
> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
>
> Further, the new CAS code requires cbranch INSN helpers to be present:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html

I was wondering if this patchset is blocked on some technical grounds.
Thx,
-Vineet

> Changes for v2:
> * Guard LL/SC sequence by compiler barriers ("blockage")
>   (suggested by Andrew Waterman)
> * Changed commit message for AMOSWAP->STORE change
>   (suggested by Andrew Waterman)
> * Extracted cbranch4 patch from patchset (suggested by Kito Cheng)
> * Introduce predicate riscv_sync_memory_operand (suggested by Jim Wilson)
> * Fix small code style issue
>
> Christoph Muellner (10):
>   RISC-V: Simplify memory model code [PR 100265]
>   RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]
>   RISC-V: Eliminate %F specifier from riscv_print_operand() [PR 100265]
>   RISC-V: Use STORE instead of AMOSWAP for atomic stores [PR 100265]
>   RISC-V: Emit fences according to chosen memory model [PR 100265]
>   RISC-V: Implement atomic_{load,store} [PR 100265]
>   RISC-V: Model INSNs for LR and SC [PR 100266]
>   RISC-V: Add s.ext-consuming INSNs for LR and SC [PR 100266]
>   RISC-V: Provide programmatic implementation of CAS [PR 100266]
>   RISC-V: Introduce predicate "riscv_sync_memory_operand" [PR 100266]
>
>  gcc/config/riscv/riscv-protos.h |   1 +
>  gcc/config/riscv/riscv.c        | 136 +++++++++++++-------
>  gcc/config/riscv/sync.md        | 216 +++++++++++++++++++++-----------
>  3 files changed, 235 insertions(+), 118 deletions(-)
On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
> Hi Christoph, Kito,
>
> [...]
>
> I was wondering if this patchset is blocked on some technical grounds.

There's a v3 (though I can't find all of it, so not quite sure what happened), but IIUC that still has the same fundamental problems that all these have had: changing over to the new fence model may be an ABI break, and the split CAS implementation doesn't ensure eventual success (see Jim's comments). Not sure if there's other comments floating around, though, that's just what I remember.

+Andrea, in case he has time to look at the memory model / ABI issues.
We'd still need to sort out the CAS issues, though, and it's not abundantly clear it's worth the work: we're essentially constrained to just emitting those fixed CAS sequences due to the eventual success rules, so it's not clear what the benefit of splitting those up is. With WRS there are some routines we might want to generate code for (cond_read_acquire() in Linux, for example) but we'd really need to dig into those to see if it's even sane/fast.

There's another patch set to fix the lack of inline atomic routines without breaking stuff; there were some minor comments from Kito, and IIRC I had some test failures that I needed to chase down as well. That's a much safer fix in the short term. We'll need to deal with this eventually, but at least we can stop the libatomic issues for the distro folks.

> Thx,
> -Vineet
>
> [...]
On Tue, Oct 11, 2022 at 9:31 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
> > [...]
> > I was wondering if this patchset is blocked on some technical grounds.
>
> There's a v3 (though I can't find all of it, so not quite sure what
> happened), but IIUC that still has the same fundamental problems that
> all these have had: changing over to the new fence model may be an ABI
> break and the split CAS implementation doesn't ensure eventual success
> (see Jim's comments). Not sure if there's other comments floating
> around, though, that's just what I remember.
v3 was sent on May 27, 2022, when I rebased this on an internal tree:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html

I dropped the CAS patch in v3 (issue: stack spilling under extreme register pressure instead of erroring out) as I thought that this was the blocker for the series. I just learned a few weeks ago, when I asked Palmer at the GNU Cauldron about this series, that the ABI break is the blocker.

My initial understanding was that fixing something broken cannot be an ABI break, and that the mismatch between the implementation in 2021 and the recommended mappings in the ratified specification from 2019 is something that is broken. I still don't know the background here, but I guess this assumption is incorrect from a historical point of view.

However, I'm sure that I am not the only one who assumes the mappings in the specification to be implemented in compilers and tools. Therefore I still consider the implementation of the RISC-V atomics in GCC as broken (at least w.r.t. the expectations of users who lack the historical background and just read the RISC-V specification).

> +Andrea, in case he has time to look at the memory model / ABI issues.
>
> We'd still need to sort out the CAS issues, though, and it's not
> abundantly clear it's worth the work: we're essentially constrained to
> just emitting those fixed CAS sequences due to the eventual success
> rules, so it's not clear what the benefit of splitting those up is.
> With WRS there are some routines we might want to generate code for
> (cond_read_acquire() in Linux, for example) but we'd really need to dig
> into those to see if it's even sane/fast.
>
> There's another patch set to fix the lack of inline atomic routines
> without breaking stuff, there were some minor comments from Kito and
> IIRC I had some test failures that I needed to chase down as well.
> That's a much safer fix in the short term, we'll need to deal with this
> eventually but at least we can stop the libatomic issues for the distro
> folks.

I expect that the pressure for a proper fix upstream (instead of a backward-compatible compromise) will increase over time, once people start building big iron based on RISC-V and start hunting performance bottlenecks in multithreaded workloads to be competitive.

What could be done to get some relief is to enable the new atomics ABI by a command-line switch and promote its use. And at some point in the future (if there are enough fixes to justify a break) the new ABI can be enabled by default, with a new flag to enable the old ABI.
On 10/11/22 13:31, Palmer Dabbelt wrote:
> [...]
>
> There's a v3 (though I can't find all of it, so not quite sure what
> happened), but IIUC that still has the same fundamental problems that
> all these have had: changing over to the new fence model may be an ABI
> break and the split CAS implementation doesn't ensure eventual success
> (see Jim's comments). Not sure if there's other comments floating
> around, though, that's just what I remember.

Do we have a pointer to the ABI discussion? I've been meaning to familiarize myself with the issues in this space, and that seems like a good place to start given it's blocking progress on the atomics.

jeff
On 10/11/22 13:46, Christoph Müllner wrote:
> On Tue, Oct 11, 2022 at 9:31 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
>> [...]
>>
>> There's a v3 (though I can't find all of it, so not quite sure what
>> happened), but IIUC that still has the same fundamental problems that
>> all these have had: changing over to the new fence model may be an ABI
>> break and the split CAS implementation doesn't ensure eventual success
>> (see Jim's comments). Not sure if there's other comments floating
>> around, though, that's just what I remember.
> v3 was sent on May 27, 2022, when I rebased this on an internal tree:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
> I dropped the CAS patch in v3 (issue: stack spilling under extreme
> register pressure instead of erroring out) as I thought that this was
> the blocker for the series.
> I just learned a few weeks ago, when I asked Palmer at the GNU
> Cauldron about this series, that the ABI break is the blocker.

Yeah, I was confused about the ABI aspect, as I didn't see any mention of that in the public reviews of v1 and v2.

> My initial understanding was that fixing something broken cannot be an
> ABI break. And that the mismatch of the implementation in 2021 and the
> recommended mappings in the ratified specification from 2019 is
> something that is broken. I still don't know the background here, but
> I guess this assumption is incorrect from a historical point of view.
>
> However, I'm sure that I am not the only one that assumes the mappings
> in the specification to be implemented in compilers and tools.
> Therefore I still consider the implementation of the RISC-V atomics in
> GCC as broken (at least w.r.t. user expectation from people that lack
> the historical background and just read the RISC-V specification).
>
>> +Andrea, in case he has time to look at the memory model / ABI issues.
>>
>> We'd still need to sort out the CAS issues, though, and it's not
>> abundantly clear it's worth the work: we're essentially constrained to
>> just emitting those fixed CAS sequences due to the eventual success
>> rules, so it's not clear what the benefit of splitting those up is.
>> With WRS there are some routines we might want to generate code for
>> (cond_read_acquire() in Linux, for example) but we'd really need
>> to dig into those to see if it's even sane/fast.
>> There's another patch set to fix the lack of inline atomic routines
>> without breaking stuff, there were some minor comments from Kito and
>> IIRC I had some test failures that I needed to chase down as well.
>> That's a much safer fix in the short term, we'll need to deal with
>> this eventually but at least we can stop the libatomic issues for the
>> distro folks.
>
> I expect that the pressure for a proper fix upstream (instead of a
> backward compatible compromise) will increase over time (once people
> start building big iron based on RISC-V and start hunting performance
> bottlenecks in multithreaded workloads to be competitive).
> What could be done to get some relief is to enable the new atomics ABI
> by a command line switch and promote its use. And at one point in the
> future (if there are enough fixes to justify a break) the new ABI can
> be enabled by default with a new flag to enable the old ABI.

Indeed, we are stuck with inefficiencies with the status quo. The new ABI option sounds like a reasonable plan going forward.

Also, my understanding is that while the considerations are ABI-centric, the option to facilitate this need not be tied to the canonical -mabi=ilp32, lp64d, etc. It might just be a toggle such as -matomic=legacy,2019 (this is not suggestive, just indicative). Otherwise there's another level of blowup in multilib testing etc.

-Vineet
On Tue, 11 Oct 2022 16:31:25 PDT (-0700), Vineet Gupta wrote:
> On 10/11/22 13:46, Christoph Müllner wrote:
>> v3 was sent on May 27, 2022, when I rebased this on an internal tree:
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
>> I dropped the CAS patch in v3 (issue: stack spilling under extreme
>> register pressure instead of erroring out) as I thought that this was
>> the blocker for the series.
>> I just learned a few weeks ago, when I asked Palmer at the GNU
>> Cauldron about this series, that the ABI break is the blocker.
>
> Yeah I was confused about the ABI aspect as I didn't see any mention of
> that in the public reviews of v1 and v2.

Sorry, I thought we'd talked about it somewhere, but it must have just been in meetings and such. Patrick was writing a similar patch set around the same time, so it probably just got tied up in that; we ended up reducing it to just the strong CAS inline stuff because we couldn't sort out the correctness of the rest of it.

>> My initial understanding was that fixing something broken cannot be an
>> ABI break.
>> And that the mismatch of the implementation in 2021 and the
>> recommended mappings in the ratified specification from 2019 is
>> something that is broken. I still don't know the background here, but
>> I guess this assumption is incorrect from a historical point of view.

We agreed that we wouldn't break binaries back when we submitted the port. The ISA has changed many times since then, including adding the recommended mappings, but those binaries exist and we can't just silently break things for users.

>> However, I'm sure that I am not the only one that assumes the mappings
>> in the specification to be implemented in compilers and tools.
>> Therefore I still consider the implementation of the RISC-V atomics in
>> GCC as broken (at least w.r.t. user expectation from people that lack
>> the historical background and just read the RISC-V specification).
You can't just read one of those RISC-V PDFs and assume that implementations that match those words will function correctly. Those words regularly change in ways where reasonable readers would end up with incompatible implementations due to those differences. That's why we're so explicit about versions and such these days; we're just getting burned by these old mappings because they're from back when we thought the RISC-V definition of compatibility was going to match the more common one and we didn't build in fallbacks.

>> I expect that the pressure for a proper fix upstream (instead of a
>> backward compatible compromise) will increase over time (once people
>> start building big iron based on RISC-V and start hunting performance
>> bottlenecks in multithreaded workloads to be competitive).
>> What could be done to get some relief is to enable the new atomics ABI
>> by a command line switch and promote its use. And at one point in the
>> future (if there are enough fixes to justify a break) the new ABI can
>> be enabled by default with a new flag to enable the old ABI.
>
> Indeed, we are stuck with inefficiencies with the status quo. The new
> ABI option sounds like a reasonable plan going forward.

I don't think we're just stuck with the status quo; we really just need to go through the mappings and figure out which can be made both fast and ABI-compatible. Then we can fix those and see where we stand: maybe it's good enough, or maybe we need to introduce some sort of compatibility break to make things faster (and/or compatible with LLVM, where I suspect we're broken right now).

If we do need a break then I think it's probably possible to do it in phases, where we have a middle-ground compatibility mode that works for both the old and new mappings so distros can gradually move over as they rebuild packages. Issues like the libstdc++ shared_ptr/mutex fallback don't map well to that, though. There's also some stuff like the IO fence bits that we can probably just add an argument for now; those were likely just a bad idea at the time and should be safe to turn off for the vast majority of users (though those are more of an API break).

> Also, my understanding is that while the considerations are ABI-centric,
> the option to facilitate this need not be tied to the canonical
> -mabi=ilp32, lp64d, etc. It might just be a toggle such as
> -matomic=legacy,2019 (this is not suggestive, just indicative).
> Otherwise there's another level of blowup in multilib testing etc.

The psABI doesn't mention memory ordering at all. IIUC that's a pretty standard hole in psABI documents, but it means we're in a grey area here.

+Jeff, who was offering to help when the threads got crossed. I'd punted on a lot of this in the hope Andrea could help out, as I'm not really a memory model guy and this is pretty far down the rabbit hole.
Happy to have the help if you're offering, though, as what's there is likely a pretty big performance issue for anyone with a reasonable memory system.

> -Vineet
On Wed, Oct 12, 2022 at 2:15 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> On Tue, 11 Oct 2022 16:31:25 PDT (-0700), Vineet Gupta wrote:
>> [...]
>
> [...]
>
> We agreed that we wouldn't break binaries back when we submitted the
> port. The ISA has changed many times since then, including adding the
> recommended mappings, but those binaries exist and we can't just
> silently break things for users.
> You can't just read one of those RISC-V PDFs and assume that
> implementations that match those words will function correctly. Those
> words regularly change in ways where reasonable readers would end up
> with incompatible implementations due to those differences. That's why
> we're so explicit about versions and such these days; we're just
> getting burned by these old mappings because they're from back when we
> thought the RISC-V definition of compatibility was going to match the
> more common one and we didn't build in fallbacks.

Indeed, read-and-assume might not be the best idea. But read-and-ignore (in the hope everyone else does as well) won't help either. I think it is reasonable to expect the detailed mapping table from the specification to be implemented. Why would it have been written, and why would one ignore it? But let's look at the current situation and the proposed solutions so far (below).
> >> > >> There's another patch set to fix the lack of inline atomic routines > >> without breaking stuff, there were some minor comments from Kito and > >> IIRC I had some test failures that I needed to chase down as well. > >> That's a much safer fix in the short term, we'll need to deal with > >> this > >> eventually but at least we can stop the libatomic issues for the > >> distro > >> folks. > >> > >> > >> I expect that the pressure for a proper fix upstream (instead of a > >> backward compatible compromise) will increase over time (once people > >> start building big iron based on RISC-V and start hunting performance > >> bottlenecks in multithreaded workloads to be competitive). > >> What could be done to get some relief is to enable the new atomics ABI > >> by a command line switch and promote its use. And at one point in the > >> future (if there are enough fixes to justify a break) the new ABI can > >> be enabled by default with a new flag to enable the old ABI. > > > > Indeed we are stuck with inefficiencies with status quo. The new abi > > option sounds like a reasonable plan going fwd. > > I don't think we're just stuck with the status quo, we really just need > to go through the mappings and figure out which can be made both fast > and ABI-compatible. Then we can fix those and see where we stand, maybe > it's good enough or maybe we need to introduce some sort of > compatibility break to make things faster (and/or compatible with LLVM, > where I suspect we're broken right now). > So we have the following atomics ABIs: I) GCC implementation II) LLVM implementation III) Specified ABI in the "Code Porting and Mapping Guidelines" appendix of the RISC-V specification And there are two proposed solutions: a) Finding a new ABI that is compatible with I) and II) is of course a solution, but we don't know if and when such a solution exists. b) Going to introduce III) causes a break and therefore needs special care (e.g. 
let the user decide via command line flag or provide a compatibility mode).

I don't see that a) and b) contradict each other.
Why not go for both:
-) Continue to work on a backward compatible solution
-) Enable the "new" ABI from the specification appendix via command line
flag
-) Reevaluate the situation in 12 months to decide the next steps

>
> If we do need a break then I think it's probably possible to do it in
> phases, where we have a middle-ground compatibility mode that works for
> both the old and new mappings so distros can gradually move over as they
> rebuild packages.
>
> Issues like the libstdc++ shared_ptr/mutex fallback don't map well to
> that, though.  There's also some stuff like the IO fence bits that we
> can probably just add an argument for now, those were likely just a bad
> idea at the time and should be safe to turn off for the vast majority of
> users (though those are more of an API break).
>
> > Also my understanding is that while the considerations are ABI centric,
> > the option to facilitate this need not be tied to canonical -mabi=lp32,
> > lp64d etc. It might just be a toggle as -matomic=legacy,2019 etc (this
> > is not suggestive just indicative). Otherwise there's another level of
> > blowup in multilib testing etc.
>
> The psABI doesn't mention memory ordering at all.  IIUC that's a pretty
> standard hole in psABI documents, but it means we're in a grey area
> here.
>
> +Jeff, who was offering to help when the threads got crossed.  I'd
> punted on a lot of this in the hope Andrea could help out, as I'm not
> really a memory model guy and this is pretty far down the rabbit hole.
> Happy to have the help if you're offering, though, as what's there is
> likely a pretty big performance issue for anyone with a reasonable
> memory system.
>
> > -Vineet
>
> > > +Andrea, in case he has time to look at the memory model / ABI
> > > issues.

> +Jeff, who was offering to help when the threads got crossed.  I'd punted on
> a lot of this in the hope Andrea could help out, as I'm not really a memory
> model guy and this is pretty far down the rabbit hole.  Happy to have the
> help if you're offering, though, as what's there is likely a pretty big
> performance issue for anyone with a reasonable memory system.

Thanks for linking me to the discussion and the remarks, Palmer.

I'm happy to help (synchronized with Jeff and the community) as much as
possible, and to build a better understanding of the 'issues' at stake.

Andrea
On 10/11/22 17:31, Vineet Gupta wrote: > >> >> I expect that the pressure for a proper fix upstream (instead of a >> backward compatible compromise) will increase over time (once people >> start building big iron based on RISC-V and start hunting performance >> bottlenecks in multithreaded workloads to be competitive). >> What could be done to get some relief is to enable the new atomics >> ABI by a command line switch and promote its use. And at one point in >> the future (if there are enough fixes to justify a break) the new ABI >> can be enabled by default with a new flag to enable the old ABI. > > Indeed we are stuck with inefficiencies with status quo. The new abi > option sounds like a reasonable plan going fwd. > > Also my understand is that while the considerations are ABI centric, > the option to faciliate this need not be tied to canonical -mabi=lp32, > lp64d etc. It might just be a toggle as -matomic=legacy,2019 etc (this > is not suggestive just indicative). Otherwise there's another level of > blowup in multilib testing etc. If I understand the history here, we're essentially catering to code that is potentially relying on behavior that was never really guaranteed. That's not really ABI -- it's more depending on specifics of an implementation or undefined/underdefined behavior. Holding back progress for that case seems short-sighted, particularly given how early I hope we are in the RISC-V journey. But I'm also sympathetic to the desire not to break existing code. Could we keep the old behavior under a flag and fix the default behavior here, presumably with a bit in the ELF header indicating code that wants the old behavior? Jeff
On 10/11/22 18:15, Palmer Dabbelt wrote:
>
> Sorry, I thought we'd talked about it somewhere but it must have just
> been in meetings and such.  Patrick was writing a similar patch set
> around the same time so it probably just got tied up in that, we ended
> up reducing it to just the strong CAS inline stuff because we couldn't
> sort out the correctness of the rest of it.

Now that you mention it, I vaguely recall a discussion about inline
atomics vs libatomic and the discussion on this issue might have been
there rather than in Christoph's patchset.

>
>>> My initial understanding was that fixing something broken cannot be
>>> an ABI break.
>>> And that the mismatch of the implementation in 2021 and the
>>> recommended mappings in the ratified specification from 2019 is
>>> something that is broken. I still don't know the background here,
>>> but I guess this assumption is incorrect from a historical point of
>>> view.
>
> We agreed that we wouldn't break binaries back when we submitted the
> port.  The ISA has changed many times since then, including adding the
> recommended mappings, but those binaries exist and we can't just
> silently break things for users.

That may be too strong of a policy -- just because something worked in
the past doesn't mean it must always work.  It really depends on the
contracts specified by the ABI, processor reference documentation, etc.,
and whether or not the code relies on something that is outside those
contracts.  If it does rely on behavior outside the contracts, then we
shouldn't be constrained by such an agreement.

Another way to think about this problem is: do we want more code making
incorrect assumptions about the behavior of atomics getting out in the
wild?  My take would be that we nip this in the bud, get it right now in
the default configuration, but leave enough bits in place that existing
code continues to work.
> >>> However, I'm sure that I am not the only one that assumes the >>> mappings in the specification to be implemented in compilers and >>> tools. Therefore I still consider the implementation of the RISC-V >>> atomics in GCC as broken (at least w.r.t. user expectation from >>> people that lack the historical background and just read the RISC-V >>> specification). > > You can't just read one of those RISC-V PDFs and assume that > implementations that match those words will function correctly. Those > words regularly change in ways where reasonable readers would end up > with incompatible implementations due to those differences. That's > why we're so explicit about versions and such these days, we're just > getting burned by these old mappings because they're from back when we > though the RISC-V definition of compatibility was going to match the > more common one and we didn't build in fallbacks. Fair point, but in my mind that argues that the platform must mature further so that the contracts can be relied upon. That obviously needs to get fixed and until it does any agreements or guarantees about behavior of existing code can't be reasonably made. If we're going to be taken seriously, then those fundamentals have to be rock solid. > > I don't think we're just stuck with the status quo, we really just > need to go through the mappings and figure out which can be made both > fast and ABI-compatible. Then we can fix those and see where we > stand, maybe it's good enough or maybe we need to introduce some sort > of compatibility break to make things faster (and/or compatible with > LLVM, where I suspect we're broken right now). Certainly seems like a good first step. What we can fix without breaking things we do while we sort out the tougher problems. 
> > If we do need a break then I think it's probably possible to do it in > phases, where we have a middle-ground compatibility mode that works > for both the old and new mappings so distros can gradually move over > as they rebuild packages. As someone that lived in the distro space for a long time, I would argue that now is the time to fix this stuff -- before there is a large uptake in distro consumption. > > +Jeff, who was offering to help when the threads got crossed. I'd > punted on a lot of this in the hope Andrea could help out, as I'm not > really a memory model guy and this is pretty far down the rabbit > hole. Happy to have the help if you're offering, though, as what's > there is likely a pretty big performance issue for anyone with a > reasonable memory system. Hmm, there's a case I'm pondering if I can discuss or not. Probably not since I can't recall it ever being discussed in public. So I'll just say this space can be critically important for performance and the longer we wait, the tougher it gets to fix without causing significant disruption. Jeff
On 10/12/22 02:03, Christoph Müllner wrote: > > > So we have the following atomics ABIs: > I) GCC implementation > II) LLVM implementation > III) Specified ABI in the "Code Porting and Mapping Guidelines" > appendix of the RISC-V specification And presumably we don't have any way to distinguish between I and II at the DSO or object level. That implies that if we're going to get to III, then we have to mark new code. We obviously can't mark pre-existing bits (and I may have implied we should do that in my earlier message, my goof). > > And there are two proposed solutions: > a) Finding a new ABI that is compatible with I) and II) is of course > a solution, but we don't know if and when such a solution exists. > b) Going to introduce III) causes a break and therefore needs special > care (e.g. let the user decide via command line flag or provide a > compatibility mode). > > I don't see that a) and b) contradict each other. > Why not going for both: > -) Continue to work on a backward compatible solution > -) Enable the "new" ABI from the specification appendix via command > line flag > -) Reevaluate the situation in 12 months to decide the next steps I would lean towards making the new, more correct, behavior the default and having the old behavior enabled by a command line flag. But otherwise what you're suggesting seems reasonable. Jeff
On Thu, 13 Oct 2022 15:39:39 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
>
> On 10/11/22 17:31, Vineet Gupta wrote:
>>
>>>
>>> I expect that the pressure for a proper fix upstream (instead of a
>>> backward compatible compromise) will increase over time (once people
>>> start building big iron based on RISC-V and start hunting performance
>>> bottlenecks in multithreaded workloads to be competitive).
>>> What could be done to get some relief is to enable the new atomics
>>> ABI by a command line switch and promote its use. And at one point in
>>> the future (if there are enough fixes to justify a break) the new ABI
>>> can be enabled by default with a new flag to enable the old ABI.
>>
>> Indeed we are stuck with inefficiencies with status quo. The new abi
>> option sounds like a reasonable plan going fwd.
>>
>> Also my understanding is that while the considerations are ABI centric,
>> the option to facilitate this need not be tied to canonical -mabi=lp32,
>> lp64d etc. It might just be a toggle as -matomic=legacy,2019 etc (this
>> is not suggestive just indicative). Otherwise there's another level of
>> blowup in multilib testing etc.
>
> If I understand the history here, we're essentially catering to code
> that is potentially relying on behavior that was never really
> guaranteed.  That's not really ABI -- it's more depending on specifics
> of an implementation or undefined/underdefined behavior.  Holding back
> progress for that case seems short-sighted, particularly given how early
> I hope we are in the RISC-V journey.
>
>
> But I'm also sympathetic to the desire not to break existing code.

I suppose ABI means a ton of things, but in this case that's really what
I was trying to get at: just changing to the mappings suggested in the
ISA manual risks producing binaries that don't work when mixed with old
binaries.
My rationale for calling that ABI was that there's a de facto memory
model ABI defined as "anything that works with the old binaries", but
ABI means so many things that maybe we just shouldn't say it at all here?

I'm going to call it "old binary compatibility" here, rather than "ABI
compatibility", just so it's a different term.

> Could we keep the old behavior under a flag and fix the default behavior
> here, presumably with a bit in the ELF header indicating code that wants
> the old behavior?

The thread got forked a bit, but that's essentially what I was trying to
suggest.  I talked with Andrea a bit and here's how I'd describe it:

We have a mapping from the C{,++}11 memory model to RVWMO that's
currently emitted by GCC, and there are years of binaries with that
mapping.  As far as we know that mapping is correct, but I don't think
anyone's gone through and formally analyzed it.  Let's call those the
"GCC mapping".

There's also a mapping listed in the ISA manual.  That's not the same as
the GCC mapping, so let's call it the "ISA mapping".  We need to
double-check the specifics, but IIUC this ISA mapping is broadly
followed by LLVM.  It's also very likely to be correct, as it's been
analyzed by lots of formal memory model people as part of the RVWMO
standardization process.

As far as I know, everyone likes the ISA mapping better than the GCC
mapping.  It's hard to describe why concretely because there's no
hardware that implements RVWMO sufficiently aggressively that we can
talk performance, but at least matching what's in the ISA manual is the
way to go.  Also, just kind of a personal opinion, the GCC mapping does
some weird ugly stuff.

So the question is really: how do we get rid of the GCC mapping while
causing as little headache as possible?

My proposal is as follows:

* Let's properly analyze the GCC mapping, just on its own.
Maybe it's broken, if it is then I think we've got a pretty decent argument to just ignore old binary compatibility -- if it was always broken then we can't break it, so who cares. * Assuming the GCC mapping is internally consistent, let's analyze arbitrary mixes of the GCC and ISA mappings. It's not generally true that two correct mappings can be freely mixed, but maybe we've got some cases that work fine. Maybe we even get lucky and everything's compatible, who knows (though I'm worried about the same .aq vs fence stuff we've had in LKMM a few times). * Assuming there's any issue mixing the GCC and ISA mappings, let's add a flag to GCC. Something like -mrvwmo-compat={legacy,both,isa}, where: - =legacy does what we have now. We can eventually deprecate this, and assuming =both is fast enough maybe we don't even need it. - =both implements a mapping that's compatible with both the GCC and ISA mappings. This might be slow or something, but it'll be correct with both. We can set this to the default now, as it's safe. - =isa implements the ISA mappings. Then we can go stick some marker in the ELF saying "this binary is compatible with the ISA mappings" to triage binaries in systems. That way distros can decide when they want to move to -mrvwmo-compat=isa by default, maybe when they naturally do a world rebuild. Eventually we can make that the default in GCC upstream and later deprecate the compatibility modes. That's going to take a bit of work on the formal memory model side of things, but I think we've at least got enough of Andrea's bandwidth to make it happen in a reasonable timeframe. I don't think there's a ton of implementation work to do once we sort out the desired mappings, and a chunk of that can probably start in parallel as we'll likely need stuff like the ELF tagging eventually. Landing this before stage4 might be tricky, though. I tell myself we're not going to take late backend features every release... ;)
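Whichever mode ends up being the default, any candidate mapping (including whatever a "=both" compromise emits) still has to uphold the C++11 semantics the source asks for. A minimal release/acquire message-passing sketch of the kind of guarantee that must survive the mapping choice (the `producer`/`consumer` names are illustrative, not from the thread):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Illustrative only: whichever instruction sequence the compiler picks
// (the current GCC mapping, the ISA manual's, or a compromise), it must
// preserve release/acquire synchronization like this:
int data = 0;
std::atomic<bool> flag{false};

void producer() {
    data = 42;                                    // plain store
    flag.store(true, std::memory_order_release);  // publishes 'data'
}

int consumer() {
    while (!flag.load(std::memory_order_acquire)) { }  // spin until published
    return data;  // acquire synchronizes with release, so this reads 42
}
```

Mixing two individually correct mappings is only safe if the fences/AMO annotations each side emits still compose into this synchronizes-with edge, which is exactly what the "=both" analysis would have to establish.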
On 10/13/22 15:39, Jeff Law via Gcc-patches wrote:
>
> On 10/11/22 17:31, Vineet Gupta wrote:
>>
>>>
>>> I expect that the pressure for a proper fix upstream (instead of a
>>> backward compatible compromise) will increase over time (once people
>>> start building big iron based on RISC-V and start hunting performance
>>> bottlenecks in multithreaded workloads to be competitive).
>>> What could be done to get some relief is to enable the new atomics
>>> ABI by a command line switch and promote its use. And at one point in
>>> the future (if there are enough fixes to justify a break) the new ABI
>>> can be enabled by default with a new flag to enable the old ABI.
>>
>> Indeed we are stuck with inefficiencies with status quo. The new abi
>> option sounds like a reasonable plan going fwd.
>>
>> Also my understanding is that while the considerations are ABI centric,
>> the option to facilitate this need not be tied to canonical -mabi=lp32,
>> lp64d etc. It might just be a toggle as -matomic=legacy,2019 etc (this
>> is not suggestive just indicative). Otherwise there's another level of
>> blowup in multilib testing etc.
>
> If I understand the history here, we're essentially catering to code
> that is potentially relying on behavior that was never really
> guaranteed.  That's not really ABI -- it's more depending on specifics
> of an implementation or undefined/underdefined behavior.  Holding back
> progress for that case seems short-sighted, particularly given how early
> I hope we are in the RISC-V journey.

Exactly. I keep hearing about the potential ABI breakage, but no real
discussion (publicly at least) that describes in detail what exactly that
ABI / old-behavior breakage is with this patch series [1].

So perhaps we can start with reviewing the patches, calling out exactly
what change causes the divergence, and whether that is acceptable or not.
And while at it, perhaps also make updates to [2] [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100265
On Fri, Oct 14, 2022 at 1:15 AM Palmer Dabbelt <palmer@dabbelt.com> wrote: > On Thu, 13 Oct 2022 15:39:39 PDT (-0700), gcc-patches@gcc.gnu.org wrote: > > > > On 10/11/22 17:31, Vineet Gupta wrote: > >> > >>> > >>> I expect that the pressure for a proper fix upstream (instead of a > >>> backward compatible compromise) will increase over time (once people > >>> start building big iron based on RISC-V and start hunting performance > >>> bottlenecks in multithreaded workloads to be competitive). > >>> What could be done to get some relief is to enable the new atomics > >>> ABI by a command line switch and promote its use. And at one point in > >>> the future (if there are enough fixes to justify a break) the new ABI > >>> can be enabled by default with a new flag to enable the old ABI. > >> > >> Indeed we are stuck with inefficiencies with status quo. The new abi > >> option sounds like a reasonable plan going fwd. > >> > >> Also my understand is that while the considerations are ABI centric, > >> the option to faciliate this need not be tied to canonical -mabi=lp32, > >> lp64d etc. It might just be a toggle as -matomic=legacy,2019 etc (this > >> is not suggestive just indicative). Otherwise there's another level of > >> blowup in multilib testing etc. > > > > If I understand the history here, we're essentially catering to code > > that is potentially relying on behavior that was never really > > guaranteed. That's not really ABI -- it's more depending on specifics > > of an implementation or undefined/underdefined behavior. Holding back > > progress for that case seems short-sighted, particularly given how early > > I hope we are in the RISC-V journey. > > > > > > But I'm also sympathetic to the desire not to break existing code. > > I suppose ABI means a ton of things, but in this case that's really want > I was trying to get at: just changing to the mappings suggested in the > ISA manual risks producing binaries that don't work when mixed with old > binaries. 
My rationale for calling that ABI was that there's a defacto > memory model ABI defined as "anything that works with the old binaries", > but ABI means so many things maybe we just shouldn't say it at all here? > > I'm going to call it "old binary compatibility" here, rather than "ABI > compatibility", just so it's a different term. > > > Could we keep the old behavior under a flag and fix the default behavior > > here, presumably with a bit in the ELF header indicating code that wants > > the old behavior? > > The thread got forked a bit, but that's essentially what I was trying to > suggest. I talked with Andrea a bit and here's how I'd describe it: > > We have a mapping from the C{,++}11 memory model to RVWMO that's > currently emitted by GCC, and there are years of binaries with that > mapping. As far as we know that mapping is correct, but I don't think > anyone's gone through and formally analyzed it. Let's call those the > "GCC mapping". > > There's also a mapping listed in the ISA manual. That's not the same as > the GCC mapping, so let's call it the "ISA mapping". We need to > double-check the specifics, but IIUC this ISA mapping is broadly > followed by LLVM. It's also very likely to be correct, as it's been > analyzed by lots of formal memory model people as part of the RVWMO > standardization process. > > As far as I know, everyone likes the ISA mapping better than the GCC > mapping. It's hard to describe why concretely because there's no > hardware that implements RVWMO sufficiently aggressively that we can > talk performance, but at least matching what's in the ISA manual is the > way to go. Also, just kind of a personal opinion, the GCC mapping does > some weird ugly stuff. > My guess is people like the ISA mapping (more) because it has been documented and reviewed. And it is the product of a working group that worked out the RVWMO specification. 
This gives some confidence that we don't need to rework it massively because of correctness issues in the future. The GCC mapping has (as far as I understand) never been documented and reviewed (especially not by the RISC-V arch review group). It might be faster and more robust, but it also might be incomplete and slower. Finding this out is a research topic with an unpredictable outcome. And more or less nobody wants to take this risk or has the time and the required skills to work on the research. > > So the question is really: how do we get rid of the GCC mapping while > causing as little headache as possible? > > My proposal is as follows: > > * Let's properly analyze the GCC mapping, just on its own. Maybe it's > broken, if it is then I think we've got a pretty decent argument to > just ignore old binary compatibility -- if it was always broken then > we can't break it, so who cares. > * Assuming the GCC mapping is internally consistent, let's analyze > arbitrary mixes of the GCC and ISA mappings. It's not generally true > that two correct mappings can be freely mixed, but maybe we've got > some cases that work fine. Maybe we even get lucky and everything's > compatible, who knows (though I'm worried about the same .aq vs fence > stuff we've had in LKMM a few times). > * Assuming there's any issue mixing the GCC and ISA mappings, let's add > a flag to GCC. Something like -mrvwmo-compat={legacy,both,isa}, where: > - =legacy does what we have now. We can eventually deprecate this, > and assuming =both is fast enough maybe we don't even need it. > - =both implements a mapping that's compatible with both the GCC and > ISA mappings. This might be slow or something, but it'll be > correct with both. We can set this to the default now, as it's > safe. > - =isa implements the ISA mappings. > I think a patch for legacy/isa can be done on time for the GCC release window. I can rework my series accordingly. 
However, I see some risk for the "both" option on the timeline, so this would have to wait until such a compatible mapping exists. > > Then we can go stick some marker in the ELF saying "this binary is > compatible with the ISA mappings" to triage binaries in systems. That > way distros can decide when they want to move to -mrvwmo-compat=isa by > default, maybe when they naturally do a world rebuild. Eventually we > can make that the default in GCC upstream and later deprecate the > compatibility modes. > > That's going to take a bit of work on the formal memory model side of > things, but I think we've at least got enough of Andrea's bandwidth to > make it happen in a reasonable timeframe. I don't think there's a ton > of implementation work to do once we sort out the desired mappings, and > a chunk of that can probably start in parallel as we'll likely need > stuff like the ELF tagging eventually. > > Landing this before stage4 might be tricky, though. I tell myself we're > not going to take late backend features every release... ;) >
On 10/14/22 05:03, Christoph Müllner wrote:
>
> My guess is people like the ISA mapping (more) because it has been
> documented and reviewed.
> And it is the product of a working group that worked out the
> RVWMO specification.
> This gives some confidence that we don't need to rework it massively
> because of correctness issues in the future.

This stuff can be hard and if someone with deep experience in memory
models has reviewed the ISA mapping, then I'd prefer it over the GCC
mapping.  It's just that the experts in the memory model space are more
likely to get it right than a bunch of compiler junkies, no matter how
smart we think we are :-)

Maybe I'm being too optimistic, but it's not hard for me to see a path
where GCC and LLVM both implement the ISA mapping by default.  Anything
else is just a path of long term pain.


Jeff
On Fri, 14 Oct 2022 13:39:33 PDT (-0700), jeffreyalaw@gmail.com wrote:
>
> On 10/14/22 05:03, Christoph Müllner wrote:
>>
>> My guess is people like the ISA mapping (more) because it has been
>> documented and reviewed.
>> And it is the product of a working group that worked out the
>> RVWMO specification.
>> This gives some confidence that we don't need to rework it massively
>> because of correctness issues in the future.
>
> This stuff can be hard and if someone with deep experience in memory
> models has reviewed the ISA mapping, then I'd prefer it over the GCC
> mapping.  It's just more likely the experts in the memory model space
> are more likely to get it right than a bunch of compiler junkies, no
> matter how smart we think we are :-)

That hasn't really proven to be the case for the ISA-suggested Linux
mappings.  We've been through a bunch of rounds of review upstream and
that's resulted in some differences.  Some of that is likely related to
the ISA mappings for Linux being incomplete (they punt on the tricky
bits like interactions with spinlocks, which filter all over the place),
and Linux doesn't have the same old binary compatibility issues (aka the
in-kernel ABI is not stable between releases) so mapping churn isn't
nearly as scary over there.

Still not much of an argument in favor of the GCC mappings, though.  I'm
pretty sure nobody with a formal memory model background has looked at
those, which is pretty much the worst spot to be in.  That said, I don't
think we can just say the ISA mappings are the way to go, they seem to
suffer from some similar incompleteness issues (for example, there are no
explicit mappings for std::atomic<T>::compare_exchange_{weak,strong}).
So we'll still need to put in the work to make sure whatever mappings
get implemented are correct.

[
As an aside, I think LLVM is doing the wrong thing for some of the more
complex forms of compare_exchange_weak.
For example

    #include <atomic>

    bool f(std::atomic<long>& p, long& o, long n) {
        return p.compare_exchange_weak(o, n, std::memory_order_acq_rel, std::memory_order_release);
    }


    $ clang-15.0.0 -std=c++17 -O3
    f(std::atomic<long>&, long&, long):        # @f(std::atomic<long>&, long&, long)
            ld      a4, 0(a1)
    .LBB0_3:        # =>This Inner Loop Header: Depth=1
            lr.d.aq a3, (a0)
            bne     a3, a4, .LBB0_5
            sc.d.rl a5, a2, (a0)
            bnez    a5, .LBB0_3
    .LBB0_5:
            xor     a0, a3, a4
            seqz    a0, a0
            beq     a3, a4, .LBB0_2
            sd      a3, 0(a1)
    .LBB0_2:
            ret

doesn't look to me like it provides release ordering on failure, but I'm
not really a memory model person so maybe I'm missing something here.
The GCC mapping is pretty ugly, but I think we do happen to have correct
behavior in this case:

    # gcc-12.2.0 -std=c++17 -O3
    f(std::atomic<long>&, long&, long):
            ld      a5,0(a1)
            fence iorw,ow;  1: lr.d.aq a4,0(a0); bne a4,a5,1f; sc.d.aq a3,a2,0(a0); bnez a3,1b; 1:
            sub     a5,a4,a5
            seqz    a0,a5
            beq     a5,zero,.L2
            sd      a4,0(a1)
    .L2:
            andi    a0,a0,1
            ret
]

> Maybe I'm being too optimistic, but it's not hard for me to see a path
> where GCC and LLVM both implement the ISA mapping by default.  Anything
> else is just a path of long term pain.

I'd bet that most people want that, but in practice building any real
systems in RISC-V land requires some degree of implementation-defined
behavior as the specifications don't cover everything (even ignoring the
whole PDF vs specification word games).  That's not to say we should
just ignore what's written down, just that there's more work to do even
if we ignore compatibility with old binaries.

I think the question here is whether it's worth putting in the extra
work to provide a path for systems with old binaries to gradually
upgrade to the ISA mappings, or if we just toss out those old binaries.
I think we really need to see how much of a headache that compatibility
mode is going to be, and the only way to do that is put in the time to
analyze the GCC mappings.
That said, I don't really personally care that much about old binaries. Really my only argument here is that we broke binary compatibility once (right before we upstreamed the port), that made a handful of people mad, and I told them we'd never do it again. I think we were all operating under the assumption that RISC-V would move an order of magnitude faster that it has, though, so maybe we're in a spot where nobody actually cares about extant binaries and we can just throw them all out. If we're going to do that, though, there's a bunch of other cruft that we should probably toss along with the GCC mappings... > > > Jeff
On Fri, 14 Oct 2022 14:57:22 PDT (-0700), Palmer Dabbelt wrote:
> On Fri, 14 Oct 2022 13:39:33 PDT (-0700), jeffreyalaw@gmail.com wrote:
>>
>> On 10/14/22 05:03, Christoph Müllner wrote:
>>>
>>> My guess is people like the ISA mapping (more) because it has been
>>> documented and reviewed.
>>> And it is the product of a working group that worked out the
>>> RVWMO specification.
>>> This gives some confidence that we don't need to rework it massively
>>> because of correctness issues in the future.
>>
>> This stuff can be hard and if someone with deep experience in memory
>> models has reviewed the ISA mapping, then I'd prefer it over the GCC
>> mapping. It's just that the experts in the memory model space are
>> more likely to get it right than a bunch of compiler junkies, no
>> matter how smart we think we are :-)
>
> That hasn't really proven to be the case for the ISA's suggested Linux
> mappings. We've been through a bunch of rounds of review upstream and
> that's resulted in some differences. Some of that is likely related to
> the ISA mappings for Linux being incomplete (they punt on the tricky
> bits like interactions with spinlocks, which filter all over the
> place), and Linux doesn't have the same old binary compatibility
> issues (aka the in-kernel ABI is not stable between releases) so
> mapping churn isn't nearly as scary over there.
>
> Still not much of an argument in favor of the GCC mappings, though. I'm
> pretty sure nobody with a formal memory model background has looked at
> those, which is pretty much the worst spot to be in. That said, I don't
> think we can just say the ISA mappings are the way to go, they seem to
> suffer from some similar incompleteness issues (for example, there are
> no explicit mappings for std::atomic<T>::compare_exchange_{weak,strong}).
> So we'll still need to put in the work to make sure whatever mappings
> get implemented are correct.
>
> [
> As an aside, I think LLVM is doing the wrong thing for some of the more
> complex forms of compare_exchange_weak. For example
>
> #include <atomic>
>
> bool f(std::atomic<long>& p, long& o, long n) {
>     return p.compare_exchange_weak(o, n, std::memory_order_acq_rel, std::memory_order_release);

Eric points out this is bogus code, the spec forbids release as the fail
ordering (which makes sense). Just kind of randomly permuting these
ordering arguments ends up with some generated code that seems off, for
example release/acq_rel produces lr/sc.rl (and no fences). I don't
think any of that is very relevant for the GCC discussion, though, as
there's a bunch of other mappings specified by the ISA and we should
just sort this one out.

> }
>
>
> $ clang-15.0.0 -std=c++17 -O3
> f(std::atomic<long>&, long&, long):     # @f(std::atomic<long>&, long&, long)
>         ld      a4, 0(a1)
> .LBB0_3:                                # =>This Inner Loop Header: Depth=1
>         lr.d.aq a3, (a0)
>         bne     a3, a4, .LBB0_5
>         sc.d.rl a5, a2, (a0)
>         bnez    a5, .LBB0_3
> .LBB0_5:
>         xor     a0, a3, a4
>         seqz    a0, a0
>         beq     a3, a4, .LBB0_2
>         sd      a3, 0(a1)
> .LBB0_2:
>         ret
>
> doesn't look to me like it provides release ordering on failure, but I'm
> not really a memory model person so maybe I'm missing something here.
> The GCC mapping is pretty ugly, but I think we do happen to have correct
> behavior in this case:
>
> # gcc-12.2.0 -std=c++17 -O3
> f(std::atomic<long>&, long&, long):
>         ld      a5,0(a1)
>         fence iorw,ow;  1: lr.d.aq a4,0(a0); bne a4,a5,1f; sc.d.aq a3,a2,0(a0); bnez a3,1b; 1:
>         sub     a5,a4,a5
>         seqz    a0,a5
>         beq     a5,zero,.L2
>         sd      a4,0(a1)
> .L2:
>         andi    a0,a0,1
>         ret
> ]
>
>> Maybe I'm being too optimistic, but it's not hard for me to see a path
>> where GCC and LLVM both implement the ISA mapping by default. Anything
>> else is just a path of long term pain.
>
> I'd bet that most people want that, but in practice building any real
> systems in RISC-V land requires some degree of implementation-defined
> behavior, as the specifications don't cover everything (even ignoring
> the whole PDF vs specification word games). That's not to say we
> should just ignore what's written down, just that there's more work to
> do even if we ignore compatibility with old binaries.
>
> I think the question here is whether it's worth putting in the extra
> work to provide a path for systems with old binaries to gradually
> upgrade to the ISA mappings, or if we just toss out those old binaries.
> I think we really need to see how much of a headache that compatibility
> mode is going to be, and the only way to do that is put in the time to
> analyze the GCC mappings.
>
> That said, I don't really personally care that much about old binaries.
> Really my only argument here is that we broke binary compatibility once
> (right before we upstreamed the port), that made a handful of people
> mad, and I told them we'd never do it again. I think we were all
> operating under the assumption that RISC-V would move an order of
> magnitude faster than it has, though, so maybe we're in a spot where
> nobody actually cares about extant binaries and we can just throw them
> all out.
>
> If we're going to do that, though, there's a bunch of other cruft that
> we should probably toss along with the GCC mappings...
>
>>
>>
>> Jeff
>
> --
> You received this message because you are subscribed to the Google Groups "gnu-toolchain" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to gnu-toolchain+unsubscribe@rivosinc.com.
> To view this discussion on the web visit https://groups.google.com/a/rivosinc.com/d/msgid/gnu-toolchain/mhng-580bc331-c4d5-4f04-afd5-30f09f68b660%40palmer-ri-x1c9a.
> For more options, visit https://groups.google.com/a/rivosinc.com/d/optout.
On Wed, Oct 12, 2022 at 07:16:20PM +0200, Andrea Parri wrote:
> > > > +Andrea, in case he has time to look at the memory model / ABI
> > > > issues.
> >
> > +Jeff, who was offering to help when the threads got crossed. I'd punted on
> > a lot of this in the hope Andrea could help out, as I'm not really a memory
> > model guy and this is pretty far down the rabbit hole. Happy to have the
> > help if you're offering, though, as what's there is likely a pretty big
> > performance issue for anyone with a reasonable memory system.
>
> Thanks for linking me to the discussion and the remarks, Palmer. I'm
> happy to help (and synchronize with Jeff/the community) as much as
> possible, building a better understanding of the 'issues' at stake.

Summarizing here some findings from looking at the currently-implemented
and the proposed [1] mappings:

- The current mapping is missing synchronization; notably,

    atomic_compare_exchange_weak_explicit(-, -, -,
        memory_order_release,
        memory_order_relaxed);

  is unable to provide the (required) release ordering guarantees; for
  reference, I've reported a litmus test illustrating this at the bottom
  of this email, cf. c-cmpxchg.

- [1] addressed the "memory_order_release" problem/bug mentioned above
  (as well as other quirks of the current mapping I won't detail here),
  but it doesn't address other problems present in the current mapping;
  in particular, both mappings translate the following

    atomic_compare_exchange_weak_explicit(-, -, -,
        memory_order_acquire,
        memory_order_relaxed);

  to the sequence

    lr.w
    bne
    sc.w.aq

  (without any other synchronization/fences), which contrasts with the
  Unprivileged Spec, Section 10.2, "Load-Reserved/Store-Conditional
  Instructions":

    "Software should not set the 'rl' bit on an LR instruction unless
    the 'aq' bit is also set, nor should software set the 'aq' bit on
    an SC instruction unless the 'rl' bit is also set.
    LR.rl and SC.aq instructions are not guaranteed to provide any
    stronger ordering than those with both bits clear [...]"

Thanks,
  Andrea

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html


C c-cmpxchg

{}

P0(atomic_int *x, atomic_int *y, int *z)
{
	int r0;

	atomic_store_explicit(x, 1, memory_order_relaxed);
	r0 = atomic_compare_exchange_weak_explicit(y, z, 1,
			memory_order_release,
			memory_order_relaxed);
}

P1(atomic_int *x, atomic_int *y)
{
	int r1;
	int r2;

	r1 = atomic_load_explicit(y, memory_order_acquire);
	r2 = atomic_load_explicit(x, memory_order_relaxed);
}

exists (1:r1=1 /\ 1:r2=0)
On 10/20/22 13:01, Andrea Parri wrote:
> On Wed, Oct 12, 2022 at 07:16:20PM +0200, Andrea Parri wrote:
>>>>> +Andrea, in case he has time to look at the memory model / ABI
>>>>> issues.
>>> +Jeff, who was offering to help when the threads got crossed. I'd punted on
>>> a lot of this in the hope Andrea could help out, as I'm not really a memory
>>> model guy and this is pretty far down the rabbit hole. Happy to have the
>>> help if you're offering, though, as what's there is likely a pretty big
>>> performance issue for anyone with a reasonable memory system.
>> Thanks for linking me to the discussion and the remarks, Palmer. I'm
>> happy to help (and synchronize with Jeff/the community) as much as
>> possible, building a better understanding of the 'issues' at stake.
> Summarizing here some findings from looking at the currently-implemented
> and the proposed [1] mappings:
>
> - The current mapping is missing synchronization; notably,
>
>     atomic_compare_exchange_weak_explicit(-, -, -,
>         memory_order_release,
>         memory_order_relaxed);
>
>   is unable to provide the (required) release ordering guarantees; for
>   reference, I've reported a litmus test illustrating this at the bottom
>   of this email, cf. c-cmpxchg.
>
> - [1] addressed the "memory_order_release" problem/bug mentioned above
>   (as well as other quirks of the current mapping I won't detail here),
>   but it doesn't address other problems present in the current mapping;
>   in particular, both mappings translate the following
>
>     atomic_compare_exchange_weak_explicit(-, -, -,
>         memory_order_acquire,
>         memory_order_relaxed);
>
>   to the sequence
>
>     lr.w
>     bne
>     sc.w.aq
>
>   (without any other synchronization/fences), which contrasts with the
>   Unprivileged Spec, Section 10.2, "Load-Reserved/Store-Conditional
>   Instructions":
>
>     "Software should not set the 'rl' bit on an LR instruction unless
>     the 'aq' bit is also set, nor should software set the 'aq' bit on
>     an SC instruction unless the 'rl' bit is also set.
>     LR.rl and SC.aq
>     instructions are not guaranteed to provide any stronger ordering
>     than those with both bits clear [...]"

So it sounds like Christoph's patch is an improvement, but it isn't
complete. Given the pain in this space, I'd be hesitant to put in an
incomplete fix, even if it moves things in the right direction, as it
creates another compatibility headache if we don't get the complete
solution in place for gcc-13.

Christoph, thoughts on the case Andrea pointed out?

Jeff