[v5,00/10] Support new pmem flush and sync instructions for POWER

Message ID 20200610062343.492293-1-aneesh.kumar@linux.ibm.com (mailing list archive)

Message

Aneesh Kumar K V June 10, 2020, 6:23 a.m. UTC
This patch series enables the use of the new pmem flush and sync instructions on the POWER
architecture. POWER10 introduces two new variants of the dcbf instruction (dcbstps and dcbfps)
that can be used to write modified locations back to persistent storage. Additionally,
POWER10 introduces phwsync and plwsync, which can be used to establish the order of these
writes to persistent storage.
    
This series exposes these instructions to the rest of the kernel. The existing
dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate
synchronization with OpenCAPI-hosted persistent storage. Hence the new instructions
are added as variants of the old ones that old hardware won't differentiate.
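
As a rough illustration of the flush-then-order pattern described above, here is a
minimal userspace sketch. It is not code from this series: the 128-byte line size,
the sketch_* names and the counting callbacks are assumptions made for illustration.
In the kernel the callbacks would be the actual instructions (dcbf or its new
variants for the per-line write-back, hwsync or phwsync for the ordering step).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size for this sketch; not taken from the series. */
#define SKETCH_LINE_SIZE 128u

/* Counters so the stand-in callbacks below have an observable effect. */
struct sketch_stats {
	size_t lines_flushed;
	size_t barriers;
};

typedef void (*sketch_line_op)(uintptr_t line, void *ctx);
typedef void (*sketch_barrier_op)(void *ctx);

/* Stand-in for a per-line write-back instruction (hypothetical helper). */
static void sketch_count_flush(uintptr_t line, void *ctx)
{
	(void)line;
	((struct sketch_stats *)ctx)->lines_flushed++;
}

/* Stand-in for the ordering instruction (hypothetical helper). */
static void sketch_count_barrier(void *ctx)
{
	((struct sketch_stats *)ctx)->barriers++;
}

/*
 * Write back every cache line overlapping [addr, addr + len), then issue
 * one barrier so the write-backs are ordered before later stores.
 * Returns the number of lines visited.
 */
static size_t sketch_flush_range(uintptr_t addr, size_t len,
				 sketch_line_op flush_line,
				 sketch_barrier_op barrier, void *ctx)
{
	uintptr_t start = addr & ~(uintptr_t)(SKETCH_LINE_SIZE - 1);
	uintptr_t end = addr + len;
	size_t lines = 0;

	if (len == 0)
		return 0;

	for (uintptr_t p = start; p < end; p += SKETCH_LINE_SIZE) {
		flush_line(p, ctx);
		lines++;
	}
	barrier(ctx);
	return lines;
}
```

Keeping the barrier out of the per-line loop means a caller pays for ordering once
per range rather than once per cache line.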

On POWER10, pmem devices will be represented by a different device tree compat
string. This ensures that older kernels won't initialize pmem devices on POWER10.

With respect to userspace, we want applications to use MAP_SYNC only if they use
the new instructions. To prevent incorrect use of MAP_SYNC on newer hardware, we
disable MAP_SYNC by default there. The namespace-specific attribute
/sys/block/pmem0/dax/sync_fault can be used to enable MAP_SYNC later.
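
For reference, a userspace application might cope with MAP_SYNC being refused
roughly as follows. This is a sketch, not code from pmdk or from this series.
sketch_map_pmem() is a hypothetical helper, and the assumption that a refusal
surfaces as EOPNOTSUPP (or EINVAL on kernels without MAP_SHARED_VALIDATE) follows
the generic mmap() behaviour rather than anything stated in this cover letter.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers; values from the asm-generic UAPI headers. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/*
 * Try a MAP_SYNC mapping first. If the kernel refuses it, fall back to a
 * plain shared mapping and report that through *used_sync, so the caller
 * knows it must rely on msync()/fsync() for durability instead of
 * userspace flush instructions.
 */
static void *sketch_map_pmem(int fd, size_t len, int *used_sync)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);

	if (p != MAP_FAILED) {
		*used_sync = 1;
		return p;
	}

	*used_sync = 0;
	if (errno != EOPNOTSUPP && errno != EINVAL)
		return MAP_FAILED;

	return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```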

With this:
1) vPMEM continues to work since it is a volatile region that
doesn't need any flush instructions.

2) pmdk and other user applications get updated to use the new instructions
and updated packages are made available to all distributions.

3) On newer hardware, the device will appear with a new compat string. 
Hence older distributions won't initialize pmem on newer hardware.

4) If we have a newer kernel with an older distro, we use the
per-namespace sysfs knob that prevents the usage of MAP_SYNC.

5) Sometime in the future, we set CONFIG_ARCH_MAP_SYNC_DISABLE=n
on ppc64 when we are confident that everybody is using the new flush
instructions.
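
The combinations above can be summarized with a small decision sketch. This is one
reading of the cover letter, not code from the series: struct sketch_outcome,
sketch_decide() and its parameters are hypothetical names, and the per-namespace
sysfs override is deliberately left out.

```c
#include <stdbool.h>

/* What a given kernel/hardware pairing ends up with (hypothetical type). */
struct sketch_outcome {
	bool pmem_initialized;	/* does the kernel bind to the pmem device? */
	bool map_sync_default;	/* is MAP_SYNC allowed before any sysfs change? */
};

/*
 * new_hw:                  the device is advertised with the new compat string
 * kernel_knows_new_compat: the kernel has a driver matching that string
 * arch_map_sync_disable:   CONFIG_ARCH_MAP_SYNC_DISABLE is enabled
 */
static struct sketch_outcome sketch_decide(bool new_hw,
					   bool kernel_knows_new_compat,
					   bool arch_map_sync_disable)
{
	struct sketch_outcome out = { false, false };

	/* Item 3: an older kernel never matches the new compat string. */
	if (new_hw && !kernel_knows_new_compat)
		return out;

	out.pmem_initialized = true;
	/* Items 4 and 5: MAP_SYNC is off by default only on newer hardware. */
	out.map_sync_default = !(new_hw && arch_map_sync_disable);
	return out;
}
```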

Changes from V4:
* Add namespace-specific synchronous fault control.

Changes from V3:
* Add new compat string to be used for the device.
* Use arch_pmem_flush_barrier() in dm-writecache.
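
To illustrate how a generic caller such as dm-writecache can use an
architecture-overridable barrier, here is a userspace sketch of the usual
"#ifndef default" idiom. Only the name arch_pmem_flush_barrier() comes from this
series. The C11 fence used as the default, the call counter and sketch_commit()
are stand-ins invented for illustration.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Counts default-barrier invocations so the sketch is observable. */
static unsigned long sketch_barrier_calls;

/*
 * Generic fallback. An architecture would provide its own version in a
 * header and "#define arch_pmem_flush_barrier arch_pmem_flush_barrier"
 * so that this default is skipped at compile time.
 */
#ifndef arch_pmem_flush_barrier
static inline void arch_pmem_flush_barrier(void)
{
	atomic_thread_fence(memory_order_seq_cst);	/* stand-in barrier */
	sketch_barrier_calls++;
}
#endif

/* Generic caller: copy the data, then order it via the arch hook. */
static void sketch_commit(char *dst, const char *src, size_t n)
{
	memcpy(dst, src, n);
	arch_pmem_flush_barrier();
}
```

With this shape, generic code never names a specific instruction. The choice
between the old and new barrier stays inside the architecture header.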

Aneesh Kumar K.V (10):
  powerpc/pmem: Restrict papr_scm to P8 and above.
  powerpc/pmem: Add new instructions for persistent storage and sync
  powerpc/pmem: Add flush routines using new pmem store and sync
    instruction
  libnvdimm/nvdimm/flush: Allow architecture to override the flush
    barrier
  powerpc/pmem/of_pmem: Update of_pmem to use the new barrier
    instruction.
  powerpc/pmem: Avoid the barrier in flush routines
  powerpc/book3s/pmem: Add WARN_ONCE to catch the wrong usage of pmem
    flush functions.
  libnvdimm/dax: Add a dax flag to control synchronous fault support
  powerpc/pmem: Disable synchronous fault by default
  powerpc/pmem: Initialize pmem device on newer hardware

 arch/powerpc/include/asm/cacheflush.h     | 10 ++++
 arch/powerpc/include/asm/ppc-opcode.h     | 12 ++++
 arch/powerpc/lib/pmem.c                   | 46 ++++++++++++--
 arch/powerpc/platforms/Kconfig.cputype    |  9 +++
 arch/powerpc/platforms/pseries/papr_scm.c | 31 +++++++++-
 arch/powerpc/platforms/pseries/pmem.c     |  6 ++
 drivers/dax/bus.c                         |  2 +-
 drivers/dax/super.c                       | 73 +++++++++++++++++++++++
 drivers/md/dm-writecache.c                |  2 +-
 drivers/nvdimm/of_pmem.c                  |  8 +++
 drivers/nvdimm/pmem.c                     |  4 ++
 drivers/nvdimm/region_devs.c              | 24 ++++++--
 include/linux/dax.h                       | 16 +++++
 include/linux/libnvdimm.h                 |  8 +++
 mm/Kconfig                                |  3 +
 15 files changed, 243 insertions(+), 11 deletions(-)

Comments

Aneesh Kumar K V June 19, 2020, 1:10 p.m. UTC | #1

Ping.

Are we good with the approach here? 

-aneesh