diff mbox series

[3/3,RFC] powerpc: Book3S 64-bit outline-only KASAN support

Message ID YntlprNj9jNOsi/I@cleo (mailing list archive)
State Changes Requested
Headers show
Series KASAN support for 64-bit Book 3S powerpc | expand

Commit Message

Paul Mackerras May 11, 2022, 7:28 a.m. UTC
From: Daniel Axtens <dja@axtens.net>

Implement a limited form of KASAN for Book3S 64-bit machines running under
the Radix MMU, supporting only outline mode.

 - Enable the compiler instrumentation to check addresses and maintain the
   shadow region. (This is the guts of KASAN which we can easily reuse.)

 - Require kasan-vmalloc support to handle modules and anything else in
   vmalloc space.

 - KASAN needs to be able to validate all pointer accesses, but we can't
   instrument all kernel addresses - only linear map and vmalloc. On boot,
   set up a single page of read-only shadow that marks all iomap and
   vmemmap accesses as valid.

 - Document KASAN in both generic and powerpc docs.

Background
----------

KASAN support on Book3S is a bit tricky to get right:

 - It would be good to support inline instrumentation so as to be able to
   catch stack issues that cannot be caught with outline mode.

 - Inline instrumentation requires a fixed offset.

 - Book3S runs code with translations off ("real mode") during boot,
   including a lot of generic device-tree parsing code which is used to
   determine MMU features.

    [ppc64 mm note: The kernel installs a linear mapping at effective
    address c000...-c008.... This is a one-to-one mapping with physical
    memory from 0000... onward. Because of how memory accesses work on
    powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
    same memory both with translations on (accessing as an 'effective
    address'), and with translations off (accessing as a 'real
    address'). This works in both guests and the hypervisor. For more
    details, see s5.7 of Book III of version 3 of the ISA, in particular
    the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
    KASAN implementation currently only supports Radix.]

 - Some code - most notably a lot of KVM code - also runs with translations
   off after boot.

 - Therefore any offset has to point to memory that is valid with
   translations on or off.
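
For reference, what the offset is used for: both modes locate the shadow
byte for an address via kasan_mem_to_shadow() from include/linux/kasan.h
(shown here only to illustrate why the offset must be usable in every
context listed above; in inline mode the compiler bakes this arithmetic,
offset included, into every access):

    static inline void *kasan_mem_to_shadow(const void *addr)
    {
        return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
            + KASAN_SHADOW_OFFSET;
    }

With the offset chosen below (0xa80e000000000000), the shadow for the
kernel regions starting at 0xc000... lands in the 0xc00e... range, e.g.
0xc00e000000000000 >> 3 + 0xa80e000000000000 = 0xc00fc00000000000.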

One approach is just to give up on inline instrumentation. This way
boot-time checks can be delayed until after the MMU is set up, and we
can just not instrument any code that runs with translations off after
booting. Take this approach for now and require outline instrumentation.

Previous attempts allowed inline instrumentation. However, they came with
some unfortunate restrictions: only physically contiguous memory could be
used and it had to be specified at compile time. Maybe we can do better in
the future.

[paulus@ozlabs.org - Rebased onto 5.17.  Still RFC because:

 (a) A kernel with CONFIG_KASAN=y will crash during boot on a machine
 using HPT translation because not all the entry points to the generic
 KASAN code are protected with a call to kasan_arch_is_ready() -- at
 least __kasan_kfree_large() and __kasan_check_byte() need such a
 guard, and probably others.  Previous versions of this code had a
 panic() call in kasan_init() to prevent a KASAN-enabled kernel from
 booting on a HPT machine, but unfortunately the panic string doesn't
 actually get printed to the console, because kasan_init gets called
 very early.

 (b) Of the KASAN tests included in the kernel source, all the
 relevant ones pass except for the "global_oob_right" test, and it
 only fails when CONFIG_KASAN_KUNIT_TEST=m; it passes when the kunit
 tests are built in.  That is, KASAN is not detecting out-of-bounds
 accesses to module-global data.
]

Originally-by: Balbir Singh <bsingharora@gmail.com> # ppc64 out-of-line radix version
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 Documentation/dev-tools/kasan.rst            |  11 +-
 Documentation/powerpc/kasan.txt              |  48 ++++++++-
 arch/powerpc/Kconfig                         |   5 +-
 arch/powerpc/Kconfig.debug                   |   3 +-
 arch/powerpc/include/asm/book3s/64/hash.h    |   4 +
 arch/powerpc/include/asm/book3s/64/pgtable.h |   3 +
 arch/powerpc/include/asm/book3s/64/radix.h   |  13 ++-
 arch/powerpc/include/asm/interrupt.h         |  52 +++++++---
 arch/powerpc/include/asm/kasan.h             |  22 ++++
 arch/powerpc/include/asm/ptrace.h            |   6 --
 arch/powerpc/kernel/Makefile                 |  11 ++
 arch/powerpc/kernel/process.c                |  16 +--
 arch/powerpc/kernel/smp.c                    |  22 ++--
 arch/powerpc/kernel/traps.c                  |   6 +-
 arch/powerpc/kexec/Makefile                  |   2 +
 arch/powerpc/kvm/Makefile                    |   5 +
 arch/powerpc/lib/Makefile                    |   3 +
 arch/powerpc/mm/book3s64/Makefile            |   9 ++
 arch/powerpc/mm/kasan/Makefile               |   1 +
 arch/powerpc/mm/kasan/init_book3s_64.c       | 103 +++++++++++++++++++
 arch/powerpc/mm/ptdump/ptdump.c              |   3 +-
 arch/powerpc/platforms/Kconfig.cputype       |   1 +
 arch/powerpc/platforms/powernv/Makefile      |   8 ++
 arch/powerpc/platforms/powernv/smp.c         |   2 +-
 arch/powerpc/platforms/pseries/Makefile      |   6 ++
 arch/powerpc/sysdev/xics/xics-common.c       |   4 +-
 arch/powerpc/sysdev/xive/common.c            |   4 +-
 27 files changed, 318 insertions(+), 55 deletions(-)
 create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c

Comments

Christophe Leroy May 15, 2022, 7:33 a.m. UTC | #1
On 11/05/2022 at 09:28, Paul Mackerras wrote:
> From: Daniel Axtens <dja@axtens.net>
> 
> Implement a limited form of KASAN for Book3S 64-bit machines running under
> the Radix MMU, supporting only outline mode.
> 
>   - Enable the compiler instrumentation to check addresses and maintain the
>     shadow region. (This is the guts of KASAN which we can easily reuse.)
> 
>   - Require kasan-vmalloc support to handle modules and anything else in
>     vmalloc space.
> 
>   - KASAN needs to be able to validate all pointer accesses, but we can't
>     instrument all kernel addresses - only linear map and vmalloc. On boot,
>     set up a single page of read-only shadow that marks all iomap and
>     vmemmap accesses as valid.
> 
>   - Document KASAN in both generic and powerpc docs.
> 
> Background
> ----------
> 
> KASAN support on Book3S is a bit tricky to get right:
> 
>   - It would be good to support inline instrumentation so as to be able to
>     catch stack issues that cannot be caught with outline mode.
> 
>   - Inline instrumentation requires a fixed offset.
> 
>   - Book3S runs code with translations off ("real mode") during boot,
>     including a lot of generic device-tree parsing code which is used to
>     determine MMU features.
> 
>      [ppc64 mm note: The kernel installs a linear mapping at effective
>      address c000...-c008.... This is a one-to-one mapping with physical
>      memory from 0000... onward. Because of how memory accesses work on
>      powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
>      same memory both with translations on (accessing as an 'effective
>      address'), and with translations off (accessing as a 'real
>      address'). This works in both guests and the hypervisor. For more
>      details, see s5.7 of Book III of version 3 of the ISA, in particular
>      the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
>      KASAN implementation currently only supports Radix.]
> 
>   - Some code - most notably a lot of KVM code - also runs with translations
>     off after boot.
> 
>   - Therefore any offset has to point to memory that is valid with
>     translations on or off.
> 
> One approach is just to give up on inline instrumentation. This way
> boot-time checks can be delayed until after the MMU is set up, and we
> can just not instrument any code that runs with translations off after
> booting. Take this approach for now and require outline instrumentation.
> 
> Previous attempts allowed inline instrumentation. However, they came with
> some unfortunate restrictions: only physically contiguous memory could be
> used and it had to be specified at compile time. Maybe we can do better in
> the future.
> 
> [paulus@ozlabs.org - Rebased onto 5.17.  Still RFC because:
> 
>   (a) A kernel with CONFIG_KASAN=y will crash during boot on a machine
>   using HPT translation because not all the entry points to the generic
>   KASAN code are protected with a call to kasan_arch_is_ready() -- at
>   least __kasan_kfree_large() and __kasan_check_byte() need such a
>   guard, and probably others.  Previous versions of this code had a
>   panic() call in kasan_init() to prevent a KASAN-enabled kernel from
>   booting on a HPT machine, but unfortunately the panic string doesn't
>   actually get printed to the console, because kasan_init gets called
>   very early.
> 
>   (b) Of the KASAN tests included in the kernel source, all the
>   relevant ones pass except for the "global_oob_right" test, and it
>   only fails when CONFIG_KASAN_KUNIT_TEST=m; it passes when the kunit
>   tests are built in.  That is, KASAN is not detecting out-of-bounds
>   accesses to module-global data.
> ]
> 
> Originally-by: Balbir Singh <bsingharora@gmail.com> # ppc64 out-of-line radix version
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
>   Documentation/dev-tools/kasan.rst            |  11 +-
>   Documentation/powerpc/kasan.txt              |  48 ++++++++-
>   arch/powerpc/Kconfig                         |   5 +-
>   arch/powerpc/Kconfig.debug                   |   3 +-
>   arch/powerpc/include/asm/book3s/64/hash.h    |   4 +
>   arch/powerpc/include/asm/book3s/64/pgtable.h |   3 +
>   arch/powerpc/include/asm/book3s/64/radix.h   |  13 ++-
>   arch/powerpc/include/asm/interrupt.h         |  52 +++++++---
>   arch/powerpc/include/asm/kasan.h             |  22 ++++
>   arch/powerpc/include/asm/ptrace.h            |   6 --
>   arch/powerpc/kernel/Makefile                 |  11 ++
>   arch/powerpc/kernel/process.c                |  16 +--
>   arch/powerpc/kernel/smp.c                    |  22 ++--
>   arch/powerpc/kernel/traps.c                  |   6 +-
>   arch/powerpc/kexec/Makefile                  |   2 +
>   arch/powerpc/kvm/Makefile                    |   5 +
>   arch/powerpc/lib/Makefile                    |   3 +
>   arch/powerpc/mm/book3s64/Makefile            |   9 ++
>   arch/powerpc/mm/kasan/Makefile               |   1 +
>   arch/powerpc/mm/kasan/init_book3s_64.c       | 103 +++++++++++++++++++
>   arch/powerpc/mm/ptdump/ptdump.c              |   3 +-
>   arch/powerpc/platforms/Kconfig.cputype       |   1 +
>   arch/powerpc/platforms/powernv/Makefile      |   8 ++
>   arch/powerpc/platforms/powernv/smp.c         |   2 +-
>   arch/powerpc/platforms/pseries/Makefile      |   6 ++
>   arch/powerpc/sysdev/xics/xics-common.c       |   4 +-
>   arch/powerpc/sysdev/xive/common.c            |   4 +-
>   27 files changed, 318 insertions(+), 55 deletions(-)
>   create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c
> 
> diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
> index 8e5cd26d59cb..448995c11bee 100644
> --- a/Documentation/dev-tools/kasan.rst
> +++ b/Documentation/dev-tools/kasan.rst
> @@ -36,8 +36,9 @@ Both software KASAN modes work with SLUB and SLAB memory allocators,
>   while the hardware tag-based KASAN currently only supports SLUB.
>   
>   Currently, generic KASAN is supported for the x86_64, arm, arm64, xtensa, s390,
> -and riscv architectures. It is also supported on 32-bit powerpc kernels.
> -Tag-based KASAN modes are supported only for arm64.
> +and riscv architectures. It is also supported on powerpc for 32-bit kernels and
> +for 64-bit kernels running under the Radix MMU. Tag-based KASAN modes are
> +supported only for arm64.
>   
>   Usage
>   -----
> @@ -350,10 +351,10 @@ CONFIG_KASAN_VMALLOC
>   
>   With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
>   cost of greater memory usage. Currently, this is supported on x86,
> -riscv, s390, and 32-bit powerpc.
> +riscv, s390, and powerpc.
>   
> -It is optional, except on 32-bit powerpc kernels with module support,
> -where it is required.
> +It is optional, except on 64-bit powerpc kernels, and on 32-bit
> +powerpc kernels with module support, where it is required.
>   
>   This works by hooking into vmalloc and vmap and dynamically
>   allocating real shadow memory to back the mappings.
> diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
> index 26bb0e8bb18c..f032b4eaf205 100644
> --- a/Documentation/powerpc/kasan.txt
> +++ b/Documentation/powerpc/kasan.txt
> @@ -1,4 +1,4 @@
> -KASAN is supported on powerpc on 32-bit only.
> +KASAN is supported on powerpc on 32-bit and Radix 64-bit only.
>   
>   32 bit support
>   ==============
> @@ -10,3 +10,49 @@ fixmap area and occupies one eighth of the total kernel virtual memory space.
>   
>   Instrumentation of the vmalloc area is optional, unless built with modules,
>   in which case it is required.
> +
> +64 bit support
> +==============
> +
> +Currently, only the radix MMU is supported. There have been versions for hash
> +and Book3E processors floating around on the mailing list, but nothing has been
> +merged.
> +
> +KASAN support on Book3S is a bit tricky to get right:
> +
> + - It would be good to support inline instrumentation so as to be able to catch
> +   stack issues that cannot be caught with outline mode.
> +
> + - Inline instrumentation requires a fixed offset.
> +
> + - Book3S runs code with translations off ("real mode") during boot, including a
> +   lot of generic device-tree parsing code which is used to determine MMU
> +   features.
> +
> + - Some code - most notably a lot of KVM code - also runs with translations off
> +   after boot.
> +
> + - Therefore any offset has to point to memory that is valid with
> +   translations on or off.
> +
> +One approach is just to give up on inline instrumentation. This way boot-time
> +checks can be delayed until after the MMU is set up, and we can just not
> +instrument any code that runs with translations off after booting. This is the
> +current approach.
> +
> +To avoid this limitation, the KASAN shadow would have to be placed inside the
> +linear mapping, using the same high-bits trick we use for the rest of the linear
> +mapping. This is tricky:
> +
> + - We'd like to place it near the start of physical memory. In theory we can do
> +   this at run-time based on how much physical memory we have, but this requires
> +   being able to arbitrarily relocate the kernel, which is basically the tricky
> +   part of KASLR. Not being game to implement both tricky things at once, this
> +   is hopefully something we can revisit once we get KASLR for Book3S.
> +
> + - Alternatively, we can place the shadow at the _end_ of memory, but this
> +   requires knowing how much contiguous physical memory a system has _at compile
> +   time_. This is a big hammer, and has some unfortunate consequences: inability
> +   to handle discontiguous physical memory, total failure to boot on machines
> +   with less memory than specified, and that machines with more memory than
> +   specified can't use it. This was deemed unacceptable.
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index b779603978e1..0bf76e40c9c2 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -105,6 +105,7 @@ config PPC
>   	# Please keep this list sorted alphabetically.
>   	#
>   	select ARCH_32BIT_OFF_T if PPC32
> +	select ARCH_DISABLE_KASAN_INLINE	if PPC_RADIX_MMU
>   	select ARCH_ENABLE_MEMORY_HOTPLUG
>   	select ARCH_ENABLE_MEMORY_HOTREMOVE
>   	select ARCH_HAS_COPY_MC			if PPC64
> @@ -152,6 +153,7 @@ config PPC
>   	select ARCH_WANT_IPC_PARSE_VERSION
>   	select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
>   	select ARCH_WANT_LD_ORPHAN_WARN
> +	select ARCH_WANTS_NO_INSTR

Can you explain why we need that ?

Is it tied to KASAN ? If yes, why didn't we have it for PPC32 until now ?

Maybe that's independent of KASAN and would be worth a separate patch ?

>   	select ARCH_WEAK_RELEASE_ACQUIRE
>   	select BINFMT_ELF
>   	select BUILDTIME_TABLE_SORT
> @@ -183,7 +185,8 @@ config PPC
>   	select HAVE_ARCH_JUMP_LABEL
>   	select HAVE_ARCH_JUMP_LABEL_RELATIVE
>   	select HAVE_ARCH_KASAN			if PPC32 && PPC_PAGE_SHIFT <= 14
> -	select HAVE_ARCH_KASAN_VMALLOC		if PPC32 && PPC_PAGE_SHIFT <= 14
> +	select HAVE_ARCH_KASAN			if PPC_RADIX_MMU
> +	select HAVE_ARCH_KASAN_VMALLOC		if HAVE_ARCH_KASAN
>   	select HAVE_ARCH_KFENCE			if PPC_BOOK3S_32 || PPC_8xx || 40x
>   	select HAVE_ARCH_KGDB
>   	select HAVE_ARCH_MMAP_RND_BITS
> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> index 192f0ed0097f..9f363c143d86 100644
> --- a/arch/powerpc/Kconfig.debug
> +++ b/arch/powerpc/Kconfig.debug
> @@ -374,4 +374,5 @@ config PPC_FAST_ENDIAN_SWITCH
>   config KASAN_SHADOW_OFFSET
>   	hex
>   	depends on KASAN
> -	default 0xe0000000
> +	default 0xe0000000 if PPC32
> +	default 0xa80e000000000000 if PPC64
> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
> index a7a0572f3846..17e7a778c856 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
> @@ -18,6 +18,10 @@
>   #include <asm/book3s/64/hash-4k.h>
>   #endif
>   
> +#define H_PTRS_PER_PTE		(1 << H_PTE_INDEX_SIZE)
> +#define H_PTRS_PER_PMD		(1 << H_PMD_INDEX_SIZE)
> +#define H_PTRS_PER_PUD		(1 << H_PUD_INDEX_SIZE)
> +
>   /* Bits to set in a PMD/PUD/PGD entry valid bit*/
>   #define HASH_PMD_VAL_BITS		(0x8000000000000000UL)
>   #define HASH_PUD_VAL_BITS		(0x8000000000000000UL)
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 875730d5af40..010eb373fcb3 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -232,6 +232,9 @@ extern unsigned long __pmd_frag_size_shift;
>   #define PTRS_PER_PUD	(1 << PUD_INDEX_SIZE)
>   #define PTRS_PER_PGD	(1 << PGD_INDEX_SIZE)
>   
> +#define MAX_PTRS_PER_PTE ((H_PTRS_PER_PTE > R_PTRS_PER_PTE) ? H_PTRS_PER_PTE : R_PTRS_PER_PTE)
> +#define MAX_PTRS_PER_PMD ((H_PTRS_PER_PMD > R_PTRS_PER_PMD) ? H_PTRS_PER_PMD : R_PTRS_PER_PMD)
> +#define MAX_PTRS_PER_PUD ((H_PTRS_PER_PUD > R_PTRS_PER_PUD) ? H_PTRS_PER_PUD : R_PTRS_PER_PUD)
>   #define MAX_PTRS_PER_PGD	(1 << (H_PGD_INDEX_SIZE > RADIX_PGD_INDEX_SIZE ? \
>   				       H_PGD_INDEX_SIZE : RADIX_PGD_INDEX_SIZE))
>   
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index d090d9612348..bafc9869afcd 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -35,6 +35,11 @@
>   #define RADIX_PMD_SHIFT		(PAGE_SHIFT + RADIX_PTE_INDEX_SIZE)
>   #define RADIX_PUD_SHIFT		(RADIX_PMD_SHIFT + RADIX_PMD_INDEX_SIZE)
>   #define RADIX_PGD_SHIFT		(RADIX_PUD_SHIFT + RADIX_PUD_INDEX_SIZE)
> +
> +#define R_PTRS_PER_PTE		(1 << RADIX_PTE_INDEX_SIZE)
> +#define R_PTRS_PER_PMD		(1 << RADIX_PMD_INDEX_SIZE)
> +#define R_PTRS_PER_PUD		(1 << RADIX_PUD_INDEX_SIZE)
> +
>   /*
>    * Size of EA range mapped by our pagetables.
>    */
> @@ -68,11 +73,11 @@
>    *
>    *
>    * 3rd quadrant expanded:
> - * +------------------------------+
> + * +------------------------------+  Highest address (0xc010000000000000)
> + * +------------------------------+  KASAN shadow end (0xc00fc00000000000)
>    * |                              |
>    * |                              |
> - * |                              |
> - * +------------------------------+  Kernel vmemmap end (0xc010000000000000)
> + * +------------------------------+  Kernel vmemmap end/shadow start (0xc00e000000000000)
>    * |                              |
>    * |           512TB		  |
>    * |                              |
> @@ -126,6 +131,8 @@
>   #define RADIX_VMEMMAP_SIZE	RADIX_KERN_MAP_SIZE
>   #define RADIX_VMEMMAP_END	(RADIX_VMEMMAP_START + RADIX_VMEMMAP_SIZE)
>   
> +/* For the sizes of the shadow area, see kasan.h */
> +

Why does this comment pop up here ?

>   #ifndef __ASSEMBLY__
>   #define RADIX_PTE_TABLE_SIZE	(sizeof(pte_t) << RADIX_PTE_INDEX_SIZE)
>   #define RADIX_PMD_TABLE_SIZE	(sizeof(pmd_t) << RADIX_PMD_INDEX_SIZE)
> diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
> index fc28f46d2f9d..fb244b6ca7f0 100644
> --- a/arch/powerpc/include/asm/interrupt.h
> +++ b/arch/powerpc/include/asm/interrupt.h
> @@ -327,22 +327,46 @@ static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct inte
>   	}
>   #endif
>   
> +	/* If data relocations are enabled, it's safe to use nmi_enter() */
> +	if (mfmsr() & MSR_DR) {
> +		nmi_enter();
> +		return;
> +	}
> +
>   	/*
> -	 * Do not use nmi_enter() for pseries hash guest taking a real-mode
> +	 * But do not use nmi_enter() for pseries hash guest taking a real-mode
>   	 * NMI because not everything it touches is within the RMA limit.
>   	 */
> -	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
> -			!firmware_has_feature(FW_FEATURE_LPAR) ||
> -			radix_enabled() || (mfmsr() & MSR_DR))
> -		nmi_enter();
> +	if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
> +	    firmware_has_feature(FW_FEATURE_LPAR) &&
> +	    !radix_enabled())
> +		return;
> +
> +	/*
> +	 * Likewise, don't use it if we have some form of instrumentation (like
> +	 * KASAN shadow) that is not safe to access in real mode (even on radix)
> +	 */
> +	if (IS_ENABLED(CONFIG_KASAN))
> +		return;
> +
> +	/* Otherwise, it should be safe to call it */
> +	nmi_enter();
>   }
>   
>   static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct interrupt_nmi_state *state)
>   {
> -	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
> -			!firmware_has_feature(FW_FEATURE_LPAR) ||
> -			radix_enabled() || (mfmsr() & MSR_DR))
> +	if (mfmsr() & MSR_DR) {
> +		// nmi_exit if relocations are on
>   		nmi_exit();
> +	} else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
> +		   firmware_has_feature(FW_FEATURE_LPAR) &&
> +		   !radix_enabled()) {
> +		// no nmi_exit for a pseries hash guest taking a real mode exception
> +	} else if (IS_ENABLED(CONFIG_KASAN)) {
> +		// no nmi_exit for KASAN in real mode
> +	} else {
> +		nmi_exit();
> +	}
>   

I think this whole changeset around MSR_DR versus NMI would be worth a 
separate patch with a dedicated explanation.


>   	/*
>   	 * nmi does not call nap_adjust_return because nmi should not create
> @@ -410,7 +434,8 @@ static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct inter
>    * Specific handlers may have additional restrictions.
>    */
>   #define DEFINE_INTERRUPT_HANDLER_RAW(func)				\
> -static __always_inline long ____##func(struct pt_regs *regs);		\
> +static __always_inline __no_sanitize_address __no_kcsan long		\

Why is this required for PPC64 while it wasn't for PPC32 ?

Any side effect on PPC32 ?


> +____##func(struct pt_regs *regs);					\
>   									\
>   interrupt_handler long func(struct pt_regs *regs)			\
>   {									\
> @@ -424,7 +449,8 @@ interrupt_handler long func(struct pt_regs *regs)			\
>   }									\
>   NOKPROBE_SYMBOL(func);							\
>   									\
> -static __always_inline long ____##func(struct pt_regs *regs)
> +static __always_inline __no_sanitize_address __no_kcsan long		\
> +____##func(struct pt_regs *regs)
>   
>   /**
>    * DECLARE_INTERRUPT_HANDLER - Declare synchronous interrupt handler function
> @@ -549,7 +575,8 @@ static __always_inline void ____##func(struct pt_regs *regs)
>    * body with a pair of curly brackets.
>    */
>   #define DEFINE_INTERRUPT_HANDLER_NMI(func)				\
> -static __always_inline long ____##func(struct pt_regs *regs);		\
> +static __always_inline __no_sanitize_address __no_kcsan long		\
> +____##func(struct pt_regs *regs);					\
>   									\
>   interrupt_handler long func(struct pt_regs *regs)			\
>   {									\
> @@ -566,7 +593,8 @@ interrupt_handler long func(struct pt_regs *regs)			\
>   }									\
>   NOKPROBE_SYMBOL(func);							\
>   									\
> -static __always_inline long ____##func(struct pt_regs *regs)
> +static __always_inline  __no_sanitize_address __no_kcsan long		\
> +____##func(struct pt_regs *regs)
>   
>   
>   /* Interrupt handlers */
> diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
> index 3c478e5ef24c..6efc822e70fd 100644
> --- a/arch/powerpc/include/asm/kasan.h
> +++ b/arch/powerpc/include/asm/kasan.h
> @@ -30,9 +30,31 @@
>   
>   #define KASAN_SHADOW_OFFSET	ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
>   
> +#ifdef CONFIG_PPC32
>   #define KASAN_SHADOW_END	(-(-KASAN_SHADOW_START >> KASAN_SHADOW_SCALE_SHIFT))
> +#endif
>   
>   #ifdef CONFIG_KASAN
> +#ifdef CONFIG_PPC_BOOK3S_64
> +/*
> + * The shadow ends before the highest accessible address
> + * because we don't need a shadow for the shadow. Instead:
> + * c00e000000000000 >> 3 + a80e000000000000 = c00fc00000000000
> + */
> +#define KASAN_SHADOW_END 0xc00fc00000000000UL

Would be easier to read if we could group that with PPC32 in an 
#ifdef/#else/#endif

> +
> +DECLARE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);
> +
> +static __always_inline bool kasan_arch_is_ready(void)
> +{
> +	if (static_branch_likely(&powerpc_kasan_enabled_key))
> +		return true;
> +	return false;
> +}
> +
> +#define kasan_arch_is_ready kasan_arch_is_ready
> +#endif
> +
>   void kasan_early_init(void);
>   void kasan_mmu_init(void);
>   void kasan_init(void);
> diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
> index 42f89e2d8f04..d4d547b4ed28 100644
> --- a/arch/powerpc/include/asm/ptrace.h
> +++ b/arch/powerpc/include/asm/ptrace.h
> @@ -305,12 +305,6 @@ static inline void regs_set_recoverable(struct pt_regs *regs)
>   		regs_set_return_msr(regs, regs->msr | MSR_RI);
>   }
>   
> -static inline void regs_set_unrecoverable(struct pt_regs *regs)
> -{
> -	if (cpu_has_msr_ri())
> -		regs_set_return_msr(regs, regs->msr & ~MSR_RI);
> -}
> -
>   #define arch_has_single_step()	(1)
>   #define arch_has_block_step()	(true)
>   #define ARCH_HAS_USER_SINGLE_STEP_REPORT
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 4d7829399570..e46335da7f72 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -33,6 +33,17 @@ KASAN_SANITIZE_early_32.o := n
>   KASAN_SANITIZE_cputable.o := n
>   KASAN_SANITIZE_prom_init.o := n
>   KASAN_SANITIZE_btext.o := n
> +KASAN_SANITIZE_paca.o := n
> +KASAN_SANITIZE_setup_64.o := n
> +KASAN_SANITIZE_mce.o := n
> +KASAN_SANITIZE_mce_power.o := n
> +
> +# we have to be particularly careful in ppc64 to exclude code that
> +# runs with translations off, as we cannot access the shadow with
> +# translations off. However, ppc32 can sanitize this.
> +ifdef CONFIG_PPC64
> +KASAN_SANITIZE_traps.o := n
> +endif
>   
>   ifdef CONFIG_KASAN
>   CFLAGS_early_32.o += -DDISABLE_BRANCH_PROFILING
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 984813a4d5dc..c962f9bfc7d4 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -2220,8 +2220,8 @@ void __no_sanitize_address show_stack(struct task_struct *tsk,
>   			break;
>   
>   		stack = (unsigned long *) sp;
> -		newsp = stack[0];
> -		ip = stack[STACK_FRAME_LR_SAVE];
> +		newsp = READ_ONCE_NOCHECK(stack[0]);
> +		ip = READ_ONCE_NOCHECK(stack[STACK_FRAME_LR_SAVE]);
>   		if (!firstframe || ip != lr) {
>   			printk("%s["REG"] ["REG"] %pS",
>   				loglvl, sp, ip, (void *)ip);
> @@ -2239,17 +2239,19 @@ void __no_sanitize_address show_stack(struct task_struct *tsk,
>   		 * See if this is an exception frame.
>   		 * We look for the "regshere" marker in the current frame.
>   		 */
> -		if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS)
> -		    && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
> +		if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS) &&
> +		    (READ_ONCE_NOCHECK(stack[STACK_FRAME_MARKER]) ==
> +		     STACK_FRAME_REGS_MARKER)) {
>   			struct pt_regs *regs = (struct pt_regs *)
>   				(sp + STACK_FRAME_OVERHEAD);
>   
> -			lr = regs->link;
> +			lr = READ_ONCE_NOCHECK(regs->link);
>   			printk("%s--- interrupt: %lx at %pS\n",
> -			       loglvl, regs->trap, (void *)regs->nip);
> +			       loglvl, READ_ONCE_NOCHECK(regs->trap),
> +			       (void *)READ_ONCE_NOCHECK(regs->nip));
>   			__show_regs(regs);
>   			printk("%s--- interrupt: %lx\n",
> -			       loglvl, regs->trap);
> +			       loglvl, READ_ONCE_NOCHECK(regs->trap));
>   
>   			firstframe = 1;
>   		}

This change was submitted as a standalone patch and was rejected as is, 
see 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210528074806.1311297-1-dja@axtens.net/

A v2 was accepted, see 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210614120907.1952321-1-dja@axtens.net/

Therefore, AFAIU, this change is redundant with the accepted v2 and should 
be dropped.


> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index b7fd6a72aa76..5fedadc27952 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -413,32 +413,32 @@ static struct cpumask nmi_ipi_pending_mask;
>   static bool nmi_ipi_busy = false;
>   static void (*nmi_ipi_function)(struct pt_regs *) = NULL;
>   
> -static void nmi_ipi_lock_start(unsigned long *flags)
> +noinstr static void nmi_ipi_lock_start(unsigned long *flags)
>   {
>   	raw_local_irq_save(*flags);
>   	hard_irq_disable();
> -	while (atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1) {
> +	while (arch_atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1) {
>   		raw_local_irq_restore(*flags);
> -		spin_until_cond(atomic_read(&__nmi_ipi_lock) == 0);
> +		spin_until_cond(arch_atomic_read(&__nmi_ipi_lock) == 0);
>   		raw_local_irq_save(*flags);
>   		hard_irq_disable();
>   	}
>   }
>   
> -static void nmi_ipi_lock(void)
> +noinstr static void nmi_ipi_lock(void)
>   {
> -	while (atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1)
> -		spin_until_cond(atomic_read(&__nmi_ipi_lock) == 0);
> +	while (arch_atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1)
> +		spin_until_cond(arch_atomic_read(&__nmi_ipi_lock) == 0);
>   }
>   
> -static void nmi_ipi_unlock(void)
> +noinstr static void nmi_ipi_unlock(void)
>   {
>   	smp_mb();
> -	WARN_ON(atomic_read(&__nmi_ipi_lock) != 1);
> -	atomic_set(&__nmi_ipi_lock, 0);
> +	WARN_ON(arch_atomic_read(&__nmi_ipi_lock) != 1);
> +	arch_atomic_set(&__nmi_ipi_lock, 0);
>   }
>   
> -static void nmi_ipi_unlock_end(unsigned long *flags)
> +noinstr static void nmi_ipi_unlock_end(unsigned long *flags)
>   {
>   	nmi_ipi_unlock();
>   	raw_local_irq_restore(*flags);
> @@ -447,7 +447,7 @@ static void nmi_ipi_unlock_end(unsigned long *flags)
>   /*
>    * Platform NMI handler calls this to ack
>    */
> -int smp_handle_nmi_ipi(struct pt_regs *regs)
> +noinstr int smp_handle_nmi_ipi(struct pt_regs *regs)
>   {
>   	void (*fn)(struct pt_regs *) = NULL;
>   	unsigned long flags;

Can you explain those changes from atomic_xxx to arch_atomic_xxx ?

Should it be a separate patch ?


> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index a08bb7cefdc5..3aaa50e5c72f 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -393,7 +393,7 @@ void _exception(int signr, struct pt_regs *regs, int code, unsigned long addr)
>    * Builds that do not support KVM could take this second option to increase
>    * the recoverability of NMIs.
>    */
> -void hv_nmi_check_nonrecoverable(struct pt_regs *regs)
> +noinstr void hv_nmi_check_nonrecoverable(struct pt_regs *regs)
>   {
>   #ifdef CONFIG_PPC_POWERNV
>   	unsigned long kbase = (unsigned long)_stext;
> @@ -433,7 +433,9 @@ void hv_nmi_check_nonrecoverable(struct pt_regs *regs)
>   	return;
>   
>   nonrecoverable:
> -	regs_set_unrecoverable(regs);
> +	regs->msr &= ~MSR_RI;
> +	local_paca->hsrr_valid = 0;
> +	local_paca->srr_valid = 0;
>   #endif
>   }
>   DEFINE_INTERRUPT_HANDLER_NMI(system_reset_exception)
> diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
> index b6c52608cb49..0c2abe7f9908 100644
> --- a/arch/powerpc/kexec/Makefile
> +++ b/arch/powerpc/kexec/Makefile
> @@ -13,3 +13,5 @@ obj-$(CONFIG_KEXEC_FILE)	+= file_load.o ranges.o file_load_$(BITS).o elf_$(BITS)
>   GCOV_PROFILE_core_$(BITS).o := n
>   KCOV_INSTRUMENT_core_$(BITS).o := n
>   UBSAN_SANITIZE_core_$(BITS).o := n
> +KASAN_SANITIZE_core.o := n
> +KASAN_SANITIZE_core_$(BITS).o := n
> diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
> index 9bdfc8b50899..ed0569b2a3d4 100644
> --- a/arch/powerpc/kvm/Makefile
> +++ b/arch/powerpc/kvm/Makefile
> @@ -134,3 +134,8 @@ obj-$(CONFIG_KVM_BOOK3S_64_PR) += kvm-pr.o
>   obj-$(CONFIG_KVM_BOOK3S_64_HV) += kvm-hv.o
>   
>   obj-y += $(kvm-book3s_64-builtin-objs-y)
> +
> +# KVM does a lot in real-mode, and 64-bit Book3S KASAN doesn't support that
> +ifdef CONFIG_PPC_BOOK3S_64
> +KASAN_SANITIZE := n
> +endif
> diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
> index 5d1881d2e39a..8560c912186d 100644
> --- a/arch/powerpc/lib/Makefile
> +++ b/arch/powerpc/lib/Makefile
> @@ -13,6 +13,9 @@ CFLAGS_REMOVE_feature-fixups.o = $(CC_FLAGS_FTRACE)
>   
>   KASAN_SANITIZE_code-patching.o := n
>   KASAN_SANITIZE_feature-fixups.o := n
> +# restart_table.o contains functions called in the NMI interrupt path
> +# which can be in real mode. Disable KASAN.
> +KASAN_SANITIZE_restart_table.o := n
>   
>   ifdef CONFIG_KASAN
>   CFLAGS_code-patching.o += -DDISABLE_BRANCH_PROFILING
> diff --git a/arch/powerpc/mm/book3s64/Makefile b/arch/powerpc/mm/book3s64/Makefile
> index 2d50cac499c5..3150159e5121 100644
> --- a/arch/powerpc/mm/book3s64/Makefile
> +++ b/arch/powerpc/mm/book3s64/Makefile
> @@ -24,3 +24,12 @@ obj-$(CONFIG_PPC_PKEY)	+= pkeys.o
>   
>   # Instrumenting the SLB fault path can lead to duplicate SLB entries
>   KCOV_INSTRUMENT_slb.o := n
> +
> +# Parts of these can run in real mode and therefore are
> +# not safe with the current outline KASAN implementation

But aren't we already safe, given that KASAN is disabled when in real mode ? 
Isn't that the purpose of kasan_arch_is_ready() ?

> +KASAN_SANITIZE_mmu_context.o := n
> +KASAN_SANITIZE_pgtable.o := n
> +KASAN_SANITIZE_radix_pgtable.o := n
> +KASAN_SANITIZE_radix_tlb.o := n
> +KASAN_SANITIZE_slb.o := n
> +KASAN_SANITIZE_pkeys.o := n
> diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
> index bcbfd6f2eca3..4999aadb1867 100644
> --- a/arch/powerpc/mm/kasan/Makefile
> +++ b/arch/powerpc/mm/kasan/Makefile
> @@ -5,3 +5,4 @@ KASAN_SANITIZE := n
>   obj-$(CONFIG_PPC32)		+= init_32.o
>   obj-$(CONFIG_PPC_8xx)		+= 8xx.o
>   obj-$(CONFIG_PPC_BOOK3S_32)	+= book3s_32.o
> +obj-$(CONFIG_PPC_BOOK3S_64)	+= init_book3s_64.o
> diff --git a/arch/powerpc/mm/kasan/init_book3s_64.c b/arch/powerpc/mm/kasan/init_book3s_64.c
> new file mode 100644
> index 000000000000..cb96e16e57a1
> --- /dev/null
> +++ b/arch/powerpc/mm/kasan/init_book3s_64.c
> @@ -0,0 +1,103 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KASAN for 64-bit Book3S powerpc
> + *
> + * Copyright (C) 2019-2020 IBM Corporation
> + * Author: Daniel Axtens <dja@axtens.net>
> + */
> +
> +/*
> + * ppc64 turns on virtual memory late in boot, after calling into generic code
> + * like the device-tree parser, so it uses this in conjunction with a hook in
> + * outline mode to avoid invalid access early in boot.
> + */
> +
> +#define DISABLE_BRANCH_PROFILING
> +
> +#include <linux/kasan.h>
> +#include <linux/printk.h>
> +#include <linux/sched/task.h>
> +#include <linux/memblock.h>
> +#include <asm/pgalloc.h>
> +
> +DEFINE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);
> +
> +static void __init kasan_init_phys_region(void *start, void *end)
> +{
> +	unsigned long k_start, k_end, k_cur;
> +	void *va;
> +
> +	if (start >= end)
> +		return;
> +
> +	k_start = ALIGN_DOWN((unsigned long)kasan_mem_to_shadow(start), PAGE_SIZE);
> +	k_end = ALIGN((unsigned long)kasan_mem_to_shadow(end), PAGE_SIZE);
> +
> +	va = memblock_alloc(k_end - k_start, PAGE_SIZE);
> +	for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE, va += PAGE_SIZE)
> +		map_kernel_page(k_cur, __pa(va), PAGE_KERNEL);
> +}
> +
> +void __init kasan_init(void)
> +{
> +	/*
> +	 * We want to do the following things:
> +	 *  1) Map real memory into the shadow for all physical memblocks
> +	 *     This takes us from c000... to c008...
> +	 *  2) Leave a hole over the shadow of vmalloc space. KASAN_VMALLOC
> +	 *     will manage this for us.
> +	 *     This takes us from c008... to c00a...
> +	 *  3) Map the 'early shadow'/zero page over iomap and vmemmap space.
> +	 *     This takes us up to where we start at c00e...
> +	 */
> +
> +	void *k_start = kasan_mem_to_shadow((void *)RADIX_VMALLOC_END);
> +	void *k_end = kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END);
> +	phys_addr_t start, end;
> +	u64 i;
> +	pte_t zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL);
> +
> +	if (!early_radix_enabled()) {
> +		pr_warn("KASAN not enabled as it requires radix!");
> +		return;
> +	}
> +
> +	for_each_mem_range(i, &start, &end)
> +		kasan_init_phys_region((void *)start, (void *)end);
> +
> +	for (i = 0; i < PTRS_PER_PTE; i++)
> +		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
> +			     &kasan_early_shadow_pte[i], zero_pte, 0);
> +
> +	for (i = 0; i < PTRS_PER_PMD; i++)
> +		pmd_populate_kernel(&init_mm, &kasan_early_shadow_pmd[i],
> +				    kasan_early_shadow_pte);
> +
> +	for (i = 0; i < PTRS_PER_PUD; i++)
> +		pud_populate(&init_mm, &kasan_early_shadow_pud[i],
> +			     kasan_early_shadow_pmd);
> +
> +	/* map the early shadow over the iomap and vmemmap space */
> +	kasan_populate_early_shadow(k_start, k_end);
> +
> +	/* mark early shadow region as RO and wipe it */
> +	zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL_RO);
> +	for (i = 0; i < PTRS_PER_PTE; i++)
> +		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
> +			     &kasan_early_shadow_pte[i], zero_pte, 0);
> +
> +	/*
> +	 * clear_page relies on some cache info that hasn't been set up yet.
> +	 * It ends up looping ~forever and blows up other data.
> +	 * Use memset instead.
> +	 */
> +	memset(kasan_early_shadow_page, 0, PAGE_SIZE);
> +
> +	static_branch_inc(&powerpc_kasan_enabled_key);
> +
> +	/* Enable error messages */
> +	init_task.kasan_depth = 0;
> +	pr_info("KASAN init done (64-bit Book3S)\n");

Maybe mention RADIX in the printed message ?

> +}
> +
> +void __init kasan_late_init(void) { }
> diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
> index 8c846982766f..2313053fe679 100644
> --- a/arch/powerpc/mm/ptdump/ptdump.c
> +++ b/arch/powerpc/mm/ptdump/ptdump.c
> @@ -21,6 +21,7 @@
>   #include <linux/seq_file.h>
>   #include <asm/fixmap.h>
>   #include <linux/const.h>
> +#include <linux/kasan.h>
>   #include <asm/page.h>
>   #include <asm/hugetlb.h>
>   
> @@ -289,11 +290,11 @@ static void populate_markers(void)
>   #endif
>   	address_markers[i++].start_address = FIXADDR_START;
>   	address_markers[i++].start_address = FIXADDR_TOP;
> +#endif /* CONFIG_PPC64 */
>   #ifdef CONFIG_KASAN
>   	address_markers[i++].start_address = KASAN_SHADOW_START;
>   	address_markers[i++].start_address = KASAN_SHADOW_END;
>   #endif
> -#endif /* CONFIG_PPC64 */
>   }
>   
>   static int ptdump_show(struct seq_file *m, void *v)
> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
> index 87bc1929ee5a..b6562d39b9c4 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -104,6 +104,7 @@ config PPC_BOOK3S_64
>   	select HAVE_MOVE_PUD
>   	select IRQ_WORK
>   	select PPC_64S_HASH_MMU if !PPC_RADIX_MMU
> +	select KASAN_VMALLOC if KASAN
>   
>   config PPC_BOOK3E_64
>   	bool "Embedded processors"
> diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
> index dc7b37c23b60..6488b3842199 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -1,4 +1,12 @@
>   # SPDX-License-Identifier: GPL-2.0
> +
> +# nothing that deals with real mode is safe to KASAN
> +# in particular, idle code runs a bunch of things in real mode
> +KASAN_SANITIZE_idle.o := n
> +KASAN_SANITIZE_pci-ioda.o := n
> +# pnv_machine_check_early
> +KASAN_SANITIZE_setup.o := n
> +
>   obj-y			+= setup.o opal-call.o opal-wrappers.o opal.o opal-async.o
>   obj-y			+= idle.o opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
>   obj-y			+= rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
> diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
> index cbb67813cd5d..9e1a25398f98 100644
> --- a/arch/powerpc/platforms/powernv/smp.c
> +++ b/arch/powerpc/platforms/powernv/smp.c
> @@ -345,7 +345,7 @@ static void __init pnv_smp_probe(void)
>   	}
>   }
>   
> -static int pnv_system_reset_exception(struct pt_regs *regs)
> +noinstr static int pnv_system_reset_exception(struct pt_regs *regs)
>   {
>   	if (smp_handle_nmi_ipi(regs))
>   		return 1;
> diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
> index ee60b59024b4..98e878c32a21 100644
> --- a/arch/powerpc/platforms/pseries/Makefile
> +++ b/arch/powerpc/platforms/pseries/Makefile
> @@ -32,3 +32,9 @@ obj-$(CONFIG_SUSPEND)		+= suspend.o
>   obj-$(CONFIG_PPC_VAS)		+= vas.o
>   
>   obj-$(CONFIG_ARCH_HAS_CC_PLATFORM)	+= cc_platform.o
> +
> +# nothing that operates in real mode is safe for KASAN
> +KASAN_SANITIZE_ras.o := n
> +KASAN_SANITIZE_kexec.o := n
> +#machine_kexec
> +KASAN_SANITIZE_setup.o := n
> diff --git a/arch/powerpc/sysdev/xics/xics-common.c b/arch/powerpc/sysdev/xics/xics-common.c
> index f3fb2a12124c..322b2b8bd467 100644
> --- a/arch/powerpc/sysdev/xics/xics-common.c
> +++ b/arch/powerpc/sysdev/xics/xics-common.c
> @@ -146,7 +146,7 @@ void __init xics_smp_probe(void)
>   
>   #endif /* CONFIG_SMP */
>   
> -void xics_teardown_cpu(void)
> +noinstr void xics_teardown_cpu(void)
>   {
>   	struct xics_cppr *os_cppr = this_cpu_ptr(&xics_cppr);
>   
> @@ -159,7 +159,7 @@ void xics_teardown_cpu(void)
>   	icp_ops->teardown_cpu();
>   }
>   
> -void xics_kexec_teardown_cpu(int secondary)
> +noinstr void xics_kexec_teardown_cpu(int secondary)
>   {
>   	xics_teardown_cpu();
>   
> diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
> index 1ca5564bda9d..87b825b7401d 100644
> --- a/arch/powerpc/sysdev/xive/common.c
> +++ b/arch/powerpc/sysdev/xive/common.c
> @@ -1241,7 +1241,7 @@ static int xive_setup_cpu_ipi(unsigned int cpu)
>   	return 0;
>   }
>   
> -static void xive_cleanup_cpu_ipi(unsigned int cpu, struct xive_cpu *xc)
> +noinstr static void xive_cleanup_cpu_ipi(unsigned int cpu, struct xive_cpu *xc)
>   {
>   	unsigned int xive_ipi_irq = xive_ipi_cpu_to_irq(cpu);
>   
> @@ -1634,7 +1634,7 @@ void xive_flush_interrupt(void)
>   
>   #endif /* CONFIG_SMP */
>   
> -void xive_teardown_cpu(void)
> +noinstr void xive_teardown_cpu(void)
>   {
>   	struct xive_cpu *xc = __this_cpu_read(xive_cpu);
>   	unsigned int cpu = smp_processor_id();
Paul Mackerras May 17, 2022, 7:31 a.m. UTC | #2
On Sun, May 15, 2022 at 07:33:52AM +0000, Christophe Leroy wrote:
> 
> 
> On 11/05/2022 at 09:28, Paul Mackerras wrote:
> > From: Daniel Axtens <dja@axtens.net>
> > 
> > Implement a limited form of KASAN for Book3S 64-bit machines running under
> > the Radix MMU, supporting only outline mode.

[snip]

> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index b779603978e1..0bf76e40c9c2 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -105,6 +105,7 @@ config PPC
> >   	# Please keep this list sorted alphabetically.
> >   	#
> >   	select ARCH_32BIT_OFF_T if PPC32
> > +	select ARCH_DISABLE_KASAN_INLINE	if PPC_RADIX_MMU
> >   	select ARCH_ENABLE_MEMORY_HOTPLUG
> >   	select ARCH_ENABLE_MEMORY_HOTREMOVE
> >   	select ARCH_HAS_COPY_MC			if PPC64
> > @@ -152,6 +153,7 @@ config PPC
> >   	select ARCH_WANT_IPC_PARSE_VERSION
> >   	select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
> >   	select ARCH_WANT_LD_ORPHAN_WARN
> > +	select ARCH_WANTS_NO_INSTR
> 
> Can you explain why we need that ?

The help text for the option says:

     An architecture should select this if the noinstr macro is being used on
     functions to denote that the toolchain should avoid instrumenting such
     functions and is required for correctness.

All it really seems to do is restrict the conditions under which the
GCOV and KCOV options can be enabled.

> Is it tied to KASAN ? If yes, why didn't we have it for PPC32 until now ?

Since we really do need to avoid instrumenting certain functions on
ppc64 (as in things will break if we do instrument them), I think we
need to select ARCH_WANTS_NO_INSTR.
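
For context, the noinstr annotation itself (in
include/linux/compiler_types.h) expands to roughly the following --
abridged here, and the exact set of attributes varies between kernel
versions:

    /* Section for code which can't be instrumented at all */
    #define noinstr						\
        noinline notrace __section(".noinstr.text")		\
        __no_kcsan __no_sanitize_address

so marking a function noinstr is what actually keeps KASAN (and KCSAN)
out of it; ARCH_WANTS_NO_INSTR just tells the generic build machinery
that we rely on that behaviour for correctness.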

For ppc32, as far as I recall there is much less code that runs in
real mode and it is mostly assembler (except for some boot code),
mostly because all addresses have to be explicitly translated to
physical addresses for 32-bit real-mode code, unlike ppc64 where we
can access memory in the linear mapping using virtual addresses
because the CPU ignores the top 4 bits of the
effective address in real mode.  That said, there is a lot less code
that runs in real mode on ppc64 than there used to be.

> Maybe that's independent of KASAN and would be worth a separate patch ?

Yes, possibly, though KASAN does appear to be the only user of noinstr
in arch/powerpc.

[snip]

> > diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> > index d090d9612348..bafc9869afcd 100644
> > --- a/arch/powerpc/include/asm/book3s/64/radix.h
> > +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> > @@ -35,6 +35,11 @@
> >   #define RADIX_PMD_SHIFT		(PAGE_SHIFT + RADIX_PTE_INDEX_SIZE)
> >   #define RADIX_PUD_SHIFT		(RADIX_PMD_SHIFT + RADIX_PMD_INDEX_SIZE)
> >   #define RADIX_PGD_SHIFT		(RADIX_PUD_SHIFT + RADIX_PUD_INDEX_SIZE)
> > +
> > +#define R_PTRS_PER_PTE		(1 << RADIX_PTE_INDEX_SIZE)
> > +#define R_PTRS_PER_PMD		(1 << RADIX_PMD_INDEX_SIZE)
> > +#define R_PTRS_PER_PUD		(1 << RADIX_PUD_INDEX_SIZE)
> > +
> >   /*
> >    * Size of EA range mapped by our pagetables.
> >    */
> > @@ -68,11 +73,11 @@
> >    *
> >    *
> >    * 3rd quadrant expanded:
> > - * +------------------------------+
> > + * +------------------------------+  Highest address (0xc010000000000000)
> > + * +------------------------------+  KASAN shadow end (0xc00fc00000000000)
> >    * |                              |
> >    * |                              |
> > - * |                              |
> > - * +------------------------------+  Kernel vmemmap end (0xc010000000000000)
> > + * +------------------------------+  Kernel vmemmap end/shadow start (0xc00e000000000000)
> >    * |                              |
> >    * |           512TB		  |
> >    * |                              |
> > @@ -126,6 +131,8 @@
> >   #define RADIX_VMEMMAP_SIZE	RADIX_KERN_MAP_SIZE
> >   #define RADIX_VMEMMAP_END	(RADIX_VMEMMAP_START + RADIX_VMEMMAP_SIZE)
> >   
> > +/* For the sizes of the shadow area, see kasan.h */
> > +
> 
> Why does this comment pop up here ?

Do you mean, why is it there at all, or do you mean why is it in that
particular place rather than closer to the "3rd quadrant expanded"
diagram or something?

> >   #ifndef __ASSEMBLY__
> >   #define RADIX_PTE_TABLE_SIZE	(sizeof(pte_t) << RADIX_PTE_INDEX_SIZE)
> >   #define RADIX_PMD_TABLE_SIZE	(sizeof(pmd_t) << RADIX_PMD_INDEX_SIZE)
> > diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
> > index fc28f46d2f9d..fb244b6ca7f0 100644
> > --- a/arch/powerpc/include/asm/interrupt.h
> > +++ b/arch/powerpc/include/asm/interrupt.h
> > @@ -327,22 +327,46 @@ static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct inte
> >   	}
> >   #endif
> >   
> > +	/* If data relocations are enabled, it's safe to use nmi_enter() */
> > +	if (mfmsr() & MSR_DR) {
> > +		nmi_enter();
> > +		return;
> > +	}
> > +
> >   	/*
> > -	 * Do not use nmi_enter() for pseries hash guest taking a real-mode
> > +	 * But do not use nmi_enter() for pseries hash guest taking a real-mode
> >   	 * NMI because not everything it touches is within the RMA limit.
> >   	 */
> > +	if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
> > +	    firmware_has_feature(FW_FEATURE_LPAR) &&
> > +	    !radix_enabled())
> > +		return;
> > +
> > +	/*
> > +	 * Likewise, don't use it if we have some form of instrumentation (like
> > +	 * KASAN shadow) that is not safe to access in real mode (even on radix)
> > +	 */
> > +	if (IS_ENABLED(CONFIG_KASAN))
> > +		return;
> > +
> > +	/* Otherwise, it should be safe to call it */
> > +	nmi_enter();
> >   }
> >   
> >   static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct interrupt_nmi_state *state)
> >   {
> > -	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
> > -			!firmware_has_feature(FW_FEATURE_LPAR) ||
> > -			radix_enabled() || (mfmsr() & MSR_DR))
> > +	if (mfmsr() & MSR_DR) {
> > +		// nmi_exit if relocations are on
> >   		nmi_exit();
> > +	} else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
> > +		   firmware_has_feature(FW_FEATURE_LPAR) &&
> > +		   !radix_enabled()) {
> > +		// no nmi_exit for a pseries hash guest taking a real mode exception
> > +	} else if (IS_ENABLED(CONFIG_KASAN)) {
> > +		// no nmi_exit for KASAN in real mode
> > +	} else {
> > +		nmi_exit();
> > +	}
> >   
> 
> I think this whole changeset around MSR_DR versus NMI would be worth a 
> separate patch with a dedicated explanation.

It's actually not as much of a change as it might appear.  All that is
really being done is to add !IS_ENABLED(CONFIG_KASAN) to the
conditions for calling nmi_enter() and nmi_exit() when the CPU is in
real mode.  In other words, the condition is equivalent to:

	if (((!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
			!firmware_has_feature(FW_FEATURE_LPAR) ||
			radix_enabled()) && !IS_ENABLED(CONFIG_KASAN)) ||
			(mfmsr() & MSR_DR))
		nmi_enter();

I think the version as Daniel has transformed it is more readable
though.  It could be a separate patch, sure.

> >   	/*
> >   	 * nmi does not call nap_adjust_return because nmi should not create
> > @@ -410,7 +434,8 @@ static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct inter
> >    * Specific handlers may have additional restrictions.
> >    */
> >   #define DEFINE_INTERRUPT_HANDLER_RAW(func)				\
> > -static __always_inline long ____##func(struct pt_regs *regs);		\
> > +static __always_inline __no_sanitize_address __no_kcsan long		\
> 
> Why is this required for PPC64 while it wasn't for PPC32 ?

I assume because we switch into virtual mode sooner on ppc32, because
of the need to transform addresses explicitly in real mode.  I haven't
looked at the ppc32 interrupt code for a long time (and even the ppc64
interrupt code is now quite different from what I recall).

> Any side effect on PPC32 ?

I suppose you won't get KASAN coverage of those interrupt handling
functions, where you did previously.  Do you want me to make the
"__no_sanitize_address __no_kcsan" apply only on ppc64?

[snip]

> > diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
> > index 3c478e5ef24c..6efc822e70fd 100644
> > --- a/arch/powerpc/include/asm/kasan.h
> > +++ b/arch/powerpc/include/asm/kasan.h
> > @@ -30,9 +30,31 @@
> >   
> >   #define KASAN_SHADOW_OFFSET	ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
> >   
> > +#ifdef CONFIG_PPC32
> >   #define KASAN_SHADOW_END	(-(-KASAN_SHADOW_START >> KASAN_SHADOW_SCALE_SHIFT))
> > +#endif
> >   
> >   #ifdef CONFIG_KASAN
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +/*
> > + * The shadow ends before the highest accessible address
> > + * because we don't need a shadow for the shadow. Instead:
> > + * c00e000000000000 >> 3 + a80e000000000000 = c00fc00000000000
> > + */
> > +#define KASAN_SHADOW_END 0xc00fc00000000000UL
> 
> Would be easier to read if we could group that with PPC32 in an 
> #ifdef/#else/#endif

Sure.  It's a bit weird that KASAN_SHADOW_END is always defined for
ppc32, but only if CONFIG_KASAN=y for ppc64.  I'll fix it.
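
Something like the following (a sketch only, reusing the values from the
patch) is what I have in mind for asm/kasan.h:

    #ifdef CONFIG_PPC_BOOK3S_64
    /*
     * The shadow ends before the highest accessible address because we
     * don't need a shadow for the shadow:
     * c00e000000000000 >> 3 + a80e000000000000 = c00fc00000000000
     */
    #define KASAN_SHADOW_END	0xc00fc00000000000UL
    #else
    #define KASAN_SHADOW_END	(-(-KASAN_SHADOW_START >> KASAN_SHADOW_SCALE_SHIFT))
    #endif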

[snip]

> > diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> > index 4d7829399570..e46335da7f72 100644
> > --- a/arch/powerpc/kernel/Makefile
> > +++ b/arch/powerpc/kernel/Makefile
> > @@ -33,6 +33,17 @@ KASAN_SANITIZE_early_32.o := n
> >   KASAN_SANITIZE_cputable.o := n
> >   KASAN_SANITIZE_prom_init.o := n
> >   KASAN_SANITIZE_btext.o := n
> > +KASAN_SANITIZE_paca.o := n
> > +KASAN_SANITIZE_setup_64.o := n
> > +KASAN_SANITIZE_mce.o := n
> > +KASAN_SANITIZE_mce_power.o := n
> > +
> > +# we have to be particularly careful in ppc64 to exclude code that
> > +# runs with translations off, as we cannot access the shadow with
> > +# translations off. However, ppc32 can sanitize this.
> > +ifdef CONFIG_PPC64
> > +KASAN_SANITIZE_traps.o := n
> > +endif
> >   
> >   ifdef CONFIG_KASAN
> >   CFLAGS_early_32.o += -DDISABLE_BRANCH_PROFILING
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index 984813a4d5dc..c962f9bfc7d4 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -2220,8 +2220,8 @@ void __no_sanitize_address show_stack(struct task_struct *tsk,
> >   			break;
> >   
> >   		stack = (unsigned long *) sp;
> > -		newsp = stack[0];
> > -		ip = stack[STACK_FRAME_LR_SAVE];
> > +		newsp = READ_ONCE_NOCHECK(stack[0]);
> > +		ip = READ_ONCE_NOCHECK(stack[STACK_FRAME_LR_SAVE]);
> >   		if (!firstframe || ip != lr) {
> >   			printk("%s["REG"] ["REG"] %pS",
> >   				loglvl, sp, ip, (void *)ip);
> > @@ -2239,17 +2239,19 @@ void __no_sanitize_address show_stack(struct task_struct *tsk,
> >   		 * See if this is an exception frame.
> >   		 * We look for the "regshere" marker in the current frame.
> >   		 */
> > -		if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS)
> > -		    && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
> > +		if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS) &&
> > +		    (READ_ONCE_NOCHECK(stack[STACK_FRAME_MARKER]) ==
> > +		     STACK_FRAME_REGS_MARKER)) {
> >   			struct pt_regs *regs = (struct pt_regs *)
> >   				(sp + STACK_FRAME_OVERHEAD);
> >   
> > -			lr = regs->link;
> > +			lr = READ_ONCE_NOCHECK(regs->link);
> >   			printk("%s--- interrupt: %lx at %pS\n",
> > -			       loglvl, regs->trap, (void *)regs->nip);
> > +			       loglvl, READ_ONCE_NOCHECK(regs->trap),
> > +			       (void *)READ_ONCE_NOCHECK(regs->nip));
> >   			__show_regs(regs);
> >   			printk("%s--- interrupt: %lx\n",
> > -			       loglvl, regs->trap);
> > +			       loglvl, READ_ONCE_NOCHECK(regs->trap));
> >   
> >   			firstframe = 1;
> >   		}
> 
> This change was submitted as a standalone patch and was rejected as is, 
> see 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210528074806.1311297-1-dja@axtens.net/
> 
> A v2 was accepted, see 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210614120907.1952321-1-dja@axtens.net/
> 
> Therefore, AFAIU, this change is redundant with the accepted v2 and should 
> be dropped.

OK, thanks for the history.  I'll drop this change.

> > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > index b7fd6a72aa76..5fedadc27952 100644
> > --- a/arch/powerpc/kernel/smp.c
> > +++ b/arch/powerpc/kernel/smp.c
> > @@ -413,32 +413,32 @@ static struct cpumask nmi_ipi_pending_mask;
> >   static bool nmi_ipi_busy = false;
> >   static void (*nmi_ipi_function)(struct pt_regs *) = NULL;
> >   
> > -static void nmi_ipi_lock_start(unsigned long *flags)
> > +noinstr static void nmi_ipi_lock_start(unsigned long *flags)
> >   {
> >   	raw_local_irq_save(*flags);
> >   	hard_irq_disable();
> > -	while (atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1) {
> > +	while (arch_atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1) {
> >   		raw_local_irq_restore(*flags);
> > -		spin_until_cond(atomic_read(&__nmi_ipi_lock) == 0);
> > +		spin_until_cond(arch_atomic_read(&__nmi_ipi_lock) == 0);
> >   		raw_local_irq_save(*flags);
> >   		hard_irq_disable();
> >   	}
> >   }
> >   
> > -static void nmi_ipi_lock(void)
> > +noinstr static void nmi_ipi_lock(void)
> >   {
> > -	while (atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1)
> > -		spin_until_cond(atomic_read(&__nmi_ipi_lock) == 0);
> > +	while (arch_atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1)
> > +		spin_until_cond(arch_atomic_read(&__nmi_ipi_lock) == 0);
> >   }
> >   
> > -static void nmi_ipi_unlock(void)
> > +noinstr static void nmi_ipi_unlock(void)
> >   {
> >   	smp_mb();
> > -	WARN_ON(atomic_read(&__nmi_ipi_lock) != 1);
> > -	atomic_set(&__nmi_ipi_lock, 0);
> > +	WARN_ON(arch_atomic_read(&__nmi_ipi_lock) != 1);
> > +	arch_atomic_set(&__nmi_ipi_lock, 0);
> >   }
> >   
> > -static void nmi_ipi_unlock_end(unsigned long *flags)
> > +noinstr static void nmi_ipi_unlock_end(unsigned long *flags)
> >   {
> >   	nmi_ipi_unlock();
> >   	raw_local_irq_restore(*flags);
> > @@ -447,7 +447,7 @@ static void nmi_ipi_unlock_end(unsigned long *flags)
> >   /*
> >    * Platform NMI handler calls this to ack
> >    */
> > -int smp_handle_nmi_ipi(struct pt_regs *regs)
> > +noinstr int smp_handle_nmi_ipi(struct pt_regs *regs)
> >   {
> >   	void (*fn)(struct pt_regs *) = NULL;
> >   	unsigned long flags;
> 
> Can you explain those changes from atomic_xxx to arch_atomic_xxx ?

With KASAN enabled, atomic_read() and similar functions become
wrappers which explicitly call KASAN checking functions.  See
include/linux/atomic/atomic-instrumented.h and
include/linux/instrumented.h.  Since we don't want those NMI functions
instrumented, we have to use the arch_atomic_* functions so as to
avoid getting the explicit instrumentation calls.  As to why we don't
want instrumentation here, it's because we want to avoid crashing when
doing kexec or more particularly kdump, I believe.
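
Roughly, the instrumented wrappers look like this - a simplified sketch of
what include/linux/atomic/atomic-instrumented.h generates, not the exact
kernel text:

	static __always_inline int
	atomic_read(const atomic_t *v)
	{
		instrument_atomic_read(v, sizeof(*v));	/* KASAN/KCSAN check */
		return arch_atomic_read(v);		/* the raw access */
	}

so calling arch_atomic_read() directly skips the instrument_*() call and
with it any access to the KASAN shadow.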

> Should it be a separate patch ?

Could be - matter of taste I guess.

[snip]

> > diff --git a/arch/powerpc/mm/book3s64/Makefile b/arch/powerpc/mm/book3s64/Makefile
> > index 2d50cac499c5..3150159e5121 100644
> > --- a/arch/powerpc/mm/book3s64/Makefile
> > +++ b/arch/powerpc/mm/book3s64/Makefile
> > @@ -24,3 +24,12 @@ obj-$(CONFIG_PPC_PKEY)	+= pkeys.o
> >   
> >   # Instrumenting the SLB fault path can lead to duplicate SLB entries
> >   KCOV_INSTRUMENT_slb.o := n
> > +
> > +# Parts of these can run in real mode and therefore are
> > +# not safe with the current outline KASAN implementation
> 
> But aren't we safe because KASAN is disabled when in real mode? Isn't 
> that the purpose of kasan_arch_is_ready()?

We have not added kasan_arch_is_ready() checks to all of the KASAN
entry points, only those that are called early in boot before KASAN
has been initialized, as I found out when I changed kasan_init() to
just return without doing anything if radix is not enabled.  On
POWER8, the kernel crashed shortly afterwards in generic KASAN code
that didn't have a kasan_arch_is_ready() call in it.

As to whether we should add such calls to every entry point in the
generic KASAN code -- it would probably be a good idea to do that,
though a bit fragile in the long term because people will add stuff to
the generic KASAN code without realizing there needs to be the guard,
since their architecture won't need it.
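
The guard itself would be simple enough; each generic entry point would
need something along these lines (just a sketch of the pattern, using
__kasan_check_byte() as an example, not the actual change):

	bool __kasan_check_byte(const void *address, unsigned long ip)
	{
		/* shadow not populated yet (or not radix): claim the byte is fine */
		if (!kasan_arch_is_ready())
			return true;

		if (!kasan_byte_accessible(address)) {
			kasan_report((unsigned long)address, 1, false, ip);
			return false;
		}
		return true;
	}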

In the really long term, hopefully we can eventually drop HPT and get
to the point where practically nothing runs in real mode.  That's a
fair way off though.

> > +KASAN_SANITIZE_mmu_context.o := n
> > +KASAN_SANITIZE_pgtable.o := n
> > +KASAN_SANITIZE_radix_pgtable.o := n
> > +KASAN_SANITIZE_radix_tlb.o := n
> > +KASAN_SANITIZE_slb.o := n
> > +KASAN_SANITIZE_pkeys.o := n
> > diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
> > index bcbfd6f2eca3..4999aadb1867 100644
> > --- a/arch/powerpc/mm/kasan/Makefile
> > +++ b/arch/powerpc/mm/kasan/Makefile
> > @@ -5,3 +5,4 @@ KASAN_SANITIZE := n
> >   obj-$(CONFIG_PPC32)		+= init_32.o
> >   obj-$(CONFIG_PPC_8xx)		+= 8xx.o
> >   obj-$(CONFIG_PPC_BOOK3S_32)	+= book3s_32.o
> > +obj-$(CONFIG_PPC_BOOK3S_64)	+= init_book3s_64.o
> > diff --git a/arch/powerpc/mm/kasan/init_book3s_64.c b/arch/powerpc/mm/kasan/init_book3s_64.c
> > new file mode 100644
> > index 000000000000..cb96e16e57a1
> > --- /dev/null
> > +++ b/arch/powerpc/mm/kasan/init_book3s_64.c
> > @@ -0,0 +1,103 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * KASAN for 64-bit Book3S powerpc
> > + *
> > + * Copyright (C) 2019-2020 IBM Corporation
> > + * Author: Daniel Axtens <dja@axtens.net>
> > + */
> > +
> > +/*
> > +# ppc64 turns on virtual memory late in boot, after calling into generic code
> > +# like the device-tree parser, so it uses this in conjunction with a hook in
> > +# outline mode to avoid invalid access early in boot.
> > +*/
> > +
> > +#define DISABLE_BRANCH_PROFILING
> > +
> > +#include <linux/kasan.h>
> > +#include <linux/printk.h>
> > +#include <linux/sched/task.h>
> > +#include <linux/memblock.h>
> > +#include <asm/pgalloc.h>
> > +
> > +DEFINE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);
> > +
> > +static void __init kasan_init_phys_region(void *start, void *end)
> > +{
> > +	unsigned long k_start, k_end, k_cur;
> > +	void *va;
> > +
> > +	if (start >= end)
> > +		return;
> > +
> > +	k_start = ALIGN_DOWN((unsigned long)kasan_mem_to_shadow(start), PAGE_SIZE);
> > +	k_end = ALIGN((unsigned long)kasan_mem_to_shadow(end), PAGE_SIZE);
> > +
> > +	va = memblock_alloc(k_end - k_start, PAGE_SIZE);
> > +	for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE, va += PAGE_SIZE)
> > +		map_kernel_page(k_cur, __pa(va), PAGE_KERNEL);
> > +}
> > +
> > +void __init kasan_init(void)
> > +{
> > +	/*
> > +	 * We want to do the following things:
> > +	 *  1) Map real memory into the shadow for all physical memblocks
> > +	 *     This takes us from c000... to c008...
> > +	 *  2) Leave a hole over the shadow of vmalloc space. KASAN_VMALLOC
> > +	 *     will manage this for us.
> > +	 *     This takes us from c008... to c00a...
> > +	 *  3) Map the 'early shadow'/zero page over iomap and vmemmap space.
> > +	 *     This takes us up to where we start at c00e...
> > +	 */
> > +
> > +	void *k_start = kasan_mem_to_shadow((void *)RADIX_VMALLOC_END);
> > +	void *k_end = kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END);
> > +	phys_addr_t start, end;
> > +	u64 i;
> > +	pte_t zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL);
> > +
> > +	if (!early_radix_enabled()) {
> > +		pr_warn("KASAN not enabled as it requires radix!");
> > +		return;
> > +	}
> > +
> > +	for_each_mem_range(i, &start, &end)
> > +		kasan_init_phys_region((void *)start, (void *)end);
> > +
> > +	for (i = 0; i < PTRS_PER_PTE; i++)
> > +		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
> > +			     &kasan_early_shadow_pte[i], zero_pte, 0);
> > +
> > +	for (i = 0; i < PTRS_PER_PMD; i++)
> > +		pmd_populate_kernel(&init_mm, &kasan_early_shadow_pmd[i],
> > +				    kasan_early_shadow_pte);
> > +
> > +	for (i = 0; i < PTRS_PER_PUD; i++)
> > +		pud_populate(&init_mm, &kasan_early_shadow_pud[i],
> > +			     kasan_early_shadow_pmd);
> > +
> > +	/* map the early shadow over the iomap and vmemmap space */
> > +	kasan_populate_early_shadow(k_start, k_end);
> > +
> > +	/* mark early shadow region as RO and wipe it */
> > +	zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL_RO);
> > +	for (i = 0; i < PTRS_PER_PTE; i++)
> > +		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
> > +			     &kasan_early_shadow_pte[i], zero_pte, 0);
> > +
> > +	/*
> > +	 * clear_page relies on some cache info that hasn't been set up yet.
> > +	 * It ends up looping ~forever and blows up other data.
> > +	 * Use memset instead.
> > +	 */
> > +	memset(kasan_early_shadow_page, 0, PAGE_SIZE);
> > +
> > +	static_branch_inc(&powerpc_kasan_enabled_key);
> > +
> > +	/* Enable error messages */
> > +	init_task.kasan_depth = 0;
> > +	pr_info("KASAN init done (64-bit Book3S)\n");
> 
> Maybe mention RADIX in the printed message ?

Why?  As in, who would find it useful?

Paul.
Christophe Leroy May 17, 2022, 8:06 a.m. UTC | #3
On 17/05/2022 at 09:31, Paul Mackerras wrote:
> On Sun, May 15, 2022 at 07:33:52AM +0000, Christophe Leroy wrote:
>>
>>
>> On 11/05/2022 at 09:28, Paul Mackerras wrote:
>>> From: Daniel Axtens <dja@axtens.net>
>>>
>>> Implement a limited form of KASAN for Book3S 64-bit machines running under
>>> the Radix MMU, supporting only outline mode.
> 
> [snip]
> 
>>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>> index b779603978e1..0bf76e40c9c2 100644
>>> --- a/arch/powerpc/Kconfig
>>> +++ b/arch/powerpc/Kconfig
>>> @@ -105,6 +105,7 @@ config PPC
>>>    	# Please keep this list sorted alphabetically.
>>>    	#
>>>    	select ARCH_32BIT_OFF_T if PPC32
>>> +	select ARCH_DISABLE_KASAN_INLINE	if PPC_RADIX_MMU
>>>    	select ARCH_ENABLE_MEMORY_HOTPLUG
>>>    	select ARCH_ENABLE_MEMORY_HOTREMOVE
>>>    	select ARCH_HAS_COPY_MC			if PPC64
>>> @@ -152,6 +153,7 @@ config PPC
>>>    	select ARCH_WANT_IPC_PARSE_VERSION
>>>    	select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
>>>    	select ARCH_WANT_LD_ORPHAN_WARN
>>> +	select ARCH_WANTS_NO_INSTR
>>
>> Can you explain why we need that ?
> 
> The help text for the option says:
> 
>       An architecture should select this if the noinstr macro is being used on
>       functions to denote that the toolchain should avoid instrumenting such
>       functions and is required for correctness.
> 
> All it really seems to do is restrict the conditions under which the
> GCOV and KCOV options can be enabled.
> 
>> Is it tied to KASAN? If yes, why didn't we have it for PPC32 until now?
> 
> Since we really do need to avoid instrumenting certain functions on
> ppc64 (as in things will break if we do instrument them), I think we
> need to select ARCH_WANTS_NO_INSTR.
> 
> For ppc32, as far as I recall there is much less code that runs in
> real mode and it is mostly assembler (except for some boot code),
> mostly because all addresses have to be explicitly translated to
> physical addresses for 32-bit real-mode code, unlike ppc64 where we
> can access memory in the linear mapping using virtual addresses
> because of the fact that the CPU ignores the top 4 bits of the
> effective address in real mode.  That said, there is a lot less code
> that runs in real mode on ppc64 than there used to be.
> 
>> Maybe that's independent of KASAN and would be worth a separate patch?
> 
> Yes, possibly, though KASAN does appear to be the only user of noinstr
> in arch/powerpc.
> 
> [snip]
> 
>>> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
>>> index d090d9612348..bafc9869afcd 100644
>>> --- a/arch/powerpc/include/asm/book3s/64/radix.h
>>> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
>>> @@ -35,6 +35,11 @@
>>>    #define RADIX_PMD_SHIFT		(PAGE_SHIFT + RADIX_PTE_INDEX_SIZE)
>>>    #define RADIX_PUD_SHIFT		(RADIX_PMD_SHIFT + RADIX_PMD_INDEX_SIZE)
>>>    #define RADIX_PGD_SHIFT		(RADIX_PUD_SHIFT + RADIX_PUD_INDEX_SIZE)
>>> +
>>> +#define R_PTRS_PER_PTE		(1 << RADIX_PTE_INDEX_SIZE)
>>> +#define R_PTRS_PER_PMD		(1 << RADIX_PMD_INDEX_SIZE)
>>> +#define R_PTRS_PER_PUD		(1 << RADIX_PUD_INDEX_SIZE)
>>> +
>>>    /*
>>>     * Size of EA range mapped by our pagetables.
>>>     */
>>> @@ -68,11 +73,11 @@
>>>     *
>>>     *
>>>     * 3rd quadrant expanded:
>>> - * +------------------------------+
>>> + * +------------------------------+  Highest address (0xc010000000000000)
>>> + * +------------------------------+  KASAN shadow end (0xc00fc00000000000)
>>>     * |                              |
>>>     * |                              |
>>> - * |                              |
>>> - * +------------------------------+  Kernel vmemmap end (0xc010000000000000)
>>> + * +------------------------------+  Kernel vmemmap end/shadow start (0xc00e000000000000)
>>>     * |                              |
>>>     * |           512TB		  |
>>>     * |                              |
>>> @@ -126,6 +131,8 @@
>>>    #define RADIX_VMEMMAP_SIZE	RADIX_KERN_MAP_SIZE
>>>    #define RADIX_VMEMMAP_END	(RADIX_VMEMMAP_START + RADIX_VMEMMAP_SIZE)
>>>    
>>> +/* For the sizes of the shadow area, see kasan.h */
>>> +
>>
>> Why does this comment pop up here?
> 
> Do you mean, why is it there at all, or do you mean why is it in that
> particular place rather than closer to the "3rd quadrant expanded"
> diagram or something?

Mainly why at this place: I can't see the relationship between the 
added comment and the following lines, which start with #ifndef __ASSEMBLY__.

If the comment is to be useful, it should be added to a related place, I 
guess.

> 
>>>    #ifndef __ASSEMBLY__
>>>    #define RADIX_PTE_TABLE_SIZE	(sizeof(pte_t) << RADIX_PTE_INDEX_SIZE)
>>>    #define RADIX_PMD_TABLE_SIZE	(sizeof(pmd_t) << RADIX_PMD_INDEX_SIZE)
>>> diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
>>> index fc28f46d2f9d..fb244b6ca7f0 100644
>>> --- a/arch/powerpc/include/asm/interrupt.h
>>> +++ b/arch/powerpc/include/asm/interrupt.h
>>> @@ -327,22 +327,46 @@ static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct inte
>>>    	}
>>>    #endif
>>>    
>>> +	/* If data relocations are enabled, it's safe to use nmi_enter() */
>>> +	if (mfmsr() & MSR_DR) {
>>> +		nmi_enter();
>>> +		return;
>>> +	}
>>> +
>>>    	/*
>>> -	 * Do not use nmi_enter() for pseries hash guest taking a real-mode
>>> +	 * But do not use nmi_enter() for pseries hash guest taking a real-mode
>>>    	 * NMI because not everything it touches is within the RMA limit.
>>>    	 */
>>> +	if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
>>> +	    firmware_has_feature(FW_FEATURE_LPAR) &&
>>> +	    !radix_enabled())
>>> +		return;
>>> +
>>> +	/*
>>> +	 * Likewise, don't use it if we have some form of instrumentation (like
>>> +	 * KASAN shadow) that is not safe to access in real mode (even on radix)
>>> +	 */
>>> +	if (IS_ENABLED(CONFIG_KASAN))
>>> +		return;
>>> +
>>> +	/* Otherwise, it should be safe to call it */
>>> +	nmi_enter();
>>>    }
>>>    
>>>    static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct interrupt_nmi_state *state)
>>>    {
>>> -	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
>>> -			!firmware_has_feature(FW_FEATURE_LPAR) ||
>>> -			radix_enabled() || (mfmsr() & MSR_DR))
>>> +	if (mfmsr() & MSR_DR) {
>>> +		// nmi_exit if relocations are on
>>>    		nmi_exit();
>>> +	} else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
>>> +		   firmware_has_feature(FW_FEATURE_LPAR) &&
>>> +		   !radix_enabled()) {
>>> +		// no nmi_exit for a pseries hash guest taking a real mode exception
>>> +	} else if (IS_ENABLED(CONFIG_KASAN)) {
>>> +		// no nmi_exit for KASAN in real mode
>>> +	} else {
>>> +		nmi_exit();
>>> +	}
>>>    
>>
>> I think all this changeset around MSR_DR versus NMI would be worth a
>> separate patch with a dedicated explanation.
> 
> It's actually not as much of a change as it might appear.  All that is
> really being done is to add !IS_ENABLED(CONFIG_KASAN) to the
> conditions for calling nmi_enter() and nmi_exit() when the CPU is in
> real mode.  In other words, the condition is equivalent to:
> 
> 	if (((!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
> 			!firmware_has_feature(FW_FEATURE_LPAR) ||
> 			radix_enabled()) && !IS_ENABLED(CONFIG_KASAN)) ||
> 			(mfmsr() & MSR_DR))
> 		nmi_enter();
> 
> I think the version as Daniel has transformed it is more readable
> though.  It could be a separate patch, sure.
> 
>>>    	/*
>>>    	 * nmi does not call nap_adjust_return because nmi should not create
>>> @@ -410,7 +434,8 @@ static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct inter
>>>     * Specific handlers may have additional restrictions.
>>>     */
>>>    #define DEFINE_INTERRUPT_HANDLER_RAW(func)				\
>>> -static __always_inline long ____##func(struct pt_regs *regs);		\
>>> +static __always_inline __no_sanitize_address __no_kcsan long		\
>>
>> Why is this required for PPC64 while it wasn't for PPC32 ?
> 
> I assume because we switch into virtual mode sooner on ppc32, because
> of the need to transform addresses explicitly in real mode.  I haven't
> looked at the ppc32 interrupt code for a long time (and even the ppc64
> interrupt code is now quite different from what I recall).
> 
>> Any side effect on PPC32 ?
> 
> I suppose you won't get KASAN coverage of those interrupt handling
> functions, where you did previously.  Do you want me to make the
> "__no_sanitize_address __no_kcsan" apply only on ppc64?

No, I think that should be OK for PPC32 as well now that you explained 
what it really does.

> 
> [snip]
> 
>>> diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
>>> index 3c478e5ef24c..6efc822e70fd 100644
>>> --- a/arch/powerpc/include/asm/kasan.h
>>> +++ b/arch/powerpc/include/asm/kasan.h
>>> @@ -30,9 +30,31 @@
>>>    
>>>    #define KASAN_SHADOW_OFFSET	ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
>>>    
>>> +#ifdef CONFIG_PPC32
>>>    #define KASAN_SHADOW_END	(-(-KASAN_SHADOW_START >> KASAN_SHADOW_SCALE_SHIFT))
>>> +#endif
>>>    
>>>    #ifdef CONFIG_KASAN
>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>> +/*
>>> + * The shadow ends before the highest accessible address
>>> + * because we don't need a shadow for the shadow. Instead:
>>> + * c00e000000000000 << 3 + a80e000000000000 = c00fc00000000000
>>> + */
>>> +#define KASAN_SHADOW_END 0xc00fc00000000000UL
>>
>> Would be easier to read if we could group that with PPC32 in a
>> #ifdef/#else/#endif
> 
> Sure.  It's a bit weird that KASAN_SHADOW_END is always defined for
> ppc32, but only if CONFIG_KASAN=y for ppc64.  I'll fix it.

Thanks
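
Something like this is what I had in mind (just a sketch, leaving aside
the CONFIG_KASAN guard and the static key declaration):

	#ifdef CONFIG_PPC32
	#define KASAN_SHADOW_END	(-(-KASAN_SHADOW_START >> KASAN_SHADOW_SCALE_SHIFT))
	#else
	/* No shadow is needed for the shadow itself, so stop short of the top */
	#define KASAN_SHADOW_END	0xc00fc00000000000UL
	#endif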

> 
> [snip]
> 
>>> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
>>> index 4d7829399570..e46335da7f72 100644
>>> --- a/arch/powerpc/kernel/Makefile
>>> +++ b/arch/powerpc/kernel/Makefile
>>> @@ -33,6 +33,17 @@ KASAN_SANITIZE_early_32.o := n
>>>    KASAN_SANITIZE_cputable.o := n
>>>    KASAN_SANITIZE_prom_init.o := n
>>>    KASAN_SANITIZE_btext.o := n
>>> +KASAN_SANITIZE_paca.o := n
>>> +KASAN_SANITIZE_setup_64.o := n
>>> +KASAN_SANITIZE_mce.o := n
>>> +KASAN_SANITIZE_mce_power.o := n
>>> +
>>> +# we have to be particularly careful in ppc64 to exclude code that
>>> +# runs with translations off, as we cannot access the shadow with
>>> +# translations off. However, ppc32 can sanitize this.
>>> +ifdef CONFIG_PPC64
>>> +KASAN_SANITIZE_traps.o := n
>>> +endif
>>>    
>>>    ifdef CONFIG_KASAN
>>>    CFLAGS_early_32.o += -DDISABLE_BRANCH_PROFILING
>>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>>> index 984813a4d5dc..c962f9bfc7d4 100644
>>> --- a/arch/powerpc/kernel/process.c
>>> +++ b/arch/powerpc/kernel/process.c
>>> @@ -2220,8 +2220,8 @@ void __no_sanitize_address show_stack(struct task_struct *tsk,
>>>    			break;
>>>    
>>>    		stack = (unsigned long *) sp;
>>> -		newsp = stack[0];
>>> -		ip = stack[STACK_FRAME_LR_SAVE];
>>> +		newsp = READ_ONCE_NOCHECK(stack[0]);
>>> +		ip = READ_ONCE_NOCHECK(stack[STACK_FRAME_LR_SAVE]);
>>>    		if (!firstframe || ip != lr) {
>>>    			printk("%s["REG"] ["REG"] %pS",
>>>    				loglvl, sp, ip, (void *)ip);
>>> @@ -2239,17 +2239,19 @@ void __no_sanitize_address show_stack(struct task_struct *tsk,
>>>    		 * See if this is an exception frame.
>>>    		 * We look for the "regshere" marker in the current frame.
>>>    		 */
>>> -		if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS)
>>> -		    && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
>>> +		if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS) &&
>>> +		    (READ_ONCE_NOCHECK(stack[STACK_FRAME_MARKER]) ==
>>> +		     STACK_FRAME_REGS_MARKER)) {
>>>    			struct pt_regs *regs = (struct pt_regs *)
>>>    				(sp + STACK_FRAME_OVERHEAD);
>>>    
>>> -			lr = regs->link;
>>> +			lr = READ_ONCE_NOCHECK(regs->link);
>>>    			printk("%s--- interrupt: %lx at %pS\n",
>>> -			       loglvl, regs->trap, (void *)regs->nip);
>>> +			       loglvl, READ_ONCE_NOCHECK(regs->trap),
>>> +			       (void *)READ_ONCE_NOCHECK(regs->nip));
>>>    			__show_regs(regs);
>>>    			printk("%s--- interrupt: %lx\n",
>>> -			       loglvl, regs->trap);
>>> +			       loglvl, READ_ONCE_NOCHECK(regs->trap));
>>>    
>>>    			firstframe = 1;
>>>    		}
>>
>> This change was submitted as a standalone patch and rejected as is,
>> see
>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210528074806.1311297-1-dja@axtens.net/
>>
>> v2 was accepted, see
>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210614120907.1952321-1-dja@axtens.net/
>>
>> Therefore AFAIU this change is redundant with the accepted v2 and should
>> be dropped.
> 
> OK, thanks for the history.  I'll drop this change.
> 
>>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>>> index b7fd6a72aa76..5fedadc27952 100644
>>> --- a/arch/powerpc/kernel/smp.c
>>> +++ b/arch/powerpc/kernel/smp.c
>>> @@ -413,32 +413,32 @@ static struct cpumask nmi_ipi_pending_mask;
>>>    static bool nmi_ipi_busy = false;
>>>    static void (*nmi_ipi_function)(struct pt_regs *) = NULL;
>>>    
>>> -static void nmi_ipi_lock_start(unsigned long *flags)
>>> +noinstr static void nmi_ipi_lock_start(unsigned long *flags)
>>>    {
>>>    	raw_local_irq_save(*flags);
>>>    	hard_irq_disable();
>>> -	while (atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1) {
>>> +	while (arch_atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1) {
>>>    		raw_local_irq_restore(*flags);
>>> -		spin_until_cond(atomic_read(&__nmi_ipi_lock) == 0);
>>> +		spin_until_cond(arch_atomic_read(&__nmi_ipi_lock) == 0);
>>>    		raw_local_irq_save(*flags);
>>>    		hard_irq_disable();
>>>    	}
>>>    }
>>>    
>>> -static void nmi_ipi_lock(void)
>>> +noinstr static void nmi_ipi_lock(void)
>>>    {
>>> -	while (atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1)
>>> -		spin_until_cond(atomic_read(&__nmi_ipi_lock) == 0);
>>> +	while (arch_atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1)
>>> +		spin_until_cond(arch_atomic_read(&__nmi_ipi_lock) == 0);
>>>    }
>>>    
>>> -static void nmi_ipi_unlock(void)
>>> +noinstr static void nmi_ipi_unlock(void)
>>>    {
>>>    	smp_mb();
>>> -	WARN_ON(atomic_read(&__nmi_ipi_lock) != 1);
>>> -	atomic_set(&__nmi_ipi_lock, 0);
>>> +	WARN_ON(arch_atomic_read(&__nmi_ipi_lock) != 1);
>>> +	arch_atomic_set(&__nmi_ipi_lock, 0);
>>>    }
>>>    
>>> -static void nmi_ipi_unlock_end(unsigned long *flags)
>>> +noinstr static void nmi_ipi_unlock_end(unsigned long *flags)
>>>    {
>>>    	nmi_ipi_unlock();
>>>    	raw_local_irq_restore(*flags);
>>> @@ -447,7 +447,7 @@ static void nmi_ipi_unlock_end(unsigned long *flags)
>>>    /*
>>>     * Platform NMI handler calls this to ack
>>>     */
>>> -int smp_handle_nmi_ipi(struct pt_regs *regs)
>>> +noinstr int smp_handle_nmi_ipi(struct pt_regs *regs)
>>>    {
>>>    	void (*fn)(struct pt_regs *) = NULL;
>>>    	unsigned long flags;
>>
>> Can you explain those changes from atomic_xxx to arch_atomic_xxx ?
> 
> With KASAN enabled, atomic_read() and similar functions become
> wrappers which explicitly call KASAN checking functions.  See
> include/linux/atomic/atomic-instrumented.h and
> include/linux/instrumented.h.  Since we don't want those NMI functions
> instrumented, we have to use the arch_atomic_* functions so as to
> avoid getting the explicit instrumentation calls.  As to why we don't
> want instrumentation here, it's because we want to avoid crashing when
> doing kexec or more particularly kdump, I believe.
> 
>> Should it be a separate patch ?
> 
> Could be - matter of taste I guess.

Well, this patch being quite big, it's usually appreciated to get it 
split up with some preparation patches; that's the way we usually do it. 
I remember when I submitted the series to switch to 
CONFIG_THREAD_INFO_IN_TASK, there were already half a dozen 
preparation patches and Michael split them into several more.

> 
> [snip]
> 
>>> diff --git a/arch/powerpc/mm/book3s64/Makefile b/arch/powerpc/mm/book3s64/Makefile
>>> index 2d50cac499c5..3150159e5121 100644
>>> --- a/arch/powerpc/mm/book3s64/Makefile
>>> +++ b/arch/powerpc/mm/book3s64/Makefile
>>> @@ -24,3 +24,12 @@ obj-$(CONFIG_PPC_PKEY)	+= pkeys.o
>>>    
>>>    # Instrumenting the SLB fault path can lead to duplicate SLB entries
>>>    KCOV_INSTRUMENT_slb.o := n
>>> +
>>> +# Parts of these can run in real mode and therefore are
>>> +# not safe with the current outline KASAN implementation
>>
>> But aren't we safe because KASAN is disabled when in real mode? Isn't
>> that the purpose of kasan_arch_is_ready()?
> 
> We have not added kasan_arch_is_ready() checks to all of the KASAN
> entry points, only those that are called early in boot before KASAN
> has been initialized, as I found out when I changed kasan_init() to
> just return without doing anything if radix is not enabled.  On
> POWER8, the kernel crashed shortly afterwards in generic KASAN code
> that didn't have a kasan_arch_is_ready() call in it.
> 
> As to whether we should add such calls to every entry point in the
> generic KASAN code -- it would probably be a good idea to do that,
> though a bit fragile in the long term because people will add stuff to
> the generic KASAN code without realizing there needs to be the guard,
> since their architecture won't need it.
> 
> In the really long term, hopefully we can eventually drop HPT and get
> to the point where practically nothing runs in real mode.  That's a
> fair way off though.
> 
>>> +KASAN_SANITIZE_mmu_context.o := n
>>> +KASAN_SANITIZE_pgtable.o := n
>>> +KASAN_SANITIZE_radix_pgtable.o := n
>>> +KASAN_SANITIZE_radix_tlb.o := n
>>> +KASAN_SANITIZE_slb.o := n
>>> +KASAN_SANITIZE_pkeys.o := n
>>> diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
>>> index bcbfd6f2eca3..4999aadb1867 100644
>>> --- a/arch/powerpc/mm/kasan/Makefile
>>> +++ b/arch/powerpc/mm/kasan/Makefile
>>> @@ -5,3 +5,4 @@ KASAN_SANITIZE := n
>>>    obj-$(CONFIG_PPC32)		+= init_32.o
>>>    obj-$(CONFIG_PPC_8xx)		+= 8xx.o
>>>    obj-$(CONFIG_PPC_BOOK3S_32)	+= book3s_32.o
>>> +obj-$(CONFIG_PPC_BOOK3S_64)	+= init_book3s_64.o
>>> diff --git a/arch/powerpc/mm/kasan/init_book3s_64.c b/arch/powerpc/mm/kasan/init_book3s_64.c
>>> new file mode 100644
>>> index 000000000000..cb96e16e57a1
>>> --- /dev/null
>>> +++ b/arch/powerpc/mm/kasan/init_book3s_64.c
>>> @@ -0,0 +1,103 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * KASAN for 64-bit Book3S powerpc
>>> + *
>>> + * Copyright (C) 2019-2020 IBM Corporation
>>> + * Author: Daniel Axtens <dja@axtens.net>
>>> + */
>>> +
>>> +/*
>>> +# ppc64 turns on virtual memory late in boot, after calling into generic code
>>> +# like the device-tree parser, so it uses this in conjunction with a hook in
>>> +# outline mode to avoid invalid access early in boot.
>>> +*/
>>> +
>>> +#define DISABLE_BRANCH_PROFILING
>>> +
>>> +#include <linux/kasan.h>
>>> +#include <linux/printk.h>
>>> +#include <linux/sched/task.h>
>>> +#include <linux/memblock.h>
>>> +#include <asm/pgalloc.h>
>>> +
>>> +DEFINE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);
>>> +
>>> +static void __init kasan_init_phys_region(void *start, void *end)
>>> +{
>>> +	unsigned long k_start, k_end, k_cur;
>>> +	void *va;
>>> +
>>> +	if (start >= end)
>>> +		return;
>>> +
>>> +	k_start = ALIGN_DOWN((unsigned long)kasan_mem_to_shadow(start), PAGE_SIZE);
>>> +	k_end = ALIGN((unsigned long)kasan_mem_to_shadow(end), PAGE_SIZE);
>>> +
>>> +	va = memblock_alloc(k_end - k_start, PAGE_SIZE);
>>> +	for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE, va += PAGE_SIZE)
>>> +		map_kernel_page(k_cur, __pa(va), PAGE_KERNEL);
>>> +}
>>> +
>>> +void __init kasan_init(void)
>>> +{
>>> +	/*
>>> +	 * We want to do the following things:
>>> +	 *  1) Map real memory into the shadow for all physical memblocks
>>> +	 *     This takes us from c000... to c008...
>>> +	 *  2) Leave a hole over the shadow of vmalloc space. KASAN_VMALLOC
>>> +	 *     will manage this for us.
>>> +	 *     This takes us from c008... to c00a...
>>> +	 *  3) Map the 'early shadow'/zero page over iomap and vmemmap space.
>>> +	 *     This takes us up to where we start at c00e...
>>> +	 */
>>> +
>>> +	void *k_start = kasan_mem_to_shadow((void *)RADIX_VMALLOC_END);
>>> +	void *k_end = kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END);
>>> +	phys_addr_t start, end;
>>> +	u64 i;
>>> +	pte_t zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL);
>>> +
>>> +	if (!early_radix_enabled()) {
>>> +		pr_warn("KASAN not enabled as it requires radix!");
>>> +		return;
>>> +	}
>>> +
>>> +	for_each_mem_range(i, &start, &end)
>>> +		kasan_init_phys_region((void *)start, (void *)end);
>>> +
>>> +	for (i = 0; i < PTRS_PER_PTE; i++)
>>> +		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
>>> +			     &kasan_early_shadow_pte[i], zero_pte, 0);
>>> +
>>> +	for (i = 0; i < PTRS_PER_PMD; i++)
>>> +		pmd_populate_kernel(&init_mm, &kasan_early_shadow_pmd[i],
>>> +				    kasan_early_shadow_pte);
>>> +
>>> +	for (i = 0; i < PTRS_PER_PUD; i++)
>>> +		pud_populate(&init_mm, &kasan_early_shadow_pud[i],
>>> +			     kasan_early_shadow_pmd);
>>> +
>>> +	/* map the early shadow over the iomap and vmemmap space */
>>> +	kasan_populate_early_shadow(k_start, k_end);
>>> +
>>> +	/* mark early shadow region as RO and wipe it */
>>> +	zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL_RO);
>>> +	for (i = 0; i < PTRS_PER_PTE; i++)
>>> +		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
>>> +			     &kasan_early_shadow_pte[i], zero_pte, 0);
>>> +
>>> +	/*
>>> +	 * clear_page relies on some cache info that hasn't been set up yet.
>>> +	 * It ends up looping ~forever and blows up other data.
>>> +	 * Use memset instead.
>>> +	 */
>>> +	memset(kasan_early_shadow_page, 0, PAGE_SIZE);
>>> +
>>> +	static_branch_inc(&powerpc_kasan_enabled_key);
>>> +
>>> +	/* Enable error messages */
>>> +	init_task.kasan_depth = 0;
>>> +	pr_info("KASAN init done (64-bit Book3S)\n");
>>
>> Maybe mention RADIX in the printed message ?
> 
> Why?  As in, who would find it useful?

Don't know. Is it useful to mention 64-bit Book3S at all then?

Christophe
diff mbox series

Patch

diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
index 8e5cd26d59cb..448995c11bee 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -36,8 +36,9 @@  Both software KASAN modes work with SLUB and SLAB memory allocators,
 while the hardware tag-based KASAN currently only supports SLUB.
 
 Currently, generic KASAN is supported for the x86_64, arm, arm64, xtensa, s390,
-and riscv architectures. It is also supported on 32-bit powerpc kernels.
-Tag-based KASAN modes are supported only for arm64.
+and riscv architectures. It is also supported on powerpc for 32-bit kernels and
+for 64-bit kernels running under the Radix MMU. Tag-based KASAN modes are
+supported only for arm64.
 
 Usage
 -----
@@ -350,10 +351,10 @@  CONFIG_KASAN_VMALLOC
 
 With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
 cost of greater memory usage. Currently, this is supported on x86,
-riscv, s390, and 32-bit powerpc.
+riscv, s390, and powerpc.
 
-It is optional, except on 32-bit powerpc kernels with module support,
-where it is required.
+It is optional, except on 64-bit powerpc kernels, and on 32-bit
+powerpc kernels with module support, where it is required.
 
 This works by hooking into vmalloc and vmap and dynamically
 allocating real shadow memory to back the mappings.
diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
index 26bb0e8bb18c..f032b4eaf205 100644
--- a/Documentation/powerpc/kasan.txt
+++ b/Documentation/powerpc/kasan.txt
@@ -1,4 +1,4 @@ 
-KASAN is supported on powerpc on 32-bit only.
+KASAN is supported on powerpc on 32-bit and Radix 64-bit only.
 
 32 bit support
 ==============
@@ -10,3 +10,49 @@  fixmap area and occupies one eighth of the total kernel virtual memory space.
 
 Instrumentation of the vmalloc area is optional, unless built with modules,
 in which case it is required.
+
+64 bit support
+==============
+
+Currently, only the radix MMU is supported. There have been versions for hash
+and Book3E processors floating around on the mailing list, but nothing has been
+merged.
+
+KASAN support on Book3S is a bit tricky to get right:
+
+ - It would be good to support inline instrumentation so as to be able to catch
+   stack issues that cannot be caught with outline mode.
+
+ - Inline instrumentation requires a fixed offset.
+
+ - Book3S runs code with translations off ("real mode") during boot, including a
+   lot of generic device-tree parsing code which is used to determine MMU
+   features.
+
+ - Some code - most notably a lot of KVM code - also runs with translations off
+   after boot.
+
+ - Therefore any offset has to point to memory that is valid with
+   translations on or off.
+
+One approach is just to give up on inline instrumentation. This way boot-time
+checks can be delayed until after the MMU is set up, and we can just not
+instrument any code that runs with translations off after booting. This is the
+current approach.
+
+To avoid this limitation, the KASAN shadow would have to be placed inside the
+linear mapping, using the same high-bits trick we use for the rest of the linear
+mapping. This is tricky:
+
+ - We'd like to place it near the start of physical memory. In theory we can do
+   this at run-time based on how much physical memory we have, but this requires
+   being able to arbitrarily relocate the kernel, which is basically the tricky
+   part of KASLR. Not being game to implement both tricky things at once, this
+   is hopefully something we can revisit once we get KASLR for Book3S.
+
+ - Alternatively, we can place the shadow at the _end_ of memory, but this
+   requires knowing how much contiguous physical memory a system has _at compile
+   time_. This is a big hammer, and has some unfortunate consequences: inability
+   to handle discontiguous physical memory, total failure to boot on machines
+   with less memory than specified, and that machines with more memory than
+   specified can't use it. This was deemed unacceptable.
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index b779603978e1..0bf76e40c9c2 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -105,6 +105,7 @@  config PPC
 	# Please keep this list sorted alphabetically.
 	#
 	select ARCH_32BIT_OFF_T if PPC32
+	select ARCH_DISABLE_KASAN_INLINE	if PPC_RADIX_MMU
 	select ARCH_ENABLE_MEMORY_HOTPLUG
 	select ARCH_ENABLE_MEMORY_HOTREMOVE
 	select ARCH_HAS_COPY_MC			if PPC64
@@ -152,6 +153,7 @@  config PPC
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
 	select ARCH_WANT_LD_ORPHAN_WARN
+	select ARCH_WANTS_NO_INSTR
 	select ARCH_WEAK_RELEASE_ACQUIRE
 	select BINFMT_ELF
 	select BUILDTIME_TABLE_SORT
@@ -183,7 +185,8 @@  config PPC
 	select HAVE_ARCH_JUMP_LABEL
 	select HAVE_ARCH_JUMP_LABEL_RELATIVE
 	select HAVE_ARCH_KASAN			if PPC32 && PPC_PAGE_SHIFT <= 14
-	select HAVE_ARCH_KASAN_VMALLOC		if PPC32 && PPC_PAGE_SHIFT <= 14
+	select HAVE_ARCH_KASAN			if PPC_RADIX_MMU
+	select HAVE_ARCH_KASAN_VMALLOC		if HAVE_ARCH_KASAN
 	select HAVE_ARCH_KFENCE			if PPC_BOOK3S_32 || PPC_8xx || 40x
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_MMAP_RND_BITS
diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 192f0ed0097f..9f363c143d86 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -374,4 +374,5 @@  config PPC_FAST_ENDIAN_SWITCH
 config KASAN_SHADOW_OFFSET
 	hex
 	depends on KASAN
-	default 0xe0000000
+	default 0xe0000000 if PPC32
+	default 0xa80e000000000000 if PPC64
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index a7a0572f3846..17e7a778c856 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -18,6 +18,10 @@ 
 #include <asm/book3s/64/hash-4k.h>
 #endif
 
+#define H_PTRS_PER_PTE		(1 << H_PTE_INDEX_SIZE)
+#define H_PTRS_PER_PMD		(1 << H_PMD_INDEX_SIZE)
+#define H_PTRS_PER_PUD		(1 << H_PUD_INDEX_SIZE)
+
 /* Bits to set in a PMD/PUD/PGD entry valid bit*/
 #define HASH_PMD_VAL_BITS		(0x8000000000000000UL)
 #define HASH_PUD_VAL_BITS		(0x8000000000000000UL)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 875730d5af40..010eb373fcb3 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -232,6 +232,9 @@  extern unsigned long __pmd_frag_size_shift;
 #define PTRS_PER_PUD	(1 << PUD_INDEX_SIZE)
 #define PTRS_PER_PGD	(1 << PGD_INDEX_SIZE)
 
+#define MAX_PTRS_PER_PTE ((H_PTRS_PER_PTE > R_PTRS_PER_PTE) ? H_PTRS_PER_PTE : R_PTRS_PER_PTE)
+#define MAX_PTRS_PER_PMD ((H_PTRS_PER_PMD > R_PTRS_PER_PMD) ? H_PTRS_PER_PMD : R_PTRS_PER_PMD)
+#define MAX_PTRS_PER_PUD ((H_PTRS_PER_PUD > R_PTRS_PER_PUD) ? H_PTRS_PER_PUD : R_PTRS_PER_PUD)
 #define MAX_PTRS_PER_PGD	(1 << (H_PGD_INDEX_SIZE > RADIX_PGD_INDEX_SIZE ? \
 				       H_PGD_INDEX_SIZE : RADIX_PGD_INDEX_SIZE))
 
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index d090d9612348..bafc9869afcd 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -35,6 +35,11 @@ 
 #define RADIX_PMD_SHIFT		(PAGE_SHIFT + RADIX_PTE_INDEX_SIZE)
 #define RADIX_PUD_SHIFT		(RADIX_PMD_SHIFT + RADIX_PMD_INDEX_SIZE)
 #define RADIX_PGD_SHIFT		(RADIX_PUD_SHIFT + RADIX_PUD_INDEX_SIZE)
+
+#define R_PTRS_PER_PTE		(1 << RADIX_PTE_INDEX_SIZE)
+#define R_PTRS_PER_PMD		(1 << RADIX_PMD_INDEX_SIZE)
+#define R_PTRS_PER_PUD		(1 << RADIX_PUD_INDEX_SIZE)
+
 /*
  * Size of EA range mapped by our pagetables.
  */
@@ -68,11 +73,11 @@ 
  *
  *
  * 3rd quadrant expanded:
- * +------------------------------+
+ * +------------------------------+  Highest address (0xc010000000000000)
+ * +------------------------------+  KASAN shadow end (0xc00fc00000000000)
  * |                              |
  * |                              |
- * |                              |
- * +------------------------------+  Kernel vmemmap end (0xc010000000000000)
+ * +------------------------------+  Kernel vmemmap end/shadow start (0xc00e000000000000)
  * |                              |
  * |           512TB		  |
  * |                              |
@@ -126,6 +131,8 @@ 
 #define RADIX_VMEMMAP_SIZE	RADIX_KERN_MAP_SIZE
 #define RADIX_VMEMMAP_END	(RADIX_VMEMMAP_START + RADIX_VMEMMAP_SIZE)
 
+/* For the sizes of the shadow area, see kasan.h */
+
 #ifndef __ASSEMBLY__
 #define RADIX_PTE_TABLE_SIZE	(sizeof(pte_t) << RADIX_PTE_INDEX_SIZE)
 #define RADIX_PMD_TABLE_SIZE	(sizeof(pmd_t) << RADIX_PMD_INDEX_SIZE)
diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index fc28f46d2f9d..fb244b6ca7f0 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -327,22 +327,46 @@  static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct inte
 	}
 #endif
 
+	/* If data relocations are enabled, it's safe to use nmi_enter() */
+	if (mfmsr() & MSR_DR) {
+		nmi_enter();
+		return;
+	}
+
 	/*
-	 * Do not use nmi_enter() for pseries hash guest taking a real-mode
+	 * But do not use nmi_enter() for pseries hash guest taking a real-mode
 	 * NMI because not everything it touches is within the RMA limit.
 	 */
-	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
-			!firmware_has_feature(FW_FEATURE_LPAR) ||
-			radix_enabled() || (mfmsr() & MSR_DR))
-		nmi_enter();
+	if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
+	    firmware_has_feature(FW_FEATURE_LPAR) &&
+	    !radix_enabled())
+		return;
+
+	/*
+	 * Likewise, don't use it if we have some form of instrumentation (like
+	 * KASAN shadow) that is not safe to access in real mode (even on radix)
+	 */
+	if (IS_ENABLED(CONFIG_KASAN))
+		return;
+
+	/* Otherwise, it should be safe to call it */
+	nmi_enter();
 }
 
 static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct interrupt_nmi_state *state)
 {
-	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
-			!firmware_has_feature(FW_FEATURE_LPAR) ||
-			radix_enabled() || (mfmsr() & MSR_DR))
+	if (mfmsr() & MSR_DR) {
+		// nmi_exit if relocations are on
 		nmi_exit();
+	} else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
+		   firmware_has_feature(FW_FEATURE_LPAR) &&
+		   !radix_enabled()) {
+		// no nmi_exit for a pseries hash guest taking a real mode exception
+	} else if (IS_ENABLED(CONFIG_KASAN)) {
+		// no nmi_exit for KASAN in real mode
+	} else {
+		nmi_exit();
+	}
 
 	/*
 	 * nmi does not call nap_adjust_return because nmi should not create
@@ -410,7 +434,8 @@  static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct inter
  * Specific handlers may have additional restrictions.
  */
 #define DEFINE_INTERRUPT_HANDLER_RAW(func)				\
-static __always_inline long ____##func(struct pt_regs *regs);		\
+static __always_inline __no_sanitize_address __no_kcsan long		\
+____##func(struct pt_regs *regs);					\
 									\
 interrupt_handler long func(struct pt_regs *regs)			\
 {									\
@@ -424,7 +449,8 @@  interrupt_handler long func(struct pt_regs *regs)			\
 }									\
 NOKPROBE_SYMBOL(func);							\
 									\
-static __always_inline long ____##func(struct pt_regs *regs)
+static __always_inline __no_sanitize_address __no_kcsan long		\
+____##func(struct pt_regs *regs)
 
 /**
  * DECLARE_INTERRUPT_HANDLER - Declare synchronous interrupt handler function
@@ -549,7 +575,8 @@  static __always_inline void ____##func(struct pt_regs *regs)
  * body with a pair of curly brackets.
  */
 #define DEFINE_INTERRUPT_HANDLER_NMI(func)				\
-static __always_inline long ____##func(struct pt_regs *regs);		\
+static __always_inline __no_sanitize_address __no_kcsan long		\
+____##func(struct pt_regs *regs);					\
 									\
 interrupt_handler long func(struct pt_regs *regs)			\
 {									\
@@ -566,7 +593,8 @@  interrupt_handler long func(struct pt_regs *regs)			\
 }									\
 NOKPROBE_SYMBOL(func);							\
 									\
-static __always_inline long ____##func(struct pt_regs *regs)
+static __always_inline  __no_sanitize_address __no_kcsan long		\
+____##func(struct pt_regs *regs)
 
 
 /* Interrupt handlers */
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 3c478e5ef24c..6efc822e70fd 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -30,9 +30,31 @@ 
 
 #define KASAN_SHADOW_OFFSET	ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
 
+#ifdef CONFIG_PPC32
 #define KASAN_SHADOW_END	(-(-KASAN_SHADOW_START >> KASAN_SHADOW_SCALE_SHIFT))
+#endif
 
 #ifdef CONFIG_KASAN
+#ifdef CONFIG_PPC_BOOK3S_64
+/*
+ * The shadow ends before the highest accessible address
+ * because we don't need a shadow for the shadow. Instead:
+ * c00e000000000000 << 3 + a80e000000000000 = c00fc00000000000
+ */
+#define KASAN_SHADOW_END 0xc00fc00000000000UL
+
+DECLARE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);
+
+static __always_inline bool kasan_arch_is_ready(void)
+{
+	if (static_branch_likely(&powerpc_kasan_enabled_key))
+		return true;
+	return false;
+}
+
+#define kasan_arch_is_ready kasan_arch_is_ready
+#endif
+
 void kasan_early_init(void);
 void kasan_mmu_init(void);
 void kasan_init(void);
diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 42f89e2d8f04..d4d547b4ed28 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -305,12 +305,6 @@  static inline void regs_set_recoverable(struct pt_regs *regs)
 		regs_set_return_msr(regs, regs->msr | MSR_RI);
 }
 
-static inline void regs_set_unrecoverable(struct pt_regs *regs)
-{
-	if (cpu_has_msr_ri())
-		regs_set_return_msr(regs, regs->msr & ~MSR_RI);
-}
-
 #define arch_has_single_step()	(1)
 #define arch_has_block_step()	(true)
 #define ARCH_HAS_USER_SINGLE_STEP_REPORT
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 4d7829399570..e46335da7f72 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -33,6 +33,17 @@  KASAN_SANITIZE_early_32.o := n
 KASAN_SANITIZE_cputable.o := n
 KASAN_SANITIZE_prom_init.o := n
 KASAN_SANITIZE_btext.o := n
+KASAN_SANITIZE_paca.o := n
+KASAN_SANITIZE_setup_64.o := n
+KASAN_SANITIZE_mce.o := n
+KASAN_SANITIZE_mce_power.o := n
+
+# we have to be particularly careful in ppc64 to exclude code that
+# runs with translations off, as we cannot access the shadow with
+# translations off. However, ppc32 can sanitize this.
+ifdef CONFIG_PPC64
+KASAN_SANITIZE_traps.o := n
+endif
 
 ifdef CONFIG_KASAN
 CFLAGS_early_32.o += -DDISABLE_BRANCH_PROFILING
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 984813a4d5dc..c962f9bfc7d4 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2220,8 +2220,8 @@  void __no_sanitize_address show_stack(struct task_struct *tsk,
 			break;
 
 		stack = (unsigned long *) sp;
-		newsp = stack[0];
-		ip = stack[STACK_FRAME_LR_SAVE];
+		newsp = READ_ONCE_NOCHECK(stack[0]);
+		ip = READ_ONCE_NOCHECK(stack[STACK_FRAME_LR_SAVE]);
 		if (!firstframe || ip != lr) {
 			printk("%s["REG"] ["REG"] %pS",
 				loglvl, sp, ip, (void *)ip);
@@ -2239,17 +2239,19 @@  void __no_sanitize_address show_stack(struct task_struct *tsk,
 		 * See if this is an exception frame.
 		 * We look for the "regshere" marker in the current frame.
 		 */
-		if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS)
-		    && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
+		if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS) &&
+		    (READ_ONCE_NOCHECK(stack[STACK_FRAME_MARKER]) ==
+		     STACK_FRAME_REGS_MARKER)) {
 			struct pt_regs *regs = (struct pt_regs *)
 				(sp + STACK_FRAME_OVERHEAD);
 
-			lr = regs->link;
+			lr = READ_ONCE_NOCHECK(regs->link);
 			printk("%s--- interrupt: %lx at %pS\n",
-			       loglvl, regs->trap, (void *)regs->nip);
+			       loglvl, READ_ONCE_NOCHECK(regs->trap),
+			       (void *)READ_ONCE_NOCHECK(regs->nip));
 			__show_regs(regs);
 			printk("%s--- interrupt: %lx\n",
-			       loglvl, regs->trap);
+			       loglvl, READ_ONCE_NOCHECK(regs->trap));
 
 			firstframe = 1;
 		}
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index b7fd6a72aa76..5fedadc27952 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -413,32 +413,32 @@  static struct cpumask nmi_ipi_pending_mask;
 static bool nmi_ipi_busy = false;
 static void (*nmi_ipi_function)(struct pt_regs *) = NULL;
 
-static void nmi_ipi_lock_start(unsigned long *flags)
+noinstr static void nmi_ipi_lock_start(unsigned long *flags)
 {
 	raw_local_irq_save(*flags);
 	hard_irq_disable();
-	while (atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1) {
+	while (arch_atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1) {
 		raw_local_irq_restore(*flags);
-		spin_until_cond(atomic_read(&__nmi_ipi_lock) == 0);
+		spin_until_cond(arch_atomic_read(&__nmi_ipi_lock) == 0);
 		raw_local_irq_save(*flags);
 		hard_irq_disable();
 	}
 }
 
-static void nmi_ipi_lock(void)
+noinstr static void nmi_ipi_lock(void)
 {
-	while (atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1)
-		spin_until_cond(atomic_read(&__nmi_ipi_lock) == 0);
+	while (arch_atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1)
+		spin_until_cond(arch_atomic_read(&__nmi_ipi_lock) == 0);
 }
 
-static void nmi_ipi_unlock(void)
+noinstr static void nmi_ipi_unlock(void)
 {
 	smp_mb();
-	WARN_ON(atomic_read(&__nmi_ipi_lock) != 1);
-	atomic_set(&__nmi_ipi_lock, 0);
+	WARN_ON(arch_atomic_read(&__nmi_ipi_lock) != 1);
+	arch_atomic_set(&__nmi_ipi_lock, 0);
 }
 
-static void nmi_ipi_unlock_end(unsigned long *flags)
+noinstr static void nmi_ipi_unlock_end(unsigned long *flags)
 {
 	nmi_ipi_unlock();
 	raw_local_irq_restore(*flags);
@@ -447,7 +447,7 @@  static void nmi_ipi_unlock_end(unsigned long *flags)
 /*
  * Platform NMI handler calls this to ack
  */
-int smp_handle_nmi_ipi(struct pt_regs *regs)
+noinstr int smp_handle_nmi_ipi(struct pt_regs *regs)
 {
 	void (*fn)(struct pt_regs *) = NULL;
 	unsigned long flags;
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index a08bb7cefdc5..3aaa50e5c72f 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -393,7 +393,7 @@  void _exception(int signr, struct pt_regs *regs, int code, unsigned long addr)
  * Builds that do not support KVM could take this second option to increase
  * the recoverability of NMIs.
  */
-void hv_nmi_check_nonrecoverable(struct pt_regs *regs)
+noinstr void hv_nmi_check_nonrecoverable(struct pt_regs *regs)
 {
 #ifdef CONFIG_PPC_POWERNV
 	unsigned long kbase = (unsigned long)_stext;
@@ -433,7 +433,9 @@  void hv_nmi_check_nonrecoverable(struct pt_regs *regs)
 	return;
 
 nonrecoverable:
-	regs_set_unrecoverable(regs);
+	regs->msr &= ~MSR_RI;
+	local_paca->hsrr_valid = 0;
+	local_paca->srr_valid = 0;
 #endif
 }
 DEFINE_INTERRUPT_HANDLER_NMI(system_reset_exception)
diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index b6c52608cb49..0c2abe7f9908 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -13,3 +13,5 @@  obj-$(CONFIG_KEXEC_FILE)	+= file_load.o ranges.o file_load_$(BITS).o elf_$(BITS)
 GCOV_PROFILE_core_$(BITS).o := n
 KCOV_INSTRUMENT_core_$(BITS).o := n
 UBSAN_SANITIZE_core_$(BITS).o := n
+KASAN_SANITIZE_core.o := n
+KASAN_SANITIZE_core_$(BITS).o := n
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 9bdfc8b50899..ed0569b2a3d4 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -134,3 +134,8 @@  obj-$(CONFIG_KVM_BOOK3S_64_PR) += kvm-pr.o
 obj-$(CONFIG_KVM_BOOK3S_64_HV) += kvm-hv.o
 
 obj-y += $(kvm-book3s_64-builtin-objs-y)
+
+# KVM does a lot in real-mode, and 64-bit Book3S KASAN doesn't support that
+ifdef CONFIG_PPC_BOOK3S_64
+KASAN_SANITIZE := n
+endif
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 5d1881d2e39a..8560c912186d 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -13,6 +13,9 @@  CFLAGS_REMOVE_feature-fixups.o = $(CC_FLAGS_FTRACE)
 
 KASAN_SANITIZE_code-patching.o := n
 KASAN_SANITIZE_feature-fixups.o := n
+# restart_table.o contains functions called in the NMI interrupt path
+# which can be in real mode. Disable KASAN.
+KASAN_SANITIZE_restart_table.o := n
 
 ifdef CONFIG_KASAN
 CFLAGS_code-patching.o += -DDISABLE_BRANCH_PROFILING
diff --git a/arch/powerpc/mm/book3s64/Makefile b/arch/powerpc/mm/book3s64/Makefile
index 2d50cac499c5..3150159e5121 100644
--- a/arch/powerpc/mm/book3s64/Makefile
+++ b/arch/powerpc/mm/book3s64/Makefile
@@ -24,3 +24,12 @@  obj-$(CONFIG_PPC_PKEY)	+= pkeys.o
 
 # Instrumenting the SLB fault path can lead to duplicate SLB entries
 KCOV_INSTRUMENT_slb.o := n
+
+# Parts of these can run in real mode and therefore are
+# not safe with the current outline KASAN implementation
+KASAN_SANITIZE_mmu_context.o := n
+KASAN_SANITIZE_pgtable.o := n
+KASAN_SANITIZE_radix_pgtable.o := n
+KASAN_SANITIZE_radix_tlb.o := n
+KASAN_SANITIZE_slb.o := n
+KASAN_SANITIZE_pkeys.o := n
diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
index bcbfd6f2eca3..4999aadb1867 100644
--- a/arch/powerpc/mm/kasan/Makefile
+++ b/arch/powerpc/mm/kasan/Makefile
@@ -5,3 +5,4 @@  KASAN_SANITIZE := n
 obj-$(CONFIG_PPC32)		+= init_32.o
 obj-$(CONFIG_PPC_8xx)		+= 8xx.o
 obj-$(CONFIG_PPC_BOOK3S_32)	+= book3s_32.o
+obj-$(CONFIG_PPC_BOOK3S_64)	+= init_book3s_64.o
diff --git a/arch/powerpc/mm/kasan/init_book3s_64.c b/arch/powerpc/mm/kasan/init_book3s_64.c
new file mode 100644
index 000000000000..cb96e16e57a1
--- /dev/null
+++ b/arch/powerpc/mm/kasan/init_book3s_64.c
@@ -0,0 +1,103 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KASAN for 64-bit Book3S powerpc
+ *
+ * Copyright (C) 2019-2020 IBM Corporation
+ * Author: Daniel Axtens <dja@axtens.net>
+ */
+
+/*
+# ppc64 turns on virtual memory late in boot, after calling into generic code
+# like the device-tree parser, so it uses this in conjunction with a hook in
+# outline mode to avoid invalid access early in boot.
+*/
+
+#define DISABLE_BRANCH_PROFILING
+
+#include <linux/kasan.h>
+#include <linux/printk.h>
+#include <linux/sched/task.h>
+#include <linux/memblock.h>
+#include <asm/pgalloc.h>
+
+DEFINE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);
+
+static void __init kasan_init_phys_region(void *start, void *end)
+{
+	unsigned long k_start, k_end, k_cur;
+	void *va;
+
+	if (start >= end)
+		return;
+
+	k_start = ALIGN_DOWN((unsigned long)kasan_mem_to_shadow(start), PAGE_SIZE);
+	k_end = ALIGN((unsigned long)kasan_mem_to_shadow(end), PAGE_SIZE);
+
+	va = memblock_alloc(k_end - k_start, PAGE_SIZE);
+	for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE, va += PAGE_SIZE)
+		map_kernel_page(k_cur, __pa(va), PAGE_KERNEL);
+}
+
+void __init kasan_init(void)
+{
+	/*
+	 * We want to do the following things:
+	 *  1) Map real memory into the shadow for all physical memblocks
+	 *     This takes us from c000... to c008...
+	 *  2) Leave a hole over the shadow of vmalloc space. KASAN_VMALLOC
+	 *     will manage this for us.
+	 *     This takes us from c008... to c00a...
+	 *  3) Map the 'early shadow'/zero page over iomap and vmemmap space.
+	 *     This takes us up to where we start at c00e...
+	 */
+
+	void *k_start = kasan_mem_to_shadow((void *)RADIX_VMALLOC_END);
+	void *k_end = kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END);
+	phys_addr_t start, end;
+	u64 i;
+	pte_t zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL);
+
+	if (!early_radix_enabled()) {
+		pr_warn("KASAN not enabled as it requires radix!");
+		return;
+	}
+
+	for_each_mem_range(i, &start, &end)
+		kasan_init_phys_region((void *)start, (void *)end);
+
+	for (i = 0; i < PTRS_PER_PTE; i++)
+		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
+			     &kasan_early_shadow_pte[i], zero_pte, 0);
+
+	for (i = 0; i < PTRS_PER_PMD; i++)
+		pmd_populate_kernel(&init_mm, &kasan_early_shadow_pmd[i],
+				    kasan_early_shadow_pte);
+
+	for (i = 0; i < PTRS_PER_PUD; i++)
+		pud_populate(&init_mm, &kasan_early_shadow_pud[i],
+			     kasan_early_shadow_pmd);
+
+	/* map the early shadow over the iomap and vmemmap space */
+	kasan_populate_early_shadow(k_start, k_end);
+
+	/* mark early shadow region as RO and wipe it */
+	zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL_RO);
+	for (i = 0; i < PTRS_PER_PTE; i++)
+		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
+			     &kasan_early_shadow_pte[i], zero_pte, 0);
+
+	/*
+	 * clear_page relies on some cache info that hasn't been set up yet.
+	 * It ends up looping ~forever and blows up other data.
+	 * Use memset instead.
+	 */
+	memset(kasan_early_shadow_page, 0, PAGE_SIZE);
+
+	static_branch_inc(&powerpc_kasan_enabled_key);
+
+	/* Enable error messages */
+	init_task.kasan_depth = 0;
+	pr_info("KASAN init done (64-bit Book3S)\n");
+}
+
+void __init kasan_late_init(void) { }
diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index 8c846982766f..2313053fe679 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -21,6 +21,7 @@ 
 #include <linux/seq_file.h>
 #include <asm/fixmap.h>
 #include <linux/const.h>
+#include <linux/kasan.h>
 #include <asm/page.h>
 #include <asm/hugetlb.h>
 
@@ -289,11 +290,11 @@  static void populate_markers(void)
 #endif
 	address_markers[i++].start_address = FIXADDR_START;
 	address_markers[i++].start_address = FIXADDR_TOP;
+#endif /* CONFIG_PPC64 */
 #ifdef CONFIG_KASAN
 	address_markers[i++].start_address = KASAN_SHADOW_START;
 	address_markers[i++].start_address = KASAN_SHADOW_END;
 #endif
-#endif /* CONFIG_PPC64 */
 }
 
 static int ptdump_show(struct seq_file *m, void *v)
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 87bc1929ee5a..b6562d39b9c4 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -104,6 +104,7 @@  config PPC_BOOK3S_64
 	select HAVE_MOVE_PUD
 	select IRQ_WORK
 	select PPC_64S_HASH_MMU if !PPC_RADIX_MMU
+	select KASAN_VMALLOC if KASAN
 
 config PPC_BOOK3E_64
 	bool "Embedded processors"
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index dc7b37c23b60..6488b3842199 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,4 +1,12 @@ 
 # SPDX-License-Identifier: GPL-2.0
+
+# nothing that deals with real mode is safe to KASAN
+# in particular, idle code runs a bunch of things in real mode
+KASAN_SANITIZE_idle.o := n
+KASAN_SANITIZE_pci-ioda.o := n
+# pnv_machine_check_early
+KASAN_SANITIZE_setup.o := n
+
 obj-y			+= setup.o opal-call.o opal-wrappers.o opal.o opal-async.o
 obj-y			+= idle.o opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y			+= rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index cbb67813cd5d..9e1a25398f98 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -345,7 +345,7 @@  static void __init pnv_smp_probe(void)
 	}
 }
 
-static int pnv_system_reset_exception(struct pt_regs *regs)
+noinstr static int pnv_system_reset_exception(struct pt_regs *regs)
 {
 	if (smp_handle_nmi_ipi(regs))
 		return 1;
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index ee60b59024b4..98e878c32a21 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -32,3 +32,9 @@  obj-$(CONFIG_SUSPEND)		+= suspend.o
 obj-$(CONFIG_PPC_VAS)		+= vas.o
 
 obj-$(CONFIG_ARCH_HAS_CC_PLATFORM)	+= cc_platform.o
+
+# nothing that operates in real mode is safe for KASAN
+KASAN_SANITIZE_ras.o := n
+KASAN_SANITIZE_kexec.o := n
+#machine_kexec
+KASAN_SANITIZE_setup.o := n
diff --git a/arch/powerpc/sysdev/xics/xics-common.c b/arch/powerpc/sysdev/xics/xics-common.c
index f3fb2a12124c..322b2b8bd467 100644
--- a/arch/powerpc/sysdev/xics/xics-common.c
+++ b/arch/powerpc/sysdev/xics/xics-common.c
@@ -146,7 +146,7 @@  void __init xics_smp_probe(void)
 
 #endif /* CONFIG_SMP */
 
-void xics_teardown_cpu(void)
+noinstr void xics_teardown_cpu(void)
 {
 	struct xics_cppr *os_cppr = this_cpu_ptr(&xics_cppr);
 
@@ -159,7 +159,7 @@  void xics_teardown_cpu(void)
 	icp_ops->teardown_cpu();
 }
 
-void xics_kexec_teardown_cpu(int secondary)
+noinstr void xics_kexec_teardown_cpu(int secondary)
 {
 	xics_teardown_cpu();
 
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 1ca5564bda9d..87b825b7401d 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1241,7 +1241,7 @@  static int xive_setup_cpu_ipi(unsigned int cpu)
 	return 0;
 }
 
-static void xive_cleanup_cpu_ipi(unsigned int cpu, struct xive_cpu *xc)
+noinstr static void xive_cleanup_cpu_ipi(unsigned int cpu, struct xive_cpu *xc)
 {
 	unsigned int xive_ipi_irq = xive_ipi_cpu_to_irq(cpu);
 
@@ -1634,7 +1634,7 @@  void xive_flush_interrupt(void)
 
 #endif /* CONFIG_SMP */
 
-void xive_teardown_cpu(void)
+noinstr void xive_teardown_cpu(void)
 {
 	struct xive_cpu *xc = __this_cpu_read(xive_cpu);
 	unsigned int cpu = smp_processor_id();