[1/2] powerpc/64s: Fix crashes when toggling stf barrier

Message ID: 20210504134250.890401-1-mpe@ellerman.id.au (mailing list archive)
State: Changes Requested

Checks

Context                      | Check   | Description
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch powerpc/merge (134b5c8a49b594ff6cfb4ea1a92400bb382b46d2)
snowpatch_ozlabs/checkpatch  | warning | total: 0 errors, 1 warnings, 0 checks, 34 lines checked
snowpatch_ozlabs/needsstable | success | Patch is tagged for stable

Commit Message

Michael Ellerman May 4, 2021, 1:42 p.m. UTC
The STF (store-to-load forwarding) barrier mitigation can be
enabled/disabled at runtime via a debugfs file (stf_barrier), which
causes the kernel to patch itself to enable/disable the relevant
mitigations.

However, depending on which mitigation we're using, it may not be safe to
do that patching while other CPUs are active. For example, the following
crash:

  User access of kernel address (c00000003fff5af0) - exploit attempt? (uid: 0)
  segfault (11) at c00000003fff5af0 nip 7fff8ad12198 lr 7fff8ad121f8 code 1
  code: 40820128 e93c00d0 e9290058 7c292840 40810058 38600000 4bfd9a81 e8410018
  code: 2c030006 41810154 3860ffb6 e9210098 <e94d8ff0> 7d295279 39400000 40820a3c

shows that we returned to userspace without restoring the user r13 value,
due to executing the partially patched STF exit code (the kernel uses r13
for the PACA pointer, while userspace uses it as the thread pointer, hence
the user access of a kernel address).

Fix it by doing the patching under stop machine. The CPUs that aren't
doing the patching will be spinning in the core of the stop machine
logic. That is currently sufficient for our purposes, because none of
the patching we do is to that code or anywhere in the vicinity.

Fixes: a048a07d7f45 ("powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit")
Cc: stable@vger.kernel.org # v4.17+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/lib/feature-fixups.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)
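
For background on the mechanism the fix uses: the stop_machine() family,
declared in <linux/stop_machine.h>, runs a callback on one CPU while every
other online CPU spins in the stopper loop with interrupts hard disabled,
which is what makes the live code patching safe here. Paraphrasing the
declarations (the comments are editorial, not from the header):

#include <linux/stop_machine.h>

/* Callback signature: invoked on one CPU with the rest of the machine stopped. */
typedef int (*cpu_stop_fn_t)(void *arg);

/*
 * Freeze the machine and run fn(data) on one CPU from @cpus (or on any CPU
 * if @cpus is NULL). Takes cpus_read_lock() internally.
 */
int stop_machine(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus);

/* Same, but for callers that already hold cpus_read_lock(). */
int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus);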

Comments

Nathan Lynch May 4, 2021, 10:44 p.m. UTC | #1
Michael Ellerman <mpe@ellerman.id.au> writes:
> -void do_stf_barrier_fixups(enum stf_barrier_type types)
> +static int __do_stf_barrier_fixups(void *data)
>  {
> +	enum stf_barrier_type types = (enum stf_barrier_type)data;
> +
>  	do_stf_entry_barrier_fixups(types);
>  	do_stf_exit_barrier_fixups(types);
> +
> +	return 0;
> +}
> +
> +void do_stf_barrier_fixups(enum stf_barrier_type types)
> +{
> +	/*
> +	 * The call to the fallback entry flush, and the fallback/sync-ori exit
> +	 * flush can not be safely patched in/out while other CPUs are executing
> +	 * them. So call __do_stf_barrier_fixups() on one CPU while all other CPUs
> +	 * spin in the stop machine core with interrupts hard disabled.
> +	 */
> +	stop_machine_cpuslocked(__do_stf_barrier_fixups, (void *)types, NULL);

Would it be preferable to avoid the explicit casts:

	stop_machine_cpuslocked(__do_stf_barrier_fixups, &types, NULL);

...

static int __do_stf_barrier_fixups(void *data)
{
	enum stf_barrier_type *types = data;

 	do_stf_entry_barrier_fixups(*types);
 	do_stf_exit_barrier_fixups(*types);

?

post_mobility_fixup() does cpus_read_unlock() before calling
pseries_setup_security_mitigations(), I think that will need to be
changed?
Michael Ellerman May 5, 2021, 2:48 a.m. UTC | #2
Nathan Lynch <nathanl@linux.ibm.com> writes:
> Michael Ellerman <mpe@ellerman.id.au> writes:
>> -void do_stf_barrier_fixups(enum stf_barrier_type types)
>> +static int __do_stf_barrier_fixups(void *data)
>>  {
>> +	enum stf_barrier_type types = (enum stf_barrier_type)data;
>> +
>>  	do_stf_entry_barrier_fixups(types);
>>  	do_stf_exit_barrier_fixups(types);
>> +
>> +	return 0;
>> +}
>> +
>> +void do_stf_barrier_fixups(enum stf_barrier_type types)
>> +{
>> +	/*
>> +	 * The call to the fallback entry flush, and the fallback/sync-ori exit
>> +	 * flush can not be safely patched in/out while other CPUs are executing
>> +	 * them. So call __do_stf_barrier_fixups() on one CPU while all other CPUs
>> +	 * spin in the stop machine core with interrupts hard disabled.
>> +	 */
>> +	stop_machine_cpuslocked(__do_stf_barrier_fixups, (void *)types, NULL);
>
> Would it be preferable to avoid the explicit casts:
>
> 	stop_machine_cpuslocked(__do_stf_barrier_fixups, &types, NULL);
>
> ...
>
> static int __do_stf_barrier_fixups(void *data)
> {
> 	enum stf_barrier_type *types = data;
>
>  	do_stf_entry_barrier_fixups(*types);
>  	do_stf_exit_barrier_fixups(*types);
>
> ?

Yes.

That will also avoid the pesky issue of undefined behaviour :facepalm:

> post_mobility_fixup() does cpus_read_unlock() before calling
> pseries_setup_security_mitigations(), I think that will need to be
> changed?

I don't think so.

I'm using stop_machine_cpuslocked() but that's because I'm a goose and
forgot to switch to stop_machine() after I reworked the code to not take
cpus_read_lock() by hand. I really shouldn't send patches after 11pm.

I don't think it's important to keep the cpus lock held from where we
take it in post_mobility_fixup(). If some CPUs come or go between there
and here that's fine.

I'll send a v2.

cheers
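
Editorial aside on the "undefined behaviour" remark: types is an enum,
typically int-sized, so on 64-bit the (void *)types cast widens an integer
into a pointer, which GCC flags with -Wint-to-pointer-cast when the sizes
differ, and round-tripping an integer through a pointer is at best
implementation-defined. Passing &types, as Nathan suggested, sidesteps the
question. An illustrative sketch, not code from the thread:

/* Dodgy: widens an int-sized enum into a pointer, then narrows it back. */
stop_machine_cpuslocked(__do_stf_barrier_fixups, (void *)types, NULL);

/* Clean: pass the address and dereference it in the callback; any object
 * pointer converts implicitly to and from void *, so no casts are needed. */
stop_machine_cpuslocked(__do_stf_barrier_fixups, &types, NULL);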
Nathan Lynch May 5, 2021, 2:55 a.m. UTC | #3
Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> post_mobility_fixup() does cpus_read_unlock() before calling
>> pseries_setup_security_mitigations(), I think that will need to be
>> changed?
>
> I don't think so.
>
> I'm using stop_machine_cpuslocked() but that's because I'm a goose and
> forgot to switch to stop_machine() after I reworked the code to not take
> cpus_read_lock() by hand. I really shouldn't send patches after 11pm.
>
> I don't think it's important to keep the cpus lock held from where we
> take it in post_mobility_fixup(). If some CPUs come or go between there
> and here that's fine.

Yes, agreed.
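
Pulling the two agreed changes together, namely passing the enum by address
instead of casting it and using plain stop_machine() so the caller need not
hold cpus_read_lock(), a v2 would presumably look something like this (a
sketch assembled from the thread, not the posted v2):

static int __do_stf_barrier_fixups(void *data)
{
	enum stf_barrier_type *types = data;

	do_stf_entry_barrier_fixups(*types);
	do_stf_exit_barrier_fixups(*types);

	return 0;
}

void do_stf_barrier_fixups(enum stf_barrier_type types)
{
	/*
	 * The fallback entry flush call and the fallback/sync-ori exit flush
	 * cannot be safely patched in/out while other CPUs are executing them.
	 * So run __do_stf_barrier_fixups() on one CPU while all other CPUs
	 * spin in the stop machine core with interrupts hard disabled.
	 */
	stop_machine(__do_stf_barrier_fixups, &types, NULL);
}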

Patch

diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c
index 1fd31b4b0e13..8f8c8c98a6ac 100644
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -14,6 +14,7 @@ 
 #include <linux/string.h>
 #include <linux/init.h>
 #include <linux/sched/mm.h>
+#include <linux/stop_machine.h>
 #include <asm/cputable.h>
 #include <asm/code-patching.h>
 #include <asm/page.h>
@@ -227,11 +228,25 @@  static void do_stf_exit_barrier_fixups(enum stf_barrier_type types)
 		                                           : "unknown");
 }
 
-
-void do_stf_barrier_fixups(enum stf_barrier_type types)
+static int __do_stf_barrier_fixups(void *data)
 {
+	enum stf_barrier_type types = (enum stf_barrier_type)data;
+
 	do_stf_entry_barrier_fixups(types);
 	do_stf_exit_barrier_fixups(types);
+
+	return 0;
+}
+
+void do_stf_barrier_fixups(enum stf_barrier_type types)
+{
+	/*
+	 * The call to the fallback entry flush, and the fallback/sync-ori exit
+	 * flush can not be safely patched in/out while other CPUs are executing
+	 * them. So call __do_stf_barrier_fixups() on one CPU while all other CPUs
+	 * spin in the stop machine core with interrupts hard disabled.
+	 */
+	stop_machine_cpuslocked(__do_stf_barrier_fixups, (void *)types, NULL);
 }
 
 void do_uaccess_flush_fixups(enum l1d_flush_type types)