[v3,09/11] um: Rely on PTRACE_SETREGSET to set FS/GS base registers

Message ID 20231110110348.1815612-10-benjamin@sipsolutions.net
State Accepted
Series General cleanups and fixes from SECCOMP patchset

Commit Message

Benjamin Berg Nov. 10, 2023, 11:03 a.m. UTC
From: Benjamin Berg <benjamin@sipsolutions.net>

These registers are saved/restored together with the other general
registers using ptrace. In arch_set_tls we then just need to set the
register and it will be synced back normally.

Most of this logic was introduced in commit f355559cf7845 ("[PATCH] uml:
x86_64 thread fixes"). However, at least today we can rely on ptrace to
restore the base registers for us. As such, only the part of the patch
that tracks the FS register for use as thread local storage is actually
needed.

Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
---
 arch/um/include/shared/os.h    |  3 --
 arch/x86/um/asm/elf.h          |  4 +--
 arch/x86/um/asm/processor_64.h |  3 --
 arch/x86/um/os-Linux/Makefile  |  1 -
 arch/x86/um/os-Linux/prctl.c   | 12 -------
 arch/x86/um/syscalls_64.c      | 62 +++++++---------------------------
 arch/x86/um/tls_64.c           |  2 +-
 7 files changed, 16 insertions(+), 71 deletions(-)
 delete mode 100644 arch/x86/um/os-Linux/prctl.c
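
The indexing that the commit message relies on (FS/GS base living in the same register array as the other general registers) can be sketched in plain C. This is only an illustration: `struct regs_layout`, `SKETCH_FS_BASE`, `SKETCH_GS_BASE` and `gp_index()` are hypothetical names that mirror the x86_64 `user_regs_struct` field order, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical mirror of the x86_64 user_regs_struct field order
 * (27 register-sized slots). Illustration only, not a kernel header.
 */
struct regs_layout {
	unsigned long r15, r14, r13, r12, rbp, rbx, r11, r10;
	unsigned long r9, r8, rax, rcx, rdx, rsi, rdi, orig_rax;
	unsigned long rip, cs, eflags, rsp, ss;
	unsigned long fs_base, gs_base;
	unsigned long ds, es, fs, gs;
};

/* Byte offsets, playing the role of FS_BASE/GS_BASE in the patch. */
#define SKETCH_FS_BASE offsetof(struct regs_layout, fs_base)
#define SKETCH_GS_BASE offsetof(struct regs_layout, gs_base)

/* Convert a byte offset into an index into an unsigned long array. */
static size_t gp_index(size_t byte_offset)
{
	/* Register slots are word-aligned, so the division is exact. */
	assert(byte_offset % sizeof(unsigned long) == 0);
	return byte_offset / sizeof(unsigned long);
}
```

Under these assumptions `gp_index(SKETCH_FS_BASE)` and `gp_index(SKETCH_GS_BASE)` land on slots 21 and 22, the same slots the elf.h hunk now copies from `regs.gp[]`.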

Comments

Richard Weinberger Jan. 4, 2024, 11:05 p.m. UTC | #1
On Fri, Nov 10, 2023 at 12:03 PM <benjamin@sipsolutions.net> wrote:
>
> From: Benjamin Berg <benjamin@sipsolutions.net>
>
> These registers are saved/restored together with the other general
> registers using ptrace. In arch_set_tls we then just need to set the
> register and it will be synced back normally.
>
> Most of this logic was introduced in commit f355559cf7845 ("[PATCH] uml:
> x86_64 thread fixes"). However, at least today we can rely on ptrace to

Do you know since when exactly? I don't want to break UML in subtle ways
on old kernels.
Richard Weinberger Jan. 4, 2024, 11:34 p.m. UTC | #2
On Fri, Jan 5, 2024 at 12:05 AM Richard Weinberger
<richard.weinberger@gmail.com> wrote:
> > Most of this logic was introduced in commit f355559cf7845 ("[PATCH] uml:
> > x86_64 thread fixes"). However, at least today we can rely on ptrace to
>
> Do you know since when exactly? I don't want to break UML in subtle ways
> on old kernels.

BTW: I have applied the series up to here. Thanks a lot for cleaning
up all this. :-)
Benjamin Berg Jan. 5, 2024, 9:54 a.m. UTC | #3
Hi,

On Fri, 2024-01-05 at 00:05 +0100, Richard Weinberger wrote:
> On Fri, Nov 10, 2023 at 12:03 PM <benjamin@sipsolutions.net> wrote:
> > 
> > From: Benjamin Berg <benjamin@sipsolutions.net>
> > 
> > These registers are saved/restored together with the other general
> > registers using ptrace. In arch_set_tls we then just need to set
> > the
> > register and it will be synced back normally.
> > 
> > Most of this logic was introduced in commit f355559cf7845 ("[PATCH]
> > uml:
> > x86_64 thread fixes"). However, at least today we can rely on
> > ptrace to
> 
> Do you know since when exactly? I don't want to break UML in subtle ways
> on old kernels.

To be honest, I don't remember, and I doubt I really understood what I
was doing.

Anyway, I have now found this commit, which is contained in v2.6.25:

commit df5d438e33d7fc914ba9b6e0d6b019a8966c5fcc
Author: Roland McGrath <roland@redhat.com>
Date:   Wed Jan 30 13:30:45 2008 +0100

    x86: ptrace fs/gs_base
    
    The fs_base and gs_base fields are available in user_regs_struct.
    But reading these via ptrace (PTRACE_GETREGS or PTRACE_PEEKUSR) does
    not give a reliably useful value.  The thread_struct fields are 0
    when do_arch_prctl decided to use a GDT slot instead of MSR_FS_BASE,
    which it does for a value under 1<<32.
    
    This changes ptrace access to fs_base and gs_base to work like
    PTRACE_ARCH_PRCTL does.  That is, it reads the base address that
    user-mode memory access using the fs/gs instruction prefixes will
    use, regardless of how it's being implemented in the kernel.  The
    MSR vs GDT is an implementation detail that is pretty much hidden
    from userland in the actual using, and there is no reason that
    ptrace should give the internal implementation picture rather than
    the user-mode semantic picture.  In the case of setting the value,
    this can implicitly change the fsindex/gsindex value (also
    separately in user_regs_struct), which is what happens when the
    thread calls arch_prctl itself.  In a PTRACE_SETREGS, the fs_base
    change will come after the fsindex change due to the order of the
    struct, and so a change the debugger made to fs_base will have the
    effect intended, but another part of the user_regs_struct will now
    differ when read back from what the debugger wrote.
    
    This makes PTRACE_ARCH_PRCTL obsolete.  We could consider declaring
    it deprecated and removing it one day, though there is no hurry.
    For the foreseeable future, debuggers have to assume an old kernel
    that does not report reliable fs_base/gs_base values in user_regs_struct
    and stick to PTRACE_ARCH_PRCTL anyway.
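
A rough model of the behaviour the quoted commit describes may help here. It is a sketch based only on that description: the function names are hypothetical, and the only fact taken from the commit text is that a GDT slot was used for a base under 1<<32, in which case the thread_struct field read back as 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Threshold named in the quoted commit: values under 1<<32 used a GDT slot. */
#define SKETCH_GDT_LIMIT (UINT64_C(1) << 32)

/* True if, per the commit text, the base would have gone into a GDT slot. */
static bool base_used_gdt_slot(uint64_t base)
{
	return base < SKETCH_GDT_LIMIT;
}

/* What a ptrace read of the raw thread_struct field gave before the fix. */
static uint64_t legacy_reported_base(uint64_t base)
{
	return base_used_gdt_slot(base) ? 0 : base;
}

/* After the fix, ptrace reports the base that user-mode accesses use. */
static uint64_t fixed_reported_base(uint64_t base)
{
	return base;
}

/* Small self-check exercising both sides of the threshold. */
static void check_threshold(void)
{
	assert(legacy_reported_base(0x1000) == 0);
	assert(fixed_reported_base(0x1000) == 0x1000);
	assert(legacy_reported_base(SKETCH_GDT_LIMIT) == SKETCH_GDT_LIMIT);
}
```

The point of the model is that the old and new readings only disagree for small base values, which is the case the commit set out to fix.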

I think the last paragraph is quite straightforward in saying that we
do not need anything special anymore. Note that the original commit
adding the code pre-dates this commit by about a year, i.e.:

commit f355559cf78455ed6be103b020e4b800230c64eb
Author: Jeff Dike <jdike@addtoit.com>
Date:   Sat Feb 10 01:44:29 2007 -0800

    [PATCH] uml: x86_64 thread fixes
    
    x86_64 needs some TLS fixes.  What was missing was remembering the child
    thread id during clone and stuffing it into the child during each context
    switch.
    
    The %fs value is stored separately in the thread structure since the host
    controls what effect it has on the actual register file.  The host also needs
    to store it in its own thread struct, so we need the value kept outside the
    register file.
    
    arch_prctl_skas was fixed to call PTRACE_ARCH_PRCTL appropriately.  There is
    some saving and restoring of registers in the ARCH_SET_* cases so that the
    correct set of registers are changed on the host and restored to the process
    when it runs again.

There was some shuffling around in the ptrace code at some point,
eventually switching from doing an arch_prctl call to setting things
more directly. But I don't think that matters to us.

Benjamin
Richard Weinberger Jan. 5, 2024, 1:29 p.m. UTC | #4
----- Original Message -----
> From: "Benjamin Berg" <benjamin@sipsolutions.net>
> On Fri, 2024-01-05 at 00:05 +0100, Richard Weinberger wrote:
>> On Fri, Nov 10, 2023 at 12:03 PM <benjamin@sipsolutions.net> wrote:
>> > 
>> > From: Benjamin Berg <benjamin@sipsolutions.net>
>> > 
>> > These registers are saved/restored together with the other general
>> > registers using ptrace. In arch_set_tls we then just need to set
>> > the
>> > register and it will be synced back normally.
>> > 
>> > Most of this logic was introduced in commit f355559cf7845 ("[PATCH]
>> > uml:
>> > x86_64 thread fixes"). However, at least today we can rely on
>> > ptrace to
>> 
>> Do you know since when exactly? I don't want to break UML in subtle ways
>> on old kernels.
> 
> To be honest, I don't remember, and I doubt I really understood what I
> was doing.
> 
> Anyway, I have now found this commit, which is contained in v2.6.25:

Okay, 2.6.25 is way older than anything supported. So I'll apply your patch. :-)

Thanks,
//richard
Patch

diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index 0df646c6651e..aff8906304ea 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -323,9 +323,6 @@  extern void sigio_broken(int fd);
 extern int __add_sigio_fd(int fd);
 extern int __ignore_sigio_fd(int fd);
 
-/* prctl.c */
-extern int os_arch_prctl(int pid, int option, unsigned long *arg2);
-
 /* tty.c */
 extern int get_pty(void);
 
diff --git a/arch/x86/um/asm/elf.h b/arch/x86/um/asm/elf.h
index 6523eb7c3bd1..6052200fe925 100644
--- a/arch/x86/um/asm/elf.h
+++ b/arch/x86/um/asm/elf.h
@@ -168,8 +168,8 @@  do {								\
 	(pr_reg)[18] = (_regs)->regs.gp[18];			\
 	(pr_reg)[19] = (_regs)->regs.gp[19];			\
 	(pr_reg)[20] = (_regs)->regs.gp[20];			\
-	(pr_reg)[21] = current->thread.arch.fs;			\
-	(pr_reg)[22] = 0;					\
+	(pr_reg)[21] = (_regs)->regs.gp[21];			\
+	(pr_reg)[22] = (_regs)->regs.gp[22];			\
 	(pr_reg)[23] = 0;					\
 	(pr_reg)[24] = 0;					\
 	(pr_reg)[25] = 0;					\
diff --git a/arch/x86/um/asm/processor_64.h b/arch/x86/um/asm/processor_64.h
index 1ef9c21877bc..f90159508936 100644
--- a/arch/x86/um/asm/processor_64.h
+++ b/arch/x86/um/asm/processor_64.h
@@ -10,13 +10,11 @@ 
 struct arch_thread {
         unsigned long debugregs[8];
         int debugregs_seq;
-        unsigned long fs;
         struct faultinfo faultinfo;
 };
 
 #define INIT_ARCH_THREAD { .debugregs  		= { [ 0 ... 7 ] = 0 }, \
 			   .debugregs_seq	= 0, \
-			   .fs			= 0, \
 			   .faultinfo		= { 0, 0, 0 } }
 
 #define STACKSLOTS_PER_LINE 4
@@ -28,7 +26,6 @@  static inline void arch_flush_thread(struct arch_thread *thread)
 static inline void arch_copy_thread(struct arch_thread *from,
                                     struct arch_thread *to)
 {
-	to->fs = from->fs;
 }
 
 #define current_sp() ({ void *sp; __asm__("movq %%rsp, %0" : "=r" (sp) : ); sp; })
diff --git a/arch/x86/um/os-Linux/Makefile b/arch/x86/um/os-Linux/Makefile
index ae169125d03f..5249bbc30dcd 100644
--- a/arch/x86/um/os-Linux/Makefile
+++ b/arch/x86/um/os-Linux/Makefile
@@ -6,7 +6,6 @@ 
 obj-y = registers.o task_size.o mcontext.o
 
 obj-$(CONFIG_X86_32) += tls.o
-obj-$(CONFIG_64BIT) += prctl.o
 
 USER_OBJS := $(obj-y)
 
diff --git a/arch/x86/um/os-Linux/prctl.c b/arch/x86/um/os-Linux/prctl.c
deleted file mode 100644
index 8431e87ac333..000000000000
--- a/arch/x86/um/os-Linux/prctl.c
+++ /dev/null
@@ -1,12 +0,0 @@ 
-/*
- * Copyright (C) 2007 Jeff Dike (jdike@{addtoit.com,linux.intel.com})
- * Licensed under the GPL
- */
-
-#include <sys/ptrace.h>
-#include <asm/ptrace.h>
-
-int os_arch_prctl(int pid, int option, unsigned long *arg2)
-{
-	return ptrace(PTRACE_ARCH_PRCTL, pid, (unsigned long) arg2, option);
-}
diff --git a/arch/x86/um/syscalls_64.c b/arch/x86/um/syscalls_64.c
index 27b29ae6c471..6a00a28c9cca 100644
--- a/arch/x86/um/syscalls_64.c
+++ b/arch/x86/um/syscalls_64.c
@@ -16,60 +16,24 @@ 
 long arch_prctl(struct task_struct *task, int option,
 		unsigned long __user *arg2)
 {
-	unsigned long *ptr = arg2, tmp;
-	long ret;
-	int pid = task->mm->context.id.u.pid;
-
-	/*
-	 * With ARCH_SET_FS (and ARCH_SET_GS is treated similarly to
-	 * be safe), we need to call arch_prctl on the host because
-	 * setting %fs may result in something else happening (like a
-	 * GDT or thread.fs being set instead).  So, we let the host
-	 * fiddle the registers and thread struct and restore the
-	 * registers afterwards.
-	 *
-	 * So, the saved registers are stored to the process (this
-	 * needed because a stub may have been the last thing to run),
-	 * arch_prctl is run on the host, then the registers are read
-	 * back.
-	 */
-	switch (option) {
-	case ARCH_SET_FS:
-	case ARCH_SET_GS:
-		ret = restore_pid_registers(pid, &current->thread.regs.regs);
-		if (ret)
-			return ret;
-		break;
-	case ARCH_GET_FS:
-	case ARCH_GET_GS:
-		/*
-		 * With these two, we read to a local pointer and
-		 * put_user it to the userspace pointer that we were
-		 * given.  If addr isn't valid (because it hasn't been
-		 * faulted in or is just bogus), we want put_user to
-		 * fault it in (or return -EFAULT) instead of having
-		 * the host return -EFAULT.
-		 */
-		ptr = &tmp;
-	}
-
-	ret = os_arch_prctl(pid, option, ptr);
-	if (ret)
-		return ret;
+	long ret = -EINVAL;
 
 	switch (option) {
 	case ARCH_SET_FS:
-		current->thread.arch.fs = (unsigned long) ptr;
-		ret = save_registers(pid, &current->thread.regs.regs);
+		current->thread.regs.regs.gp[FS_BASE / sizeof(unsigned long)] =
+			(unsigned long) arg2;
+		ret = 0;
 		break;
 	case ARCH_SET_GS:
-		ret = save_registers(pid, &current->thread.regs.regs);
+		current->thread.regs.regs.gp[GS_BASE / sizeof(unsigned long)] =
+			(unsigned long) arg2;
+		ret = 0;
 		break;
 	case ARCH_GET_FS:
-		ret = put_user(tmp, arg2);
+		ret = put_user(current->thread.regs.regs.gp[FS_BASE / sizeof(unsigned long)], arg2);
 		break;
 	case ARCH_GET_GS:
-		ret = put_user(tmp, arg2);
+		ret = put_user(current->thread.regs.regs.gp[GS_BASE / sizeof(unsigned long)], arg2);
 		break;
 	}
 
@@ -83,10 +47,10 @@  SYSCALL_DEFINE2(arch_prctl, int, option, unsigned long, arg2)
 
 void arch_switch_to(struct task_struct *to)
 {
-	if ((to->thread.arch.fs == 0) || (to->mm == NULL))
-		return;
-
-	arch_prctl(to, ARCH_SET_FS, (void __user *) to->thread.arch.fs);
+	/*
+	 * Nothing needs to be done on x86_64.
+	 * The FS_BASE/GS_BASE registers are saved in the ptrace register set.
+	 */
 }
 
 SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
diff --git a/arch/x86/um/tls_64.c b/arch/x86/um/tls_64.c
index ebd3855d9b13..c51a613f6f5c 100644
--- a/arch/x86/um/tls_64.c
+++ b/arch/x86/um/tls_64.c
@@ -12,7 +12,7 @@  int arch_set_tls(struct task_struct *t, unsigned long tls)
 	 * If CLONE_SETTLS is set, we need to save the thread id
 	 * so it can be set during context switches.
 	 */
-	t->thread.arch.fs = tls;
+	t->thread.regs.regs.gp[FS_BASE / sizeof(unsigned long)] = tls;
 
 	return 0;
 }
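
As a reading aid, the control flow of the patched arch_prctl() can be modelled in user space. This is a minimal sketch under stated assumptions: `struct sk_task`, `sk_arch_prctl()` and the `SK_*` names are hypothetical, the option values are the ones x86_64 uses for arch_prctl(2), slots 21 and 22 are taken from the elf.h hunk, and a plain pointer store stands in for put_user(). The sync of the saved registers to the host process via ptrace is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Option values used by x86_64 arch_prctl(2), redefined locally. */
#define SK_ARCH_SET_GS 0x1001
#define SK_ARCH_SET_FS 0x1002
#define SK_ARCH_GET_FS 0x1003
#define SK_ARCH_GET_GS 0x1004

/* Slots of fs_base/gs_base in the register array (see the elf.h hunk). */
enum { SK_FS_BASE_IDX = 21, SK_GS_BASE_IDX = 22, SK_NUM_GP = 27 };

/* Hypothetical stand-in for the saved register set of one task. */
struct sk_task {
	unsigned long gp[SK_NUM_GP];
};

/*
 * Model of the patched logic: SET stores into the saved registers,
 * GET copies the saved value out to the address passed in arg2.
 */
static long sk_arch_prctl(struct sk_task *task, int option, unsigned long arg2)
{
	unsigned long *out = (unsigned long *)arg2;

	assert(task != NULL);

	switch (option) {
	case SK_ARCH_SET_FS:
		task->gp[SK_FS_BASE_IDX] = arg2;
		return 0;
	case SK_ARCH_SET_GS:
		task->gp[SK_GS_BASE_IDX] = arg2;
		return 0;
	case SK_ARCH_GET_FS:
		if (out == NULL)
			return -EFAULT; /* stand-in for a failing put_user() */
		*out = task->gp[SK_FS_BASE_IDX];
		return 0;
	case SK_ARCH_GET_GS:
		if (out == NULL)
			return -EFAULT;
		*out = task->gp[SK_GS_BASE_IDX];
		return 0;
	}
	return -EINVAL; /* unknown option, as in the patch */
}
```

In this model a SET followed by a GET simply round-trips through the register array, which is why arch_switch_to() no longer has anything to do once the host restores the full register set.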