Patchwork replace incorrect memcpy_user_stub code

login
register
mail settings
Submitter Chris Torek
Date Dec. 19, 2008, 2:08 a.m.
Message ID <200812190208.mBJ28vn13713@elf.torek.net>
Download mbox | patch
Permalink /patch/14775/
State Changes Requested
Delegated to: David Miller
Headers show

Comments

Chris Torek - Dec. 19, 2008, 2:08 a.m.
I'll send this formatted against the old kernel, not sparc-next
(did get the latter all cloned up, but have not yet started working
with it).

This came up in testing kgdb, using the built-in tests -- turn
on CONFIG_KGDB_TESTS, then

    echo V1 > /sys/module/kgdbts/parameters/kgdbts

-- but it would affect using kgdb if you were debugging and looking
at bad pointers.

You will probably want to tweak it anyway (especially the comment
in the .S file: I thought use of KERNEL_DS was more rare than it
is, though it probably *should* be more rare than it is...).

Something else I noticed in passing: the EX and EX_LD/EX_ST macros
scattered throughout the various .S files make a fair bit of .fixup
code, all of which does the same thing.  At the cost of one symbol
in copy_in_user.S, you could just have one common two-instruction
retl-and-mov-1 fixup that they all share.

Note: I still get occasional

    CPU[2]: Returns from cpu_idle!

(the cpu number changes) doing the internal kgdb tests, so something
else is still not quite right.

Chris

From 59ec5fee8a6db009086e186468961586aec1c97c Mon Sep 17 00:00:00 2001
From: Chris Torek <chris.torek@windriver.com>
Date: Thu, 18 Dec 2008 18:07:45 -0700
Subject: [PATCH 1/1] replace incorrect memcpy_user_stub code

The memcpy_user_stub function is used when a copy to/from user call
is made with the "user" space revectored to point to kernel space.
This makes it almost the same as memcpy, but different in two ways,
including the return value: copy from/to user returns zero on success,
but nonzero on failure, including failures due to bad addresses.
The kernel probe-read / probe-write routines do this to access
addresses supplied by kgdb, so that bad pointers can be caught.
The existing stub simply called memcpy, then returned 0; bad pointers
resulted in a kernel panic.  We now use the same exception catching
mechanism as for regular user-space copies.

It might seem ideal to make a duplicate of memcpy but with exception
catching added, but there are numerous optimized memcpy variants, with
boot-time patching to use the best one; we would have to duplicate
all of them, and add more boot-time patching.  This routine is used
rarely, and mostly for debugging and compatibility code, so I just
put in a small, simple, and obviously correct version.

Signed-off-by: Chris Torek <chris.torek@windriver.com>
---
 arch/sparc64/lib/copy_in_user.S |   23 ++++++++++++++++-------
 1 files changed, 16 insertions(+), 7 deletions(-)
David Miller - Dec. 19, 2008, 2:59 a.m.
From: Chris Torek <chris.torek@windriver.com>
Date: Thu, 18 Dec 2008 19:08:57 -0700

> I'll send this formatted against the old kernel, not sparc-next
> (did get the latter all cloned up, but have not yet started working
> with it).
> 
> This came up in testing kgdb, using the built-in tests -- turn
> on CONFIG_KGDB_TESTS, then
> 
>     echo V1 > /sys/module/kgdbts/parameters/kgdbts
> 
> -- but it would affect using kgdb if you were debugging and looking
> at bad pointers.
> 
> You will probably want to tweak it anyway (especially the comment
> in the .S file: I thought use of KERNEL_DS was more rare than it
> is, though it probably *should* be more rare than it is...).

This is going to perform really bad, and it's common for
syscalls invoked by the kernel to trigger this case.

I'll have to think about this, thanks for pointing out the
issue.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Dec. 19, 2008, 3:15 a.m.
From: Chris Torek <chris.torek@windriver.com>
Date: Thu, 18 Dec 2008 20:11:26 -0700

> >This is going to perform really bad ...
> 
> Yes, I went for space on the theory that it was not common...
> 
> >and it's common for
> >syscalls invoked by the kernel to trigger this case.
> 
> ... but so much for that theory. :-)

At least it used to be the case, it may not matter today and
I'll do some investigation.  If it doesn't matter, your patch
is fine and I'll apply it directly.

> >I'll have to think about this, thanks for pointing out the
> >issue.
> 
> The other obvious thing is to have a bunch of userlike memcpy
> variants, one for each *memcpy.S file.  The macros you have
> should make this straightforward.  I was mostly worried about
> breaking stuff.

I'm trending away from that because it's unnecessary bloat.

I wish memcpy() returned void, that would make so many things like
this easy to handle.  I could just add the annotations to the real
memcpy, always return the same return values as the user copies do,
and the memcpy() callers would just ignore the return value.

But things are what they are :-)
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Dec. 26, 2008, 10:57 p.m.
From: David Miller <davem@davemloft.net>
Date: Thu, 18 Dec 2008 19:15:16 -0800 (PST)

> From: Chris Torek <chris.torek@windriver.com>
> Date: Thu, 18 Dec 2008 20:11:26 -0700
> 
> > >This is going to perform really bad ...
> > 
> > Yes, I went for space on the theory that it was not common...
> > 
> > >and it's common for
> > >syscalls invoked by the kernel to trigger this case.
> > 
> > ... but so much for that theory. :-)
> 
> At least it used to be the case, it may not matter today and
> I'll do some investigation.  If it doesn't matter, your patch
> is fine and I'll apply it directly.

Ok, I finally had a chance to look things over and it turns out
the performance of this case does in fact matter.

A few examples that might copy lots of big chunks are:

1) ELF core dumping

2) NFSD file read and write

3) kernel_sendmsg() and kernel_recvmsg() which are used by things
   like NFS client sunrpc, and other network filesystem implementations

(basically, grep the tree for "set_fs(KERNEL_DS)")

So I started thinking about alternative ways to fix this.

There might be some way we can annotate the regular memcpy()
implementations with exception handling markers, like we do
for the user copy cases, such that they only trigger the
alternative return value handling when a special thread local
flag has been set.

Or something like that.

Actually, we could do something really simple since we have
control over the encoding of the exception table entries _and_
we have a simple way to determine if we are doing a potentially
exception causing kernel memcpy().

First, if the %o7 or %i7 are inside of the function memcpy_user_stub,
then we are doing one of these in-kernel copies that are allowed to
fault.  This elides the need for a special thread flag or anything
like that.

Second, we add the annotations to the regular memcpy() routines,
but we set the "fixup" field of the exception table entries to
have the low bit set.  This works because all instructions are
4 byte aligned on sparc.

Then the code in do_kernel_fault() can, if it sees a "fixup" value
with the low bit set, check %o7/%i7 for being inside range of
memcpy_user_stub.  If so, we jump to the fixup.  Otherwise we
behave as if we did not find a matching exception table entry.

I'll see if I can whip up an implementation of this.  Thanks for
your patience Chris.


--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Feb. 9, 2009, 6:46 a.m.
From: Chris Torek <chris.torek@windriver.com>
Date: Mon, 29 Dec 2008 14:55:40 -0700

> >There might be some way we can annotate the regular memcpy()
> >implementations with exception handling markers, like we do
> >for the user copy cases, such that they only trigger the
> >alternative return value handling when a special thread local
> >flag has been set.
> 
> I thought about this, but it's ugly and creates (at least potential)
> problems with recursion / interrupts.  (Not sure if these are possible
> in the Linux kernel organization, just "icky on general principles".)

Yes, it's not very clean.

> >First, if the %o7 or %i7 are inside of the function memcpy_user_stub,
> >then we are doing one of these in-kernel copies that are allowed to
> >fault.  This elides the need for a special thread flag or anything
> >like that.
> 
> This, on the other hand, seems quite promising: as with the exception
> table itself, we use the program counter (of the return address,
> in this case) as the flag.
> 
> >Then the code in do_kernel_fault() can, if it sees a "fixup" value
> >with the low bit set, check %o7/%i7 for being inside range of
> >memcpy_user_stub.  If so, we jump to the fixup.  Otherwise we
> >behave as if we did not find a matching exception table entry.
> 
> This should work really well.

In the end I decided that this other approach is very ad-hoc
as well.

I ended up settling on something similar to your approach,
and I also implemented your .fixup space optimizations.

Patch series posting forthcoming...
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/arch/sparc64/lib/copy_in_user.S b/arch/sparc64/lib/copy_in_user.S
index 650af3f..ba91c3e 100644
--- a/arch/sparc64/lib/copy_in_user.S
+++ b/arch/sparc64/lib/copy_in_user.S
@@ -105,15 +105,24 @@  ___copy_in_user:	/* %o0=dst, %o1=src, %o2=len */
 	/* Act like copy_{to,in}_user(), ie. return zero instead
 	 * of original destination pointer.  This is invoked when
 	 * copy_{to,in}_user() finds that %asi is kernel space.
+	 *
+	 * We can't quite use memcpy directly since we are supposed
+	 * to return nonzero on faults.  Could optimize these, but
+	 * this is used extremely rarely (probe_kernel_read/write).
 	 */
 	.globl		memcpy_user_stub
 	.type		memcpy_user_stub,#function
 memcpy_user_stub:
-	save		%sp, -192, %sp
-	mov		%i0, %o0
-	mov		%i1, %o1
-	call		memcpy
-	 mov		%i2, %o2
-	ret
-	 restore	%g0, %g0, %o0
+	brz,pn		%o2, 2f
+	 clr		%o3
+1:
+	EX(ldub [%o1 + %o3], %g1)
+	EX(stb %g1, [%o0 + %o3])
+	deccc		%o2
+	bgu,pt		%XCC, 1b
+	 inc		%o3
+2:
+	retl
+	 clr		%o0
+
 	.size		memcpy_user_stub, .-memcpy_user_stub