diff mbox

[net-next] bpf: add support for %s specifier to bpf_trace_printk()

Message ID 1440656819-25622-1-git-send-email-ast@plumgrid.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Alexei Starovoitov Aug. 27, 2015, 6:26 a.m. UTC
%s specifier makes bpf program and kernel debugging easier.
To make sure that trace_printk won't crash the unsafe string
is copied into stack and unsafe pointer is substituted.
String is also sanitized for printable characters.
The cost of swapping FS in probe_kernel_read is
amortized over 4 chars to improve performance.

Suggested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
The following C program:
 #include <linux/fs.h>
int foo(struct pt_regs *ctx, struct filename *filename)
{
  void *name = 0;

  bpf_probe_read(&name, sizeof(name), &filename->name);
  bpf_trace_printk("executed %s\\n", name);
  return 0;
}

when attached to kprobe do_execve()
will produce output in /sys/kernel/debug/tracing/trace_pipe :
    make-13492 [002] d..1  3250.997277: : executed /bin/sh
      sh-13493 [004] d..1  3250.998716: : executed /usr/bin/gcc
     gcc-13494 [002] d..1  3250.999822: : executed /usr/lib/gcc/x86_64-linux-gnu/4.7/cc1
     gcc-13495 [002] d..1  3251.006731: : executed /usr/bin/as
     gcc-13496 [002] d..1  3251.011831: : executed /usr/lib/gcc/x86_64-linux-gnu/4.7/collect2
collect2-13497 [000] d..1  3251.012941: : executed /usr/bin/ld

 kernel/trace/bpf_trace.c |   60 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 57 insertions(+), 3 deletions(-)

Comments

David Miller Aug. 27, 2015, 10:34 p.m. UTC | #1
From: Alexei Starovoitov <ast@plumgrid.com>
Date: Wed, 26 Aug 2015 23:26:59 -0700

> +/* similar to strncpy_from_user() but with extra checks */
> +static void probe_read_string(char *buf, int size, long unsafe_ptr)
> +{
> +	char dst[4];
> +	int i = 0;
> +
> +	size--;
> +	for (;;) {
> +		if (probe_kernel_read(dst, (void *) unsafe_ptr, 4))
> +			break;

I don't think this does the right thing when the string is not a multiple
of 3 and ends at the last byte of a page that ends a valid region of
kernel memory.

Seeing this kind of error makes me skeptical to the overall value of
optimizing this :-/
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov Aug. 27, 2015, 11:06 p.m. UTC | #2
On 8/27/15 3:34 PM, David Miller wrote:
> From: Alexei Starovoitov <ast@plumgrid.com>
> Date: Wed, 26 Aug 2015 23:26:59 -0700
>
>> +/* similar to strncpy_from_user() but with extra checks */
>> +static void probe_read_string(char *buf, int size, long unsafe_ptr)
>> +{
>> +	char dst[4];
>> +	int i = 0;
>> +
>> +	size--;
>> +	for (;;) {
>> +		if (probe_kernel_read(dst, (void *) unsafe_ptr, 4))
>> +			break;
>
> I don't think this does the right thing when the string is not a multiple
> of 3 and ends at the last byte of a page that ends a valid region of
> kernel memory.
>
> Seeing this kind of error makes me skeptical to the overall value of
> optimizing this :-/

I've considered the case when first two bytes are valid, but the
other two are in a different page. In such case the probe_read_string()
will trim the string and won't be printing these two valid bytes.
I think that's very rare, so I'm picking higher performance for common
case. The strings over > 64 bytes also will be trimmed to 64.
It's a debugging facility, so I felt that's ok.
Fair or you still think it should be per byte copy?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Aug. 27, 2015, 11:20 p.m. UTC | #3
From: Alexei Starovoitov <ast@plumgrid.com>
Date: Thu, 27 Aug 2015 16:06:14 -0700

> Fair or you still think it should be per byte copy?

I'm terribly surprised we don't have an equivalent of strncpy()
for unsafe kernel pointers.

You probably won't be the last person to want this, and it's silly
to optimize it in one place and then wait for cut&paste into the
next guy.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt Aug. 27, 2015, 11:43 p.m. UTC | #4
On Thu, 27 Aug 2015 16:20:39 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Alexei Starovoitov <ast@plumgrid.com>
> Date: Thu, 27 Aug 2015 16:06:14 -0700
> 
> > Fair or you still think it should be per byte copy?
> 
> I'm terribly surprised we don't have an equivalent of strncpy()
> for unsafe kernel pointers.
> 
> You probably won't be the last person to want this, and it's silly
> to optimize it in one place and then wait for cut&paste into the
> next guy.

If it doesn't exist. Perhaps its time to create it.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov Aug. 28, 2015, 1:58 a.m. UTC | #5
On 8/27/15 4:43 PM, Steven Rostedt wrote:
> On Thu, 27 Aug 2015 16:20:39 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
>
>> From: Alexei Starovoitov <ast@plumgrid.com>
>> Date: Thu, 27 Aug 2015 16:06:14 -0700
>>
>>> Fair or you still think it should be per byte copy?
>>
>> I'm terribly surprised we don't have an equivalent of strncpy()
>> for unsafe kernel pointers.
>>
>> You probably won't be the last person to want this, and it's silly
>> to optimize it in one place and then wait for cut&paste into the
>> next guy.
>
> If it doesn't exist. Perhaps its time to create it.

all makes sense. Working on generalizing FETCH_FUNC_NAME
from trace_kprobe.c. Seems to fit quite well.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ef9936df1b04..60d8f95258ed 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -79,13 +79,51 @@  static const struct bpf_func_proto bpf_probe_read_proto = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+static bool is_valid_char(char c)
+{
+	return (isprint(c) || isspace(c)) && isascii(c);
+}
+
+/* similar to strncpy_from_user() but with extra checks */
+static void probe_read_string(char *buf, int size, long unsafe_ptr)
+{
+	char dst[4];
+	int i = 0;
+
+	size--;
+	for (;;) {
+		if (probe_kernel_read(dst, (void *) unsafe_ptr, 4))
+			break;
+
+		unsafe_ptr += 4;
+
+		if (dst[0] == 0 || !is_valid_char(dst[0]) || i >= size)
+			break;
+		buf[i++] = dst[0];
+
+		if (dst[1] == 0 || !is_valid_char(dst[1]) || i >= size)
+			break;
+		buf[i++] = dst[1];
+
+		if (dst[2] == 0 || !is_valid_char(dst[2]) || i >= size)
+			break;
+		buf[i++] = dst[2];
+
+		if (dst[3] == 0 || !is_valid_char(dst[3]) || i >= size)
+			break;
+		buf[i++] = dst[3];
+	}
+	buf[i] = 0;
+}
+
 /*
  * limited trace_printk()
- * only %d %u %x %ld %lu %lx %lld %llu %llx %p conversion specifiers allowed
+ * only %d %u %x %ld %lu %lx %lld %llu %llx %p %s conversion specifiers allowed
  */
 static u64 bpf_trace_printk(u64 r1, u64 fmt_size, u64 r3, u64 r4, u64 r5)
 {
 	char *fmt = (char *) (long) r1;
+	char buf[64];
 	int mod[3] = {};
 	int fmt_cnt = 0;
 	int i;
@@ -100,7 +138,7 @@  static u64 bpf_trace_printk(u64 r1, u64 fmt_size, u64 r3, u64 r4, u64 r5)
 
 	/* check format string for allowed specifiers */
 	for (i = 0; i < fmt_size; i++) {
-		if ((!isprint(fmt[i]) && !isspace(fmt[i])) || !isascii(fmt[i]))
+		if (!is_valid_char(fmt[i]))
 			return -EINVAL;
 
 		if (fmt[i] != '%')
@@ -114,12 +152,28 @@  static u64 bpf_trace_printk(u64 r1, u64 fmt_size, u64 r3, u64 r4, u64 r5)
 		if (fmt[i] == 'l') {
 			mod[fmt_cnt]++;
 			i++;
-		} else if (fmt[i] == 'p') {
+		} else if (fmt[i] == 'p' || fmt[i] == 's') {
 			mod[fmt_cnt]++;
 			i++;
 			if (!isspace(fmt[i]) && !ispunct(fmt[i]) && fmt[i] != 0)
 				return -EINVAL;
 			fmt_cnt++;
+			if (fmt[i - 1] == 's') {
+				switch (fmt_cnt) {
+				case 1:
+					probe_read_string(buf, sizeof(buf), r3);
+					r3 = (long) buf;
+					break;
+				case 2:
+					probe_read_string(buf, sizeof(buf), r4);
+					r4 = (long) buf;
+					break;
+				case 3:
+					probe_read_string(buf, sizeof(buf), r5);
+					r5 = (long) buf;
+					break;
+				}
+			}
 			continue;
 		}