diff mbox

[net-next] bpf, seccomp: prepare for upcoming criu support

Message ID 65c43ff26bd4f37c268d2aad7b8d368edfd42200.1443789826.git.daniel@iogearbox.net
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Daniel Borkmann Oct. 2, 2015, 1:17 p.m. UTC
The current ongoing effort to dump existing cBPF seccomp filters back
to user space requires to hold the pre-transformed instructions like
we do in case of socket filters from sk_attach_filter() side, so they
can be reloaded in original form at a later point in time by utilities
such as criu.

To prepare for this, simply extend the bpf_prog_create_from_user()
API to hold a flag that tells whether we should store the original
or not. Also, fanout filters could make use of that in future for
things like diag. While fanout filters already use bpf_prog_destroy(),
move seccomp over to them as well to handle original programs when
present.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
---
 This is in realtion to Tycho's latest patch set under [1]. The BPF
 handling is unfortunately not correct (triggering a crash on kernels
 that can set pages as ro).

 This patch here provides a minimal, simple interface from BPF API side
 as a possible step forward, so that the focus can then be on seccomp
 side wrt criu. F.e., dumping could happen similarly as in Pavel's
 sk_get_filter().

 I have tested/based this against net-next, but have no issues whether
 Kees wants to take it, or whether it should go through both trees to
 reduce merge issues as once the case with 0fc174dea545 ("ebpf: make
 internal bpf API independent of CONFIG_BPF_SYSCALL ifdefs").

 Thanks!

 [1] http://thread.gmane.org/gmane.linux.network/380300

 include/linux/filter.h |  2 +-
 kernel/seccomp.c       |  4 ++--
 net/core/filter.c      | 16 +++++++++++-----
 net/packet/af_packet.c |  2 +-
 4 files changed, 15 insertions(+), 9 deletions(-)

Comments

Tycho Andersen Oct. 2, 2015, 3:06 p.m. UTC | #1
Hi Daniel,

On Fri, Oct 02, 2015 at 03:17:33PM +0200, Daniel Borkmann wrote:
> The current ongoing effort to dump existing cBPF seccomp filters back
> to user space requires to hold the pre-transformed instructions like
> we do in case of socket filters from sk_attach_filter() side, so they
> can be reloaded in original form at a later point in time by utilities
> such as criu.
> 
> To prepare for this, simply extend the bpf_prog_create_from_user()
> API to hold a flag that tells whether we should store the original
> or not. Also, fanout filters could make use of that in future for
> things like diag. While fanout filters already use bpf_prog_destroy(),
> move seccomp over to them as well to handle original programs when
> present.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Tycho Andersen <tycho.andersen@canonical.com>

Thanks for this patch. When I rebase my tree on top of it it works
fine,

Tested-by: Tycho Andersen <tycho.andersen@canonical.com>

> Cc: Pavel Emelyanov <xemul@parallels.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Alexei Starovoitov <ast@plumgrid.com>
> ---
>  This is in realtion to Tycho's latest patch set under [1]. The BPF
>  handling is unfortunately not correct (triggering a crash on kernels
>  that can set pages as ro).
> 
>  This patch here provides a minimal, simple interface from BPF API side
>  as a possible step forward, so that the focus can then be on seccomp
>  side wrt criu. F.e., dumping could happen similarly as in Pavel's
>  sk_get_filter().
> 
>  I have tested/based this against net-next, but have no issues whether
>  Kees wants to take it, or whether it should go through both trees to
>  reduce merge issues as once the case with 0fc174dea545 ("ebpf: make
>  internal bpf API independent of CONFIG_BPF_SYSCALL ifdefs").

I'll send out a revised version of my set with Andy's comments later
today and not include this patch. Let me know if I should do something
differently.

Thanks,

Tycho
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov Oct. 2, 2015, 3:09 p.m. UTC | #2
On 10/2/15 6:17 AM, Daniel Borkmann wrote:
> The current ongoing effort to dump existing cBPF seccomp filters back
> to user space requires to hold the pre-transformed instructions like
> we do in case of socket filters from sk_attach_filter() side, so they
> can be reloaded in original form at a later point in time by utilities
> such as criu.
>
> To prepare for this, simply extend the bpf_prog_create_from_user()
> API to hold a flag that tells whether we should store the original
> or not. Also, fanout filters could make use of that in future for
> things like diag. While fanout filters already use bpf_prog_destroy(),
> move seccomp over to them as well to handle original programs when
> present.
>
> Signed-off-by: Daniel Borkmann<daniel@iogearbox.net>

I agree that adding flag to bpf_prog_create_from_user() is cleaner
than exposing static bpf_prog_store_orig_filter(), so
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann Oct. 2, 2015, 3:17 p.m. UTC | #3
On 10/02/2015 05:06 PM, Tycho Andersen wrote:
...
>> Cc: Pavel Emelyanov <xemul@parallels.com>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>> ---
>>   This is in realtion to Tycho's latest patch set under [1]. The BPF
>>   handling is unfortunately not correct (triggering a crash on kernels
>>   that can set pages as ro).
>>
>>   This patch here provides a minimal, simple interface from BPF API side
>>   as a possible step forward, so that the focus can then be on seccomp
>>   side wrt criu. F.e., dumping could happen similarly as in Pavel's
>>   sk_get_filter().
>>
>>   I have tested/based this against net-next, but have no issues whether
>>   Kees wants to take it, or whether it should go through both trees to
>>   reduce merge issues as once the case with 0fc174dea545 ("ebpf: make
>>   internal bpf API independent of CONFIG_BPF_SYSCALL ifdefs").
>
> I'll send out a revised version of my set with Andy's comments later
> today and not include this patch. Let me know if I should do something
> differently.

Should be okay.

Please make sure to describe in your cover letter that your series builds
on top of ...

   http://patchwork.ozlabs.org/patch/525492/

... so that whoever takes the set eventually is aware of this.

Thanks again,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann Oct. 2, 2015, 3:21 p.m. UTC | #4
On 10/02/2015 05:09 PM, Alexei Starovoitov wrote:
...
> I agree that adding flag to bpf_prog_create_from_user() is cleaner
> than exposing static bpf_prog_store_orig_filter(), so

There's also another reason as mentioned, i.e. that the progs
are ro-locked, so doing bpf_prog_store_orig_filter() after getting
the prog back inside seccomp won't work as-is.

> Acked-by: Alexei Starovoitov <ast@plumgrid.com>

Thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 5, 2015, 1:53 p.m. UTC | #5
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Fri,  2 Oct 2015 15:17:33 +0200

> The current ongoing effort to dump existing cBPF seccomp filters back
> to user space requires to hold the pre-transformed instructions like
> we do in case of socket filters from sk_attach_filter() side, so they
> can be reloaded in original form at a later point in time by utilities
> such as criu.
> 
> To prepare for this, simply extend the bpf_prog_create_from_user()
> API to hold a flag that tells whether we should store the original
> or not. Also, fanout filters could make use of that in future for
> things like diag. While fanout filters already use bpf_prog_destroy(),
> move seccomp over to them as well to handle original programs when
> present.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Applied, thanks Daniel.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/filter.h b/include/linux/filter.h
index fa2cab9..1f0f7bf 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -408,7 +408,7 @@  typedef int (*bpf_aux_classic_check_t)(struct sock_filter *filter,
 
 int bpf_prog_create(struct bpf_prog **pfp, struct sock_fprog_kern *fprog);
 int bpf_prog_create_from_user(struct bpf_prog **pfp, struct sock_fprog *fprog,
-			      bpf_aux_classic_check_t trans);
+			      bpf_aux_classic_check_t trans, bool save_orig);
 void bpf_prog_destroy(struct bpf_prog *fp);
 
 int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk);
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 5bd4779..06858a7 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -370,7 +370,7 @@  static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 		return ERR_PTR(-ENOMEM);
 
 	ret = bpf_prog_create_from_user(&sfilter->prog, fprog,
-					seccomp_check_filter);
+					seccomp_check_filter, false);
 	if (ret < 0) {
 		kfree(sfilter);
 		return ERR_PTR(ret);
@@ -469,7 +469,7 @@  void get_seccomp_filter(struct task_struct *tsk)
 static inline void seccomp_filter_free(struct seccomp_filter *filter)
 {
 	if (filter) {
-		bpf_prog_free(filter->prog);
+		bpf_prog_destroy(filter->prog);
 		kfree(filter);
 	}
 }
diff --git a/net/core/filter.c b/net/core/filter.c
index 60e3fe7..9cf09ed 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1083,16 +1083,18 @@  EXPORT_SYMBOL_GPL(bpf_prog_create);
  *	@pfp: the unattached filter that is created
  *	@fprog: the filter program
  *	@trans: post-classic verifier transformation handler
+ *	@save_orig: save classic BPF program
  *
  * This function effectively does the same as bpf_prog_create(), only
  * that it builds up its insns buffer from user space provided buffer.
  * It also allows for passing a bpf_aux_classic_check_t handler.
  */
 int bpf_prog_create_from_user(struct bpf_prog **pfp, struct sock_fprog *fprog,
-			      bpf_aux_classic_check_t trans)
+			      bpf_aux_classic_check_t trans, bool save_orig)
 {
 	unsigned int fsize = bpf_classic_proglen(fprog);
 	struct bpf_prog *fp;
+	int err;
 
 	/* Make sure new filter is there and in the right amounts. */
 	if (fprog->filter == NULL)
@@ -1108,12 +1110,16 @@  int bpf_prog_create_from_user(struct bpf_prog **pfp, struct sock_fprog *fprog,
 	}
 
 	fp->len = fprog->len;
-	/* Since unattached filters are not copied back to user
-	 * space through sk_get_filter(), we do not need to hold
-	 * a copy here, and can spare us the work.
-	 */
 	fp->orig_prog = NULL;
 
+	if (save_orig) {
+		err = bpf_prog_store_orig_filter(fp, fprog);
+		if (err) {
+			__bpf_prog_free(fp);
+			return -ENOMEM;
+		}
+	}
+
 	/* bpf_prepare_filter() already takes care of freeing
 	 * memory in case something goes wrong.
 	 */
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index aa4b15c..81c900f 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1567,7 +1567,7 @@  static int fanout_set_data_cbpf(struct packet_sock *po, char __user *data,
 	if (copy_from_user(&fprog, data, len))
 		return -EFAULT;
 
-	ret = bpf_prog_create_from_user(&new, &fprog, NULL);
+	ret = bpf_prog_create_from_user(&new, &fprog, NULL, false);
 	if (ret)
 		return ret;