
[5/5] bpf: Allow to resolve bpf trampoline in unwind

Message ID 20191229143740.29143-6-jolsa@kernel.org
State RFC
Delegated to: BPF Maintainers
Series bpf: Add trampoline helpers

Commit Message

Jiri Olsa Dec. 29, 2019, 2:37 p.m. UTC
When unwinding the stack we need to identify each
address in order to successfully continue. Add a latch
tree to keep the trampolines for quick lookup during
the unwind.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 include/linux/bpf.h     |  6 ++++++
 kernel/bpf/core.c       |  2 ++
 kernel/bpf/trampoline.c | 35 +++++++++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+)
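
For context: the kernel's unwinders decide whether a return address is valid
kernel text via kernel_text_address(), which already falls back to
is_bpf_text_address() for JITed BPF programs; the core.c hunk at the bottom of
this patch extends that helper to also recognize trampoline pages. A rough
sketch of that decision, where addr_is_unwindable() is a made-up name used
purely for illustration:

#include <linux/kernel.h>	/* core_kernel_text() */
#include <linux/module.h>	/* is_module_text_address() */
#include <linux/filter.h>	/* is_bpf_text_address() */

/* Illustrative only -- the real gate is kernel_text_address() in
 * kernel/extable.c, called from the stack unwinders.
 */
static bool addr_is_unwindable(unsigned long addr)
{
	return core_kernel_text(addr) ||
	       is_module_text_address(addr) ||
	       is_bpf_text_address(addr);	/* with this patch: also BPF trampolines */
}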

Comments

Alexei Starovoitov Jan. 6, 2020, 11:46 p.m. UTC | #1
On Sun, Dec 29, 2019 at 03:37:40PM +0100, Jiri Olsa wrote:
> When unwinding the stack we need to identify each
> address to successfully continue. Adding latch tree
> to keep trampolines for quick lookup during the
> unwind.
> 
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
...
> +bool is_bpf_trampoline(void *addr)
> +{
> +	return latch_tree_find(addr, &tree, &tree_ops) != NULL;
> +}
> +
>  struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>  {
>  	struct bpf_trampoline *tr;
> @@ -65,6 +98,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>  	for (i = 0; i < BPF_TRAMP_MAX; i++)
>  		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
>  	tr->image = image;
> +	latch_tree_insert(&tr->tnode, &tree, &tree_ops);

Thanks for the fix. I was thinking to apply it, but then realized that bpf
dispatcher logic has the same issue.
Could you generalize the fix for both?
May be bpf_jit_alloc_exec_page() can do latch_tree_insert() ?
and new version of bpf_jit_free_exec() is needed that will do latch_tree_erase().
Wdyt?
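
Concretely, that suggestion might look something like the sketch below. It
assumes the latch-tree node lives in a small header at the start of the shared
page (roughly what the follow-up patch later in this thread does with struct
bpf_image); struct bpf_exec_page, image_tree, image_tree_ops and
bpf_jit_free_exec_page() are illustrative names, not existing kernel symbols.

/* Sketch: let the shared allocator track every trampoline/dispatcher
 * page, so a single lookup covers both users.
 */
struct bpf_exec_page {
	struct latch_tree_node tnode;
	u8 data[];
};

void *bpf_jit_alloc_exec_page(void)
{
	struct bpf_exec_page *p = bpf_jit_alloc_exec(PAGE_SIZE);

	if (!p)
		return NULL;
	set_vm_flush_reset_perms(p);
	set_memory_x((long)p, 1);
	latch_tree_insert(&p->tnode, &image_tree, &image_tree_ops);
	return p->data;
}

/* the "new version of bpf_jit_free_exec()" */
void bpf_jit_free_exec_page(void *addr)
{
	struct bpf_exec_page *p = container_of(addr, struct bpf_exec_page, data);

	latch_tree_erase(&p->tnode, &image_tree, &image_tree_ops);
	bpf_jit_free_exec(p);
}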
Daniel Borkmann Jan. 7, 2020, 8:30 a.m. UTC | #2
On 1/7/20 12:46 AM, Alexei Starovoitov wrote:
> On Sun, Dec 29, 2019 at 03:37:40PM +0100, Jiri Olsa wrote:
>> When unwinding the stack we need to identify each
>> address to successfully continue. Adding latch tree
>> to keep trampolines for quick lookup during the
>> unwind.
>>
>> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ...
>> +bool is_bpf_trampoline(void *addr)
>> +{
>> +	return latch_tree_find(addr, &tree, &tree_ops) != NULL;
>> +}
>> +
>>   struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>>   {
>>   	struct bpf_trampoline *tr;
>> @@ -65,6 +98,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>>   	for (i = 0; i < BPF_TRAMP_MAX; i++)
>>   		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
>>   	tr->image = image;
>> +	latch_tree_insert(&tr->tnode, &tree, &tree_ops);
> 
> Thanks for the fix. I was thinking to apply it, but then realized that bpf
> dispatcher logic has the same issue.
> Could you generalize the fix for both?
> May be bpf_jit_alloc_exec_page() can do latch_tree_insert() ?
> and new version of bpf_jit_free_exec() is needed that will do latch_tree_erase().
> Wdyt?

Also this patch is buggy since your latch lookup happens under RCU, but
I don't see anything that waits a grace period once you remove from the
tree. Instead you free the trampoline right away.

On a different question, given we have all the kallsym infrastructure
for BPF already in place, did you look into whether it's feasible to
make it a bit more generic to also cover JITed buffers from trampolines?
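
The teardown ordering being asked for here would be roughly the following
(a sketch using the names from this patch; the follow-up posted later in this
thread does exactly this in bpf_image_delete()):

	/* Unpublish first, then wait out any latch_tree_find() caller that
	 * may still be walking the tree under rcu_read_lock(), and only
	 * then give the page back.
	 */
	latch_tree_erase(&tr->tnode, &tree, &tree_ops);
	synchronize_rcu();
	bpf_jit_free_exec(tr->image);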
Jiri Olsa Jan. 7, 2020, 1:05 p.m. UTC | #3
On Mon, Jan 06, 2020 at 03:46:40PM -0800, Alexei Starovoitov wrote:
> On Sun, Dec 29, 2019 at 03:37:40PM +0100, Jiri Olsa wrote:
> > When unwinding the stack we need to identify each
> > address to successfully continue. Adding latch tree
> > to keep trampolines for quick lookup during the
> > unwind.
> > 
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ...
> > +bool is_bpf_trampoline(void *addr)
> > +{
> > +	return latch_tree_find(addr, &tree, &tree_ops) != NULL;
> > +}
> > +
> >  struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
> >  {
> >  	struct bpf_trampoline *tr;
> > @@ -65,6 +98,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
> >  	for (i = 0; i < BPF_TRAMP_MAX; i++)
> >  		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
> >  	tr->image = image;
> > +	latch_tree_insert(&tr->tnode, &tree, &tree_ops);
> 
> Thanks for the fix. I was thinking to apply it, but then realized that bpf
> dispatcher logic has the same issue.
> Could you generalize the fix for both?
> May be bpf_jit_alloc_exec_page() can do latch_tree_insert() ?
> and new version of bpf_jit_free_exec() is needed that will do latch_tree_erase().
> Wdyt?

I need to check the dispatcher code, but seems ok.. will check

jirka
Jiri Olsa Jan. 7, 2020, 1:15 p.m. UTC | #4
On Tue, Jan 07, 2020 at 09:30:12AM +0100, Daniel Borkmann wrote:
> On 1/7/20 12:46 AM, Alexei Starovoitov wrote:
> > On Sun, Dec 29, 2019 at 03:37:40PM +0100, Jiri Olsa wrote:
> > > When unwinding the stack we need to identify each
> > > address to successfully continue. Adding latch tree
> > > to keep trampolines for quick lookup during the
> > > unwind.
> > > 
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ...
> > > +bool is_bpf_trampoline(void *addr)
> > > +{
> > > +	return latch_tree_find(addr, &tree, &tree_ops) != NULL;
> > > +}
> > > +
> > >   struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
> > >   {
> > >   	struct bpf_trampoline *tr;
> > > @@ -65,6 +98,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
> > >   	for (i = 0; i < BPF_TRAMP_MAX; i++)
> > >   		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
> > >   	tr->image = image;
> > > +	latch_tree_insert(&tr->tnode, &tree, &tree_ops);
> > 
> > Thanks for the fix. I was thinking to apply it, but then realized that bpf
> > dispatcher logic has the same issue.
> > Could you generalize the fix for both?
> > May be bpf_jit_alloc_exec_page() can do latch_tree_insert() ?
> > and new version of bpf_jit_free_exec() is needed that will do latch_tree_erase().
> > Wdyt?
> 
> Also this patch is buggy since your latch lookup happens under RCU, but
> I don't see anything that waits a grace period once you remove from the
> tree. Instead you free the trampoline right away.

thanks, did not think of that.. will (try to) fix ;-)

> 
> On a different question, given we have all the kallsym infrastructure
> for BPF already in place, did you look into whether it's feasible to
> make it a bit more generic to also cover JITed buffers from trampolines?
> 

hum, it did not occur to me that we want to see it in kallsyms,
but sure.. how about: bpf_trampoline_<key> ?

the key would be taken from bpf_trampoline::key, which is the function's BTF id

jirka
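
For the record, the proposed symbol is just a fixed prefix plus the trampoline
key, i.e. the BTF id of the attached-to function; forming the name is a
one-liner along these lines (sketch only -- wiring the resulting symbol and
address range into the existing BPF kallsyms code is the part still to be
written):

	char sym[KSYM_NAME_LEN];

	snprintf(sym, sizeof(sym), "bpf_trampoline_%llu", tr->key);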
Björn Töpel Jan. 7, 2020, 1:30 p.m. UTC | #5
On 2020-01-07 14:05, Jiri Olsa wrote:
> On Mon, Jan 06, 2020 at 03:46:40PM -0800, Alexei Starovoitov wrote:
>> On Sun, Dec 29, 2019 at 03:37:40PM +0100, Jiri Olsa wrote:
>>> When unwinding the stack we need to identify each
>>> address to successfully continue. Adding latch tree
>>> to keep trampolines for quick lookup during the
>>> unwind.
>>>
>>> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
>> ...
>>> +bool is_bpf_trampoline(void *addr)
>>> +{
>>> +	return latch_tree_find(addr, &tree, &tree_ops) != NULL;
>>> +}
>>> +
>>>   struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>>>   {
>>>   	struct bpf_trampoline *tr;
>>> @@ -65,6 +98,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>>>   	for (i = 0; i < BPF_TRAMP_MAX; i++)
>>>   		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
>>>   	tr->image = image;
>>> +	latch_tree_insert(&tr->tnode, &tree, &tree_ops);
>>
>> Thanks for the fix. I was thinking to apply it, but then realized that bpf
>> dispatcher logic has the same issue.
>> Could you generalize the fix for both?
>> May be bpf_jit_alloc_exec_page() can do latch_tree_insert() ?
>> and new version of bpf_jit_free_exec() is needed that will do latch_tree_erase().
>> Wdyt?
> 
> I need to check the dispatcher code, but seems ok.. will check
>

Thanks Jiri! The trampoline and dispatcher share the image allocation, 
so putting it there would make sense.

It's annoying that the dispatcher doesn't show up correctly in perf, and 
it's been on my list to fix that. Hopefully you beat me to it! :-D


Björn
Daniel Borkmann Jan. 7, 2020, 7:30 p.m. UTC | #6
On 1/7/20 2:15 PM, Jiri Olsa wrote:
> On Tue, Jan 07, 2020 at 09:30:12AM +0100, Daniel Borkmann wrote:
>> On 1/7/20 12:46 AM, Alexei Starovoitov wrote:
>>> On Sun, Dec 29, 2019 at 03:37:40PM +0100, Jiri Olsa wrote:
>>>> When unwinding the stack we need to identify each
>>>> address to successfully continue. Adding latch tree
>>>> to keep trampolines for quick lookup during the
>>>> unwind.
>>>>
>>>> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
>>> ...
>>>> +bool is_bpf_trampoline(void *addr)
>>>> +{
>>>> +	return latch_tree_find(addr, &tree, &tree_ops) != NULL;
>>>> +}
>>>> +
>>>>    struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>>>>    {
>>>>    	struct bpf_trampoline *tr;
>>>> @@ -65,6 +98,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>>>>    	for (i = 0; i < BPF_TRAMP_MAX; i++)
>>>>    		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
>>>>    	tr->image = image;
>>>> +	latch_tree_insert(&tr->tnode, &tree, &tree_ops);
>>>
>>> Thanks for the fix. I was thinking to apply it, but then realized that bpf
>>> dispatcher logic has the same issue.
>>> Could you generalize the fix for both?
>>> May be bpf_jit_alloc_exec_page() can do latch_tree_insert() ?
>>> and new version of bpf_jit_free_exec() is needed that will do latch_tree_erase().
>>> Wdyt?
>>
>> Also this patch is buggy since your latch lookup happens under RCU, but
>> I don't see anything that waits a grace period once you remove from the
>> tree. Instead you free the trampoline right away.
> 
> thanks, did not think of that.. will (try to) fix ;-)
> 
>> On a different question, given we have all the kallsym infrastructure
>> for BPF already in place, did you look into whether it's feasible to
>> make it a bit more generic to also cover JITed buffers from trampolines?
> 
> hum, it did not occur to me that we want to see it in kallsyms,
> but sure.. how about: bpf_trampoline_<key> ?
> 
> key would be taken from bpf_trampoline::key as function's BTF id

Yeap, I think bpf_trampoline_<btf_id> would make sense here.

Thanks,
Daniel
Jiri Olsa Jan. 13, 2020, 9:43 a.m. UTC | #7
On Tue, Jan 07, 2020 at 02:30:35PM +0100, Björn Töpel wrote:
> On 2020-01-07 14:05, Jiri Olsa wrote:
> > On Mon, Jan 06, 2020 at 03:46:40PM -0800, Alexei Starovoitov wrote:
> > > On Sun, Dec 29, 2019 at 03:37:40PM +0100, Jiri Olsa wrote:
> > > > When unwinding the stack we need to identify each
> > > > address to successfully continue. Adding latch tree
> > > > to keep trampolines for quick lookup during the
> > > > unwind.
> > > > 
> > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > ...
> > > > +bool is_bpf_trampoline(void *addr)
> > > > +{
> > > > +	return latch_tree_find(addr, &tree, &tree_ops) != NULL;
> > > > +}
> > > > +
> > > >   struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
> > > >   {
> > > >   	struct bpf_trampoline *tr;
> > > > @@ -65,6 +98,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
> > > >   	for (i = 0; i < BPF_TRAMP_MAX; i++)
> > > >   		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
> > > >   	tr->image = image;
> > > > +	latch_tree_insert(&tr->tnode, &tree, &tree_ops);
> > > 
> > > Thanks for the fix. I was thinking to apply it, but then realized that bpf
> > > dispatcher logic has the same issue.
> > > Could you generalize the fix for both?
> > > May be bpf_jit_alloc_exec_page() can do latch_tree_insert() ?
> > > and new version of bpf_jit_free_exec() is needed that will do latch_tree_erase().
> > > Wdyt?
> > 
> > I need to check the dispatcher code, but seems ok.. will check
> > 
> 
> Thanks Jiri! The trampoline and dispatcher share the image allocation, so
> putting it there would make sense.
> 
> It's annoying that the dispatcher doesn't show up correctly in perf, and
> it's been on my list to fix that. Hopefully you beat me to it! :-D

hi,
attached patch seems to work for me (trampoline usecase), but I don't know
how to test it for dispatcher.. also I need to check if we need to decrease
BPF_TRAMP_MAX or BPF_DISPATCHER_MAX, it might take more time ;-)

jirka


---
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index aed2bc39d72b..e0ca8780dc7a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -510,7 +510,6 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key);
 int bpf_trampoline_link_prog(struct bpf_prog *prog);
 int bpf_trampoline_unlink_prog(struct bpf_prog *prog);
 void bpf_trampoline_put(struct bpf_trampoline *tr);
-void *bpf_jit_alloc_exec_page(void);
 #define BPF_DISPATCHER_INIT(name) {			\
 	.mutex = __MUTEX_INITIALIZER(name.mutex),	\
 	.func = &name##func,				\
@@ -542,6 +541,13 @@ void *bpf_jit_alloc_exec_page(void);
 #define BPF_DISPATCHER_PTR(name) (&name)
 void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from,
 				struct bpf_prog *to);
+struct bpf_image {
+	struct latch_tree_node tnode;
+	unsigned char data[];
+};
+#define BPF_IMAGE_SIZE (PAGE_SIZE - sizeof(struct bpf_image))
+bool is_bpf_image(void *addr);
+void *bpf_image_alloc(void);
 #else
 static inline struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
 {
@@ -563,6 +569,10 @@ static inline void bpf_trampoline_put(struct bpf_trampoline *tr) {}
 static inline void bpf_dispatcher_change_prog(struct bpf_dispatcher *d,
 					      struct bpf_prog *from,
 					      struct bpf_prog *to) {}
+static inline bool is_bpf_image(void *addr)
+{
+	return false;
+}
 #endif
 
 struct bpf_func_info_aux {
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 29d47aae0dd1..53dc3adf6077 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -704,6 +704,8 @@ bool is_bpf_text_address(unsigned long addr)
 
 	rcu_read_lock();
 	ret = bpf_prog_kallsyms_find(addr) != NULL;
+	if (!ret)
+		ret = is_bpf_image((void*) addr);
 	rcu_read_unlock();
 
 	return ret;
diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c
index 204ee61a3904..b3e5b214fed8 100644
--- a/kernel/bpf/dispatcher.c
+++ b/kernel/bpf/dispatcher.c
@@ -113,7 +113,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs)
 		noff = 0;
 	} else {
 		old = d->image + d->image_off;
-		noff = d->image_off ^ (PAGE_SIZE / 2);
+		noff = d->image_off ^ (BPF_IMAGE_SIZE / 2);
 	}
 
 	new = d->num_progs ? d->image + noff : NULL;
@@ -140,7 +140,7 @@ void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from,
 
 	mutex_lock(&d->mutex);
 	if (!d->image) {
-		d->image = bpf_jit_alloc_exec_page();
+		d->image = bpf_image_alloc();
 		if (!d->image)
 			goto out;
 	}
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 79a04417050d..3ea56f89c68a 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -4,6 +4,7 @@
 #include <linux/bpf.h>
 #include <linux/filter.h>
 #include <linux/ftrace.h>
+#include <linux/rbtree_latch.h>
 
 /* btf_vmlinux has ~22k attachable functions. 1k htab is enough. */
 #define TRAMPOLINE_HASH_BITS 10
@@ -14,7 +15,12 @@ static struct hlist_head trampoline_table[TRAMPOLINE_TABLE_SIZE];
 /* serializes access to trampoline_table */
 static DEFINE_MUTEX(trampoline_mutex);
 
-void *bpf_jit_alloc_exec_page(void)
+static struct latch_tree_root image_tree __cacheline_aligned;
+
+/* serializes access to image_tree */
+static DEFINE_MUTEX(image_mutex);
+
+static void *bpf_jit_alloc_exec_page(void)
 {
 	void *image;
 
@@ -30,6 +36,62 @@ void *bpf_jit_alloc_exec_page(void)
 	return image;
 }
 
+static __always_inline bool image_tree_less(struct latch_tree_node *a,
+				      struct latch_tree_node *b)
+{
+	struct bpf_image *ia = container_of(a, struct bpf_image, tnode);
+	struct bpf_image *ib = container_of(b, struct bpf_image, tnode);
+
+	return ia < ib;
+}
+
+static __always_inline int image_tree_comp(void *addr, struct latch_tree_node *n)
+{
+	void *image = container_of(n, struct bpf_image, tnode);
+
+	if (addr < image)
+		return -1;
+	if (addr >= image + PAGE_SIZE)
+		return 1;
+
+	return 0;
+}
+
+static const struct latch_tree_ops image_tree_ops = {
+	.less	= image_tree_less,
+	.comp	= image_tree_comp,
+};
+
+void *bpf_image_alloc(void)
+{
+	struct bpf_image *image;
+
+	image = bpf_jit_alloc_exec_page();
+	if (!image)
+		return NULL;
+
+	mutex_lock(&image_mutex);
+	latch_tree_insert(&image->tnode, &image_tree, &image_tree_ops);
+	mutex_unlock(&image_mutex);
+	return image->data;
+}
+
+void bpf_image_delete(void *ptr)
+{
+	struct bpf_image *image = container_of(ptr, struct bpf_image, data);
+
+	mutex_lock(&image_mutex);
+	latch_tree_erase(&image->tnode, &image_tree, &image_tree_ops);
+	synchronize_rcu();
+	bpf_jit_free_exec(image);
+	mutex_unlock(&image_mutex);
+}
+
+bool is_bpf_image(void *addr)
+{
+	return latch_tree_find(addr, &image_tree, &image_tree_ops) != NULL;
+}
+
 struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
 {
 	struct bpf_trampoline *tr;
@@ -50,7 +112,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
 		goto out;
 
 	/* is_root was checked earlier. No need for bpf_jit_charge_modmem() */
-	image = bpf_jit_alloc_exec_page();
+	image = bpf_image_alloc();
 	if (!image) {
 		kfree(tr);
 		tr = NULL;
@@ -125,14 +187,14 @@ static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
 }
 
 /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
- * bytes on x86.  Pick a number to fit into PAGE_SIZE / 2
+ * bytes on x86.  Pick a number to fit into BPF_IMAGE_SIZE / 2
  */
 #define BPF_MAX_TRAMP_PROGS 40
 
 static int bpf_trampoline_update(struct bpf_trampoline *tr)
 {
-	void *old_image = tr->image + ((tr->selector + 1) & 1) * PAGE_SIZE/2;
-	void *new_image = tr->image + (tr->selector & 1) * PAGE_SIZE/2;
+	void *old_image = tr->image + ((tr->selector + 1) & 1) * BPF_IMAGE_SIZE/2;
+	void *new_image = tr->image + (tr->selector & 1) * BPF_IMAGE_SIZE/2;
 	struct bpf_prog *progs_to_run[BPF_MAX_TRAMP_PROGS];
 	int fentry_cnt = tr->progs_cnt[BPF_TRAMP_FENTRY];
 	int fexit_cnt = tr->progs_cnt[BPF_TRAMP_FEXIT];
@@ -160,7 +222,7 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
 	if (fexit_cnt)
 		flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
 
-	err = arch_prepare_bpf_trampoline(new_image, new_image + PAGE_SIZE / 2,
+	err = arch_prepare_bpf_trampoline(new_image, new_image + BPF_IMAGE_SIZE / 2,
 					  &tr->func.model, flags,
 					  fentry, fentry_cnt,
 					  fexit, fexit_cnt,
@@ -251,7 +313,7 @@ void bpf_trampoline_put(struct bpf_trampoline *tr)
 		goto out;
 	if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[BPF_TRAMP_FEXIT])))
 		goto out;
-	bpf_jit_free_exec(tr->image);
+	bpf_image_delete(tr->image);
 	hlist_del(&tr->hlist);
 	kfree(tr);
 out:
Björn Töpel Jan. 13, 2020, 12:21 p.m. UTC | #8
On 2020-01-13 10:43, Jiri Olsa wrote:
> hi,
> attached patch seems to work for me (trampoline usecase), but I don't know
> how to test it for dispatcher.. also I need to check if we need to decrease
> BPF_TRAMP_MAX or BPF_DISPATCHER_MAX, it might take more time;-)
> 

Thanks for working on it! I'll take the patch for a spin.

To test the dispatcher, just run XDP!

With your change, the BPF_DISPATCHER_MAX is still valid. 48 entries =>
1890B which is < (BPF_IMAGE_SIZE / 2).


Cheers,
Björn
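
For reference, the numbers work out as follows on x86-64 with 4 KiB pages,
where struct latch_tree_node is two struct rb_node, i.e. 2 * 24 = 48 bytes:

  BPF_IMAGE_SIZE     = PAGE_SIZE - sizeof(struct bpf_image) = 4096 - 48 = 4048 bytes
  BPF_IMAGE_SIZE / 2 = 2024 bytes >= ~1890 bytes for BPF_DISPATCHER_MAX (48) entries

so the dispatcher image still fits in its half of the page after the header is
added.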
Björn Töpel Jan. 13, 2020, 12:31 p.m. UTC | #9
On 2020-01-13 13:21, Björn Töpel wrote:
> 
> On 2020-01-13 10:43, Jiri Olsa wrote:
>> hi,
>> attached patch seems to work for me (trampoline usecase), but I don't 
>> know
>> how to test it for dispatcher.. also I need to check if we need to 
>> decrease
>> BPF_TRAMP_MAX or BPF_DISPATCHER_MAX, it might take more time;-)
>>
> 
> Thanks for working on it! I'll take the patch for a spin.
> 
> To test the dispatcher, just run XDP!
> 
> With your change, the BPF_DISPATCHER_MAX is still valid. 48 entries =>
> 1890B which is < (BPF_IMAGE_SIZE / 2).
>

...and FWIW, it would be nice with bpf_dispatcher_<...> entries in 
kallsyms as well. If that code could be shared with the trampoline code 
as well (bpf_trampoline_<btf_id>), that'd be great!
Jiri Olsa Jan. 13, 2020, 12:37 p.m. UTC | #10
On Mon, Jan 13, 2020 at 01:31:38PM +0100, Björn Töpel wrote:
> On 2020-01-13 13:21, Björn Töpel wrote:
> > 
> > On 2020-01-13 10:43, Jiri Olsa wrote:
> > > hi,
> > > attached patch seems to work for me (trampoline usecase), but I
> > > don't know
> > > how to test it for dispatcher.. also I need to check if we need to
> > > decrease
> > > BPF_TRAMP_MAX or BPF_DISPATCHER_MAX, it might take more time;-)
> > > 
> > 
> > Thanks for working on it! I'll take the patch for a spin.
> > 
> > To test the dispatcher, just run XDP!
> > 
> > With your change, the BPF_DISPATCHER_MAX is still valid. 48 entries =>
> > 1890B which is < (BPF_IMAGE_SIZE / 2).

great

> > 
> 
> ...and FWIW, it would be nice with bpf_dispatcher_<...> entries in kallsyms

ok so it'd be 'bpf_dispatcher_<name>'

from DEFINE_BPF_DISPATCHER(name)

> as well. If that code could be shared with the trampoline code as well
> (bpf_trampoline_<btf_id>), that'd be great!
> 

ok, will add it

thanks,
jirka
Jiri Olsa Feb. 3, 2020, 7:58 p.m. UTC | #11
On Mon, Jan 13, 2020 at 01:37:28PM +0100, Jiri Olsa wrote:
> On Mon, Jan 13, 2020 at 01:31:38PM +0100, Björn Töpel wrote:
> > On 2020-01-13 13:21, Björn Töpel wrote:
> > > 
> > > On 2020-01-13 10:43, Jiri Olsa wrote:
> > > > hi,
> > > > attached patch seems to work for me (trampoline usecase), but I
> > > > don't know
> > > > how to test it for dispatcher.. also I need to check if we need to
> > > > decrease
> > > > BPF_TRAMP_MAX or BPF_DISPATCHER_MAX, it might take more time;-)
> > > > 
> > > 
> > > Thanks for working on it! I'll take the patch for a spin.
> > > 
> > > To test the dispatcher, just run XDP!
> > > 
> > > With your change, the BPF_DISPATCHER_MAX is still valid. 48 entries =>
> > > 1890B which is < (BPF_IMAGE_SIZE / 2).
> 
> great
> 
> > > 
> > 
> > ...and FWIW, it would be nice with bpf_dispatcher_<...> entries in kallsyms
> 
> ok so it'd be 'bpf_dispatcher_<name>'

hi,
so the only dispatcher is currently defined as:
  DEFINE_BPF_DISPATCHER(bpf_dispatcher_xdp)

with the bpf_dispatcher_<name> logic it shows in kallsyms as:
  ffffffffa0450000 t bpf_dispatcher_bpf_dispatcher_xdp    [bpf]

to fix that, would you guys prefer having:
  DEFINE_BPF_DISPATCHER(xdp) 

or using the full dispatcher name as kallsyms name?
which would require some discipline for future dispatcher names ;-)

thanks,
jirka
Björn Töpel Feb. 3, 2020, 8:27 p.m. UTC | #12
On 2020-02-03 20:58, Jiri Olsa wrote:
[...]
>>> ...and FWIW, it would be nice with bpf_dispatcher_<...> entries in kallsyms
>>
>> ok so it'd be 'bpf_dispatcher_<name>'
> 
> hi,
> so the only dispatcher is currently defined as:
>    DEFINE_BPF_DISPATCHER(bpf_dispatcher_xdp)
> 
> with the bpf_dispatcher_<name> logic it shows in kallsyms as:
>    ffffffffa0450000 t bpf_dispatcher_bpf_dispatcher_xdp    [bpf]
>

Ick! :-P


> to fix that, would you guys prefer having:
>    DEFINE_BPF_DISPATCHER(xdp)
> 
> or using the full dispatcher name as kallsyms name?
> which would require some discipline for future dispatcher names ;-)
>

I'd prefer the former, i.e. name "xdp" is shown as bpf_dispatcher_xdp in 
kallsyms.

...and if this route is taken, the macros can be changed, so that the 
trampoline functions are prefixed with "bpf_dispatcher_". Something like 
this (and also a small '_' cleanup):


diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8e9ad3943cd9..15c5f351f837 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -512,7 +512,7 @@ struct bpf_dispatcher {
  	u32 image_off;
  };

-static __always_inline unsigned int bpf_dispatcher_nopfunc(
+static __always_inline unsigned int bpf_dispatcher_nop_func(
  	const void *ctx,
  	const struct bpf_insn *insnsi,
  	unsigned int (*bpf_func)(const void *,
@@ -527,7 +527,7 @@ int bpf_trampoline_unlink_prog(struct bpf_prog *prog);
  void bpf_trampoline_put(struct bpf_trampoline *tr);
  #define BPF_DISPATCHER_INIT(name) {			\
  	.mutex = __MUTEX_INITIALIZER(name.mutex),	\
-	.func = &name##func,				\
+	.func = &name##_func,				\
  	.progs = {},					\
  	.num_progs = 0,					\
  	.image = NULL,					\
@@ -535,7 +535,7 @@ void bpf_trampoline_put(struct bpf_trampoline *tr);
  }

  #define DEFINE_BPF_DISPATCHER(name)					\
-	noinline unsigned int name##func(				\
+	noinline unsigned int bpf_dispatcher_##name##_func(		\
  		const void *ctx,					\
  		const struct bpf_insn *insnsi,				\
  		unsigned int (*bpf_func)(const void *,			\
@@ -543,17 +543,18 @@ void bpf_trampoline_put(struct bpf_trampoline *tr);
  	{								\
  		return bpf_func(ctx, insnsi);				\
  	}								\
-	EXPORT_SYMBOL(name##func);			\
-	struct bpf_dispatcher name = BPF_DISPATCHER_INIT(name);
+	EXPORT_SYMBOL(bpf_dispatcher_##name##_func);			\
+	struct bpf_dispatcher bpf_dispatcher_##name =			\
+		BPF_DISPATCHER_INIT(bpf_dispatcher_##name);
  #define DECLARE_BPF_DISPATCHER(name)					\
-	unsigned int name##func(					\
+	unsigned int bpf_dispatcher_##name##_func(			\
  		const void *ctx,					\
  		const struct bpf_insn *insnsi,				\
  		unsigned int (*bpf_func)(const void *,			\
  					 const struct bpf_insn *));	\
-	extern struct bpf_dispatcher name;
-#define BPF_DISPATCHER_FUNC(name) name##func
-#define BPF_DISPATCHER_PTR(name) (&name)
+	extern struct bpf_dispatcher bpf_dispatcher_##name;
+#define BPF_DISPATCHER_FUNC(name) bpf_dispatcher_##name##_func
+#define BPF_DISPATCHER_PTR(name) (&bpf_dispatcher_##name)
  void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from,
  				struct bpf_prog *to);
  struct bpf_image {
@@ -579,7 +580,7 @@ static inline int bpf_trampoline_unlink_prog(struct bpf_prog *prog)
  static inline void bpf_trampoline_put(struct bpf_trampoline *tr) {}
  #define DEFINE_BPF_DISPATCHER(name)
  #define DECLARE_BPF_DISPATCHER(name)
-#define BPF_DISPATCHER_FUNC(name) bpf_dispatcher_nopfunc
+#define BPF_DISPATCHER_FUNC(name) bpf_dispatcher_nop_func
  #define BPF_DISPATCHER_PTR(name) NULL
  static inline void bpf_dispatcher_change_prog(struct bpf_dispatcher *d,
  					      struct bpf_prog *from,
diff --git a/include/linux/filter.h b/include/linux/filter.h
index f349e2c0884c..eafe72644282 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -577,7 +577,7 @@ DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
  	ret; })

  #define BPF_PROG_RUN(prog, ctx) __BPF_PROG_RUN(prog, ctx,		\
-					       bpf_dispatcher_nopfunc)
+					       bpf_dispatcher_nop_func)

  #define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN

@@ -701,7 +701,7 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
  	return res;
  }

-DECLARE_BPF_DISPATCHER(bpf_dispatcher_xdp)
+DECLARE_BPF_DISPATCHER(xdp)

  static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
  					    struct xdp_buff *xdp)
@@ -712,8 +712,7 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
  	 * already takes rcu_read_lock() when fetching the program, so
  	 * it's not necessary here anymore.
  	 */
-	return __BPF_PROG_RUN(prog, xdp,
-			      BPF_DISPATCHER_FUNC(bpf_dispatcher_xdp));
+	return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
  }

  void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
diff --git a/net/core/filter.c b/net/core/filter.c
index 792e3744b915..5db435141e16 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8835,10 +8835,9 @@ const struct bpf_prog_ops sk_reuseport_prog_ops = {
  };
  #endif /* CONFIG_INET */

-DEFINE_BPF_DISPATCHER(bpf_dispatcher_xdp)
+DEFINE_BPF_DISPATCHER(xdp)

  void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog)
  {
-	bpf_dispatcher_change_prog(BPF_DISPATCHER_PTR(bpf_dispatcher_xdp),
-				   prev_prog, prog);
+	bpf_dispatcher_change_prog(BPF_DISPATCHER_PTR(xdp), prev_prog, prog);
  }
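
With the macros above, DEFINE_BPF_DISPATCHER(xdp) would expand to roughly the
following (expanded by hand here for illustration, whitespace added):

noinline unsigned int bpf_dispatcher_xdp_func(const void *ctx,
					      const struct bpf_insn *insnsi,
					      unsigned int (*bpf_func)(const void *,
								       const struct bpf_insn *))
{
	return bpf_func(ctx, insnsi);
}
EXPORT_SYMBOL(bpf_dispatcher_xdp_func);
struct bpf_dispatcher bpf_dispatcher_xdp = BPF_DISPATCHER_INIT(bpf_dispatcher_xdp);

so the dispatcher's image can then show up as bpf_dispatcher_xdp in kallsyms
without the doubled prefix.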
Toke Høiland-Jørgensen Feb. 3, 2020, 8:45 p.m. UTC | #13
Björn Töpel <bjorn.topel@intel.com> writes:

> On 2020-02-03 20:58, Jiri Olsa wrote:
> [...]
>>>> ...and FWIW, it would be nice with bpf_dispatcher_<...> entries in kallsyms
>>>
>>> ok so it'd be 'bpf_dispatcher_<name>'
>> 
>> hi,
>> so the only dispatcher is currently defined as:
>>    DEFINE_BPF_DISPATCHER(bpf_dispatcher_xdp)
>> 
>> with the bpf_dispatcher_<name> logic it shows in kallsyms as:
>>    ffffffffa0450000 t bpf_dispatcher_bpf_dispatcher_xdp    [bpf]
>>
>
> Ick! :-P
>
>
>> to fix that, would you guys prefer having:
>>    DEFINE_BPF_DISPATCHER(xdp)
>> 
>> or using the full dispatcher name as kallsyms name?
>> which would require some discipline for future dispatcher names ;-)
>>
>
> I'd prefer the former, i.e. name "xdp" is shown as bpf_dispatcher_xdp in 
> kallsyms.
>
> ...and if this route is taken, the macros can be changed, so that the 
> trampoline functions are prefixed with "bpf_dispatcher_". Something like 
> this (and also a small '_' cleanup):

+1; and thanks for fixing that _ as well - that was really bothering me :)

-Toke
Jiri Olsa Feb. 4, 2020, 8:18 a.m. UTC | #14
On Mon, Feb 03, 2020 at 09:27:39PM +0100, Björn Töpel wrote:
> On 2020-02-03 20:58, Jiri Olsa wrote:
> [...]
> > > > ...and FWIW, it would be nice with bpf_dispatcher_<...> entries in kallsyms
> > > 
> > > ok so it'd be 'bpf_dispatcher_<name>'
> > 
> > hi,
> > so the only dispatcher is currently defined as:
> >    DEFINE_BPF_DISPATCHER(bpf_dispatcher_xdp)
> > 
> > with the bpf_dispatcher_<name> logic it shows in kallsyms as:
> >    ffffffffa0450000 t bpf_dispatcher_bpf_dispatcher_xdp    [bpf]
> > 
> 
> Ick! :-P

yea, but it draws attention ;-)

> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index f349e2c0884c..eafe72644282 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -577,7 +577,7 @@ DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
>  	ret; })
> 
>  #define BPF_PROG_RUN(prog, ctx) __BPF_PROG_RUN(prog, ctx,		\
> -					       bpf_dispatcher_nopfunc)
> +					       bpf_dispatcher_nop_func)
> 
>  #define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN
> 
> @@ -701,7 +701,7 @@ static inline u32 bpf_prog_run_clear_cb(const struct
> bpf_prog *prog,
>  	return res;
>  }
> 
> -DECLARE_BPF_DISPATCHER(bpf_dispatcher_xdp)
> +DECLARE_BPF_DISPATCHER(xdp)

yep, that's what I prefer ;-) I'll attach your patch
to my kallsyms changes

thanks,
jirka

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b14e51d56a82..66825c821ac9 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -470,6 +470,7 @@  struct bpf_trampoline {
 	/* Executable image of trampoline */
 	void *image;
 	u64 selector;
+	struct latch_tree_node tnode;
 };
 
 #define BPF_DISPATCHER_MAX 48 /* Fits in 2048B */
@@ -502,6 +503,7 @@  struct bpf_trampoline *bpf_trampoline_lookup(u64 key);
 int bpf_trampoline_link_prog(struct bpf_prog *prog);
 int bpf_trampoline_unlink_prog(struct bpf_prog *prog);
 void bpf_trampoline_put(struct bpf_trampoline *tr);
+bool is_bpf_trampoline(void *addr);
 void *bpf_jit_alloc_exec_page(void);
 #define BPF_DISPATCHER_INIT(name) {			\
 	.mutex = __MUTEX_INITIALIZER(name.mutex),	\
@@ -555,6 +557,10 @@  static inline void bpf_trampoline_put(struct bpf_trampoline *tr) {}
 static inline void bpf_dispatcher_change_prog(struct bpf_dispatcher *d,
 					      struct bpf_prog *from,
 					      struct bpf_prog *to) {}
+static inline bool is_bpf_trampoline(void *addr)
+{
+	return false;
+}
 #endif
 
 struct bpf_func_info_aux {
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 29d47aae0dd1..63a515b5aa7b 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -704,6 +704,8 @@  bool is_bpf_text_address(unsigned long addr)
 
 	rcu_read_lock();
 	ret = bpf_prog_kallsyms_find(addr) != NULL;
+	if (!ret)
+		ret = is_bpf_trampoline((void*) addr);
 	rcu_read_unlock();
 
 	return ret;
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 505f4e4b31d2..4b5f0d0b0072 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -4,16 +4,44 @@ 
 #include <linux/bpf.h>
 #include <linux/filter.h>
 #include <linux/ftrace.h>
+#include <linux/rbtree_latch.h>
 
 /* btf_vmlinux has ~22k attachable functions. 1k htab is enough. */
 #define TRAMPOLINE_HASH_BITS 10
 #define TRAMPOLINE_TABLE_SIZE (1 << TRAMPOLINE_HASH_BITS)
 
 static struct hlist_head trampoline_table[TRAMPOLINE_TABLE_SIZE];
+static struct latch_tree_root tree __cacheline_aligned;
 
 /* serializes access to trampoline_table */
 static DEFINE_MUTEX(trampoline_mutex);
 
+static __always_inline bool tree_less(struct latch_tree_node *a,
+				      struct latch_tree_node *b)
+{
+	struct bpf_trampoline *ta = container_of(a, struct bpf_trampoline, tnode);
+	struct bpf_trampoline *tb = container_of(b, struct bpf_trampoline, tnode);
+
+	return ta->image < tb->image;
+}
+
+static __always_inline int tree_comp(void *addr, struct latch_tree_node *n)
+{
+	struct bpf_trampoline *tr = container_of(n, struct bpf_trampoline, tnode);
+
+	if (addr < tr->image)
+		return -1;
+	if (addr >= tr->image + PAGE_SIZE)
+		return  1;
+
+	return 0;
+}
+
+static const struct latch_tree_ops tree_ops = {
+	.less	= tree_less,
+	.comp	= tree_comp,
+};
+
 void *bpf_jit_alloc_exec_page(void)
 {
 	void *image;
@@ -30,6 +58,11 @@  void *bpf_jit_alloc_exec_page(void)
 	return image;
 }
 
+bool is_bpf_trampoline(void *addr)
+{
+	return latch_tree_find(addr, &tree, &tree_ops) != NULL;
+}
+
 struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
 {
 	struct bpf_trampoline *tr;
@@ -65,6 +98,7 @@  struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
 	for (i = 0; i < BPF_TRAMP_MAX; i++)
 		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
 	tr->image = image;
+	latch_tree_insert(&tr->tnode, &tree, &tree_ops);
 out:
 	mutex_unlock(&trampoline_mutex);
 	return tr;
@@ -252,6 +286,7 @@  void bpf_trampoline_put(struct bpf_trampoline *tr)
 		goto out;
 	bpf_jit_free_exec(tr->image);
 	hlist_del(&tr->hlist);
+	latch_tree_erase(&tr->tnode, &tree, &tree_ops);
 	kfree(tr);
 out:
 	mutex_unlock(&trampoline_mutex);