diff mbox series

[bpf-next,1/2] bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map

Message ID 20180104072746.1569033-2-yhs@fb.com
State Changes Requested, archived
Delegated to: BPF Maintainers
Headers show
Series bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map | expand

Commit Message

Yonghong Song Jan. 4, 2018, 7:27 a.m. UTC
Currently, bpf syscall command BPF_MAP_GET_NEXT_KEY is not
supported for stacktrace map. However, there are use cases where
user space wants to enumerate all stacktrace map entries where
BPF_MAP_GET_NEXT_KEY command will be really helpful.
In addition, if user space wants to delete all map entries
in order to save memory and does not want to close the
map file descriptor, BPF_MAP_GET_NEXT_KEY may help improve
performance if map entries are sparsely populated.

The implementation follows the API specification of existing
BPF_MAP_GET_NEXT_KEY implementation. If user provides
an NULL key pointer, the first key is returned. Otherwise,
the first valid key after the input parameter "key"
is returned, or -ENOENT if no valid key can be found.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/stackmap.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

Comments

Jakub Kicinski Jan. 4, 2018, 9:08 p.m. UTC | #1
On Wed, 3 Jan 2018 23:27:45 -0800, Yonghong Song wrote:
> Currently, bpf syscall command BPF_MAP_GET_NEXT_KEY is not
> supported for stacktrace map. However, there are use cases where
> user space wants to enumerate all stacktrace map entries where
> BPF_MAP_GET_NEXT_KEY command will be really helpful.
> In addition, if user space wants to delete all map entries
> in order to save memory and does not want to close the
> map file descriptor, BPF_MAP_GET_NEXT_KEY may help improve
> performance if map entries are sparsely populated.
> 
> The implementation follows the API specification of existing
> BPF_MAP_GET_NEXT_KEY implementation. If user provides
> an NULL key pointer, the first key is returned. Otherwise,
> the first valid key after the input parameter "key"
> is returned, or -ENOENT if no valid key can be found.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  kernel/bpf/stackmap.c | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index a15bc63..207b21c 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -226,9 +226,28 @@ int bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
>  	return 0;
>  }
>  
> -static int stack_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
> +static int stack_map_get_next_key(struct bpf_map *map, void *key,
> +				  void *next_key)
>  {
> -	return -EINVAL;
> +	struct bpf_stack_map *smap = container_of(map,
> +						  struct bpf_stack_map, map);
> +	u32 id;
> +
> +	WARN_ON_ONCE(!rcu_read_lock_held());
> +
> +	if (!key)
> +		id = 0;
> +	else
> +		id = *(u32 *)key + 1;
> +
> +	while (id < smap->n_buckets && !smap->buckets[id])
> +		id++;
> +
> +	if (id >= smap->n_buckets)
> +		return -ENOENT;

AFAIU for hash maps the semantics of get next are as follows:

get_next(map, key) {
	if (!key)
		return get_first(map);

	elem = lookup(map, key);
	if (!elem)                       // <-- note this branch
		return get_first(map);
	if (elem->next)
		return elem->next->key;
	return -ENOENT;
}

For arrays elements always exist, hence the elem->next check is
omitted.  Here you are, however, testing !smap->buckets[id] so I assume
elements may not exist.  

Is there any precedent for get_next on non-existent key returning
element other than first?  The stacktrace map is a bit special, and
returning id + 1 would defeat what you're trying to do here..  Is there
value in keeping the behaviour consistent across map types?  

Anyway, you said in the commit message that "The implementation follows
the API specification of existing BPF_MAP_GET_NEXT_KEY implementation."
and I find that arguable :)

> +	*(u32 *)next_key = id;
> +	return 0;
>  }
>  
>  static int stack_map_update_elem(struct bpf_map *map, void *key, void *value,
Yonghong Song Jan. 4, 2018, 9:32 p.m. UTC | #2
On 1/4/18 1:08 PM, Jakub Kicinski wrote:
> On Wed, 3 Jan 2018 23:27:45 -0800, Yonghong Song wrote:
>> Currently, bpf syscall command BPF_MAP_GET_NEXT_KEY is not
>> supported for stacktrace map. However, there are use cases where
>> user space wants to enumerate all stacktrace map entries where
>> BPF_MAP_GET_NEXT_KEY command will be really helpful.
>> In addition, if user space wants to delete all map entries
>> in order to save memory and does not want to close the
>> map file descriptor, BPF_MAP_GET_NEXT_KEY may help improve
>> performance if map entries are sparsely populated.
>>
>> The implementation follows the API specification of existing
>> BPF_MAP_GET_NEXT_KEY implementation. If user provides
>> an NULL key pointer, the first key is returned. Otherwise,
>> the first valid key after the input parameter "key"
>> is returned, or -ENOENT if no valid key can be found.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   kernel/bpf/stackmap.c | 23 +++++++++++++++++++++--
>>   1 file changed, 21 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index a15bc63..207b21c 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -226,9 +226,28 @@ int bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
>>   	return 0;
>>   }
>>   
>> -static int stack_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
>> +static int stack_map_get_next_key(struct bpf_map *map, void *key,
>> +				  void *next_key)
>>   {
>> -	return -EINVAL;
>> +	struct bpf_stack_map *smap = container_of(map,
>> +						  struct bpf_stack_map, map);
>> +	u32 id;
>> +
>> +	WARN_ON_ONCE(!rcu_read_lock_held());
>> +
>> +	if (!key)
>> +		id = 0;
>> +	else
>> +		id = *(u32 *)key + 1;
>> +
>> +	while (id < smap->n_buckets && !smap->buckets[id])
>> +		id++;
>> +
>> +	if (id >= smap->n_buckets)
>> +		return -ENOENT;
> 
> AFAIU for hash maps the semantics of get next are as follows:
> 
> get_next(map, key) {
> 	if (!key)
> 		return get_first(map);
> 
> 	elem = lookup(map, key);
> 	if (!elem)                       // <-- note this branch
> 		return get_first(map);
> 	if (elem->next)
> 		return elem->next->key;
> 	return -ENOENT;
> }
> 
> For arrays elements always exist, hence the elem->next check is
> omitted.  Here you are, however, testing !smap->buckets[id] so I assume
> elements may not exist.

Right, buckets[id] could be NULL.

> 
> Is there any precedent for get_next on non-existent key returning
> element other than first?  The stacktrace map is a bit special, and

Sorry, I miss this. You are right. hashtable get_next_key will return 
the first for non-existing key. And all other implemented get_next_key
is a variant of arrays where all keys already exist.

> returning id + 1 would defeat what you're trying to do here..  Is there
> value in keeping the behaviour consistent across map types?

Let us keep the behavior consistent with hashtable then.

> 
> Anyway, you said in the commit message that "The implementation follows
> the API specification of existing BPF_MAP_GET_NEXT_KEY implementation."
> and I find that arguable :)

You are right. Will send v2 soon with re-wording of commit message as well.

>> +	*(u32 *)next_key = id;
>> +	return 0;
>>   }
>>   
>>   static int stack_map_update_elem(struct bpf_map *map, void *key, void *value,
>
diff mbox series

Patch

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index a15bc63..207b21c 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -226,9 +226,28 @@  int bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
 	return 0;
 }
 
-static int stack_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
+static int stack_map_get_next_key(struct bpf_map *map, void *key,
+				  void *next_key)
 {
-	return -EINVAL;
+	struct bpf_stack_map *smap = container_of(map,
+						  struct bpf_stack_map, map);
+	u32 id;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	if (!key)
+		id = 0;
+	else
+		id = *(u32 *)key + 1;
+
+	while (id < smap->n_buckets && !smap->buckets[id])
+		id++;
+
+	if (id >= smap->n_buckets)
+		return -ENOENT;
+
+	*(u32 *)next_key = id;
+	return 0;
 }
 
 static int stack_map_update_elem(struct bpf_map *map, void *key, void *value,