diff mbox

[net-next,1/7] bpf: add 'flags' attribute to BPF_MAP_UPDATE_ELEM command

Message ID 1415069656-14138-2-git-send-email-ast@plumgrid.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Alexei Starovoitov Nov. 4, 2014, 2:54 a.m. UTC
the current meaning of BPF_MAP_UPDATE_ELEM syscall command is:
either update existing map element or create a new one.
Initially the plan was to add a new command to handle the case of
'create new element if it didn't exist', but 'flags' style looks
cleaner and overall diff is much smaller (more code reused), so add 'flags'
attribute to BPF_MAP_UPDATE_ELEM command with the following meaning:
enum {
  BPF_MAP_UPDATE_OR_CREATE = 0, /* add new element or update existing */
  BPF_MAP_CREATE_ONLY,          /* add new element if it didn't exist */
  BPF_MAP_UPDATE_ONLY           /* update existing element */
};

BPF_MAP_CREATE_ONLY can fail with EEXIST if element already exists.
BPF_MAP_UPDATE_ONLY can fail with ENOENT if element doesn't exist.

Userspace will call it as:
int bpf_update_elem(int fd, void *key, void *value, __u64 flags)
{
    union bpf_attr attr = {
        .map_fd = fd,
        .key = ptr_to_u64(key),
        .value = ptr_to_u64(value),
        .flags = flags;
    };

    return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 include/linux/bpf.h      |    2 +-
 include/uapi/linux/bpf.h |   10 +++++++++-
 kernel/bpf/syscall.c     |    4 ++--
 3 files changed, 12 insertions(+), 4 deletions(-)

Comments

Daniel Borkmann Nov. 4, 2014, 9:25 a.m. UTC | #1
On 11/04/2014 03:54 AM, Alexei Starovoitov wrote:
> the current meaning of BPF_MAP_UPDATE_ELEM syscall command is:
> either update existing map element or create a new one.
> Initially the plan was to add a new command to handle the case of
> 'create new element if it didn't exist', but 'flags' style looks
> cleaner and overall diff is much smaller (more code reused), so add 'flags'
> attribute to BPF_MAP_UPDATE_ELEM command with the following meaning:
> enum {
>    BPF_MAP_UPDATE_OR_CREATE = 0, /* add new element or update existing */
>    BPF_MAP_CREATE_ONLY,          /* add new element if it didn't exist */
>    BPF_MAP_UPDATE_ONLY           /* update existing element */
> };

 From you commit message/code I currently don't see an explanation why
it cannot be done in typical ``flags style'' as various syscalls do,
i.e. BPF_MAP_UPDATE_OR_CREATE rather represented as ...

   BPF_MAP_CREATE | BPF_MAP_UPDATE

Do you expect more than 64 different flags to be passed from user space
for BPF_MAP?

> BPF_MAP_CREATE_ONLY can fail with EEXIST if element already exists.
> BPF_MAP_UPDATE_ONLY can fail with ENOENT if element doesn't exist.
>
> Userspace will call it as:
> int bpf_update_elem(int fd, void *key, void *value, __u64 flags)
> {
>      union bpf_attr attr = {
>          .map_fd = fd,
>          .key = ptr_to_u64(key),
>          .value = ptr_to_u64(value),
>          .flags = flags;
>      };
>
>      return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
> }
>
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov Nov. 4, 2014, 11:04 p.m. UTC | #2
On Tue, Nov 4, 2014 at 1:25 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
>
> On 11/04/2014 03:54 AM, Alexei Starovoitov wrote:
>>
>> the current meaning of BPF_MAP_UPDATE_ELEM syscall command is:
>> either update existing map element or create a new one.
>> Initially the plan was to add a new command to handle the case of
>> 'create new element if it didn't exist', but 'flags' style looks
>> cleaner and overall diff is much smaller (more code reused), so add 'flags'
>> attribute to BPF_MAP_UPDATE_ELEM command with the following meaning:
>> enum {
>>    BPF_MAP_UPDATE_OR_CREATE = 0, /* add new element or update existing */
>>    BPF_MAP_CREATE_ONLY,          /* add new element if it didn't exist */
>>    BPF_MAP_UPDATE_ONLY           /* update existing element */
>> };
>
>
> From you commit message/code I currently don't see an explanation why
> it cannot be done in typical ``flags style'' as various syscalls do,
> i.e. BPF_MAP_UPDATE_OR_CREATE rather represented as ...
>
>   BPF_MAP_CREATE | BPF_MAP_UPDATE
>
> Do you expect more than 64 different flags to be passed from user space
> for BPF_MAP?

several reasons:
- preserve flags==0 as default behavior
- avoid holes and extra checks for invalid combinations, so
  if (flags > BPF_MAP_UPDATE_ONLY) goto err, is enough.
- it looks much neater when user space uses
  BPF_MAP_UPDATE_OR_CREATE instead of ORing bits.

Note this choice doesn't prevent adding bit-like flags
in the future. Today I cannot think of any new flags
for the update() command, but if somebody comes up with
a new selector that can apply to all three combinations,
we can add it as 3rd bit that can be ORed.
Default will stay zero and 'if >' check in older
kernels will seamlessly work with new userspace.
I don't like holes in flags and combinatorial
explosion of bits and checks for them unless
absolutely necessary.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann Nov. 5, 2014, 2:57 p.m. UTC | #3
On 11/05/2014 12:04 AM, Alexei Starovoitov wrote:
> On Tue, Nov 4, 2014 at 1:25 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
>> On 11/04/2014 03:54 AM, Alexei Starovoitov wrote:
>>>
>>> the current meaning of BPF_MAP_UPDATE_ELEM syscall command is:
>>> either update existing map element or create a new one.
>>> Initially the plan was to add a new command to handle the case of
>>> 'create new element if it didn't exist', but 'flags' style looks
>>> cleaner and overall diff is much smaller (more code reused), so add 'flags'
>>> attribute to BPF_MAP_UPDATE_ELEM command with the following meaning:
>>> enum {
>>>     BPF_MAP_UPDATE_OR_CREATE = 0, /* add new element or update existing */
>>>     BPF_MAP_CREATE_ONLY,          /* add new element if it didn't exist */
>>>     BPF_MAP_UPDATE_ONLY           /* update existing element */
>>> };
>>
>>  From you commit message/code I currently don't see an explanation why
>> it cannot be done in typical ``flags style'' as various syscalls do,
>> i.e. BPF_MAP_UPDATE_OR_CREATE rather represented as ...
>>
>>    BPF_MAP_CREATE | BPF_MAP_UPDATE
>>
>> Do you expect more than 64 different flags to be passed from user space
>> for BPF_MAP?
>
> several reasons:
> - preserve flags==0 as default behavior
> - avoid holes and extra checks for invalid combinations, so
>    if (flags > BPF_MAP_UPDATE_ONLY) goto err, is enough.
> - it looks much neater when user space uses
>    BPF_MAP_UPDATE_OR_CREATE instead of ORing bits.
>
> Note this choice doesn't prevent adding bit-like flags
> in the future. Today I cannot think of any new flags
> for the update() command, but if somebody comes up with
> a new selector that can apply to all three combinations,
> we can add it as 3rd bit that can be ORed.

Hm, mixing enums together with bitfield-like flags seems
kind of hacky ... :/ Or, do you mean to say you're adding
a 2nd flag field, i.e. splitting the 64bits into a 32bit
``cmd enum'' and 32bit ``flag section''?

I see the point with flags == 0 as default behavior though,
but at this point in time we won't get burnt by it since
the API is not yet in a usable state and defaults to be
compiled-out.

> Default will stay zero and 'if >' check in older
> kernels will seamlessly work with new userspace.
> I don't like holes in flags and combinatorial
> explosion of bits and checks for them unless
> absolutely necessary.

Hm, my concern is that we start to add many *_OR_* enum
elements once we find that a flag might be a useful in
combination with many other flags ... even though if we
only can think of 3 flags /right now/.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov Nov. 6, 2014, 5:39 p.m. UTC | #4
On Wed, Nov 5, 2014 at 6:57 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
> On 11/05/2014 12:04 AM, Alexei Starovoitov wrote:
>>
>> On Tue, Nov 4, 2014 at 1:25 AM, Daniel Borkmann <dborkman@redhat.com>
>> wrote:
>>>
>>> On 11/04/2014 03:54 AM, Alexei Starovoitov wrote:
>>>>
>>>>
>>>> the current meaning of BPF_MAP_UPDATE_ELEM syscall command is:
>>>> either update existing map element or create a new one.
>>>> Initially the plan was to add a new command to handle the case of
>>>> 'create new element if it didn't exist', but 'flags' style looks
>>>> cleaner and overall diff is much smaller (more code reused), so add
>>>> 'flags'
>>>> attribute to BPF_MAP_UPDATE_ELEM command with the following meaning:
>>>> enum {
>>>>     BPF_MAP_UPDATE_OR_CREATE = 0, /* add new element or update existing
>>>> */
>>>>     BPF_MAP_CREATE_ONLY,          /* add new element if it didn't exist
>>>> */
>>>>     BPF_MAP_UPDATE_ONLY           /* update existing element */
>>>> };
>>>
>>>
>>>  From you commit message/code I currently don't see an explanation why
>>> it cannot be done in typical ``flags style'' as various syscalls do,
>>> i.e. BPF_MAP_UPDATE_OR_CREATE rather represented as ...
>>>
>>>    BPF_MAP_CREATE | BPF_MAP_UPDATE
>>>
>>> Do you expect more than 64 different flags to be passed from user space
>>> for BPF_MAP?
>>
>>
>> several reasons:
>> - preserve flags==0 as default behavior
>> - avoid holes and extra checks for invalid combinations, so
>>    if (flags > BPF_MAP_UPDATE_ONLY) goto err, is enough.
>> - it looks much neater when user space uses
>>    BPF_MAP_UPDATE_OR_CREATE instead of ORing bits.
>>
>> Note this choice doesn't prevent adding bit-like flags
>> in the future. Today I cannot think of any new flags
>> for the update() command, but if somebody comes up with
>> a new selector that can apply to all three combinations,
>> we can add it as 3rd bit that can be ORed.
>
>
> Hm, mixing enums together with bitfield-like flags seems
> kind of hacky ... :/ Or, do you mean to say you're adding
> a 2nd flag field, i.e. splitting the 64bits into a 32bit
> ``cmd enum'' and 32bit ``flag section''?

something like this.
or splitting 64-bit into 2 and 62. We'll see.
First two encode this 'type' of update and the rest -
whatever else.

> Hm, my concern is that we start to add many *_OR_* enum
> elements once we find that a flag might be a useful in
> combination with many other flags ... even though if we
> only can think of 3 flags /right now/.

Agree. Adding many *_OR_* would look bad, that's
why I said that future additions can be bits. Like in
paragraph above.

Also, we don't have 3 flags now. In this patch I'm
showing 3 types and you're suggesting to treat
them as 2 flags. To me that's incorrect, since 'no flags'
becomes invalid combination, which logically incorrect.
Therefore I cannot see them as 'flags'. This is a 'type'
or 'style' of update() command.

I think it actually matches how open() defines things
in similar situation:
#define O_RDONLY        00000000
#define O_WRONLY        00000001
#define O_RDWR          00000002
We used to think of them as flags, but they're not
bit flags, though the rest of open() flags are bit-like.
If we apply your argument to open() then open()
should have defined O_RD as 1 and OR_WR as 2
and force everyone to mix and match them, but
then zero would be invalid. So I still think that
what I have is a cleaner API :)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3cf91754a957..51e9242e4803 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -22,7 +22,7 @@  struct bpf_map_ops {
 
 	/* funcs callable from userspace and from eBPF programs */
 	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
-	int (*map_update_elem)(struct bpf_map *map, void *key, void *value);
+	int (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64 flags);
 	int (*map_delete_elem)(struct bpf_map *map, void *key);
 };
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d18316f9e9c4..19c7ae4a4dd5 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -82,7 +82,7 @@  enum bpf_cmd {
 
 	/* create or update key/value pair in a given map
 	 * err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)
-	 * Using attr->map_fd, attr->key, attr->value
+	 * Using attr->map_fd, attr->key, attr->value, attr->flags
 	 * returns zero or negative error
 	 */
 	BPF_MAP_UPDATE_ELEM,
@@ -117,6 +117,13 @@  enum bpf_prog_type {
 	BPF_PROG_TYPE_UNSPEC,
 };
 
+/* flags for BPF_MAP_UPDATE_ELEM command */
+enum bpf_map_update_flags {
+	BPF_MAP_UPDATE_OR_CREATE = 0,	/* add new element or update existing */
+	BPF_MAP_CREATE_ONLY,		/* add new element if it didn't exist */
+	BPF_MAP_UPDATE_ONLY		/* update existing element */
+};
+
 union bpf_attr {
 	struct { /* anonymous struct used by BPF_MAP_CREATE command */
 		__u32	map_type;	/* one of enum bpf_map_type */
@@ -132,6 +139,7 @@  union bpf_attr {
 			__aligned_u64 value;
 			__aligned_u64 next_key;
 		};
+		__u64		flags;
 	};
 
 	struct { /* anonymous struct used by BPF_PROG_LOAD command */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ba61c8c16032..c0d03bf317a2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -190,7 +190,7 @@  err_put:
 	return err;
 }
 
-#define BPF_MAP_UPDATE_ELEM_LAST_FIELD value
+#define BPF_MAP_UPDATE_ELEM_LAST_FIELD flags
 
 static int map_update_elem(union bpf_attr *attr)
 {
@@ -231,7 +231,7 @@  static int map_update_elem(union bpf_attr *attr)
 	 * therefore all map accessors rely on this fact, so do the same here
 	 */
 	rcu_read_lock();
-	err = map->ops->map_update_elem(map, key, value);
+	err = map->ops->map_update_elem(map, key, value, attr->flags);
 	rcu_read_unlock();
 
 free_value: