Message ID | 1415069656-14138-2-git-send-email-ast@plumgrid.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On 11/04/2014 03:54 AM, Alexei Starovoitov wrote: > the current meaning of BPF_MAP_UPDATE_ELEM syscall command is: > either update existing map element or create a new one. > Initially the plan was to add a new command to handle the case of > 'create new element if it didn't exist', but 'flags' style looks > cleaner and overall diff is much smaller (more code reused), so add 'flags' > attribute to BPF_MAP_UPDATE_ELEM command with the following meaning: > enum { > BPF_MAP_UPDATE_OR_CREATE = 0, /* add new element or update existing */ > BPF_MAP_CREATE_ONLY, /* add new element if it didn't exist */ > BPF_MAP_UPDATE_ONLY /* update existing element */ > }; From you commit message/code I currently don't see an explanation why it cannot be done in typical ``flags style'' as various syscalls do, i.e. BPF_MAP_UPDATE_OR_CREATE rather represented as ... BPF_MAP_CREATE | BPF_MAP_UPDATE Do you expect more than 64 different flags to be passed from user space for BPF_MAP? > BPF_MAP_CREATE_ONLY can fail with EEXIST if element already exists. > BPF_MAP_UPDATE_ONLY can fail with ENOENT if element doesn't exist. > > Userspace will call it as: > int bpf_update_elem(int fd, void *key, void *value, __u64 flags) > { > union bpf_attr attr = { > .map_fd = fd, > .key = ptr_to_u64(key), > .value = ptr_to_u64(value), > .flags = flags; > }; > > return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)); > } > > Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 4, 2014 at 1:25 AM, Daniel Borkmann <dborkman@redhat.com> wrote: > > On 11/04/2014 03:54 AM, Alexei Starovoitov wrote: >> >> the current meaning of BPF_MAP_UPDATE_ELEM syscall command is: >> either update existing map element or create a new one. >> Initially the plan was to add a new command to handle the case of >> 'create new element if it didn't exist', but 'flags' style looks >> cleaner and overall diff is much smaller (more code reused), so add 'flags' >> attribute to BPF_MAP_UPDATE_ELEM command with the following meaning: >> enum { >> BPF_MAP_UPDATE_OR_CREATE = 0, /* add new element or update existing */ >> BPF_MAP_CREATE_ONLY, /* add new element if it didn't exist */ >> BPF_MAP_UPDATE_ONLY /* update existing element */ >> }; > > > From you commit message/code I currently don't see an explanation why > it cannot be done in typical ``flags style'' as various syscalls do, > i.e. BPF_MAP_UPDATE_OR_CREATE rather represented as ... > > BPF_MAP_CREATE | BPF_MAP_UPDATE > > Do you expect more than 64 different flags to be passed from user space > for BPF_MAP? several reasons: - preserve flags==0 as default behavior - avoid holes and extra checks for invalid combinations, so if (flags > BPF_MAP_UPDATE_ONLY) goto err, is enough. - it looks much neater when user space uses BPF_MAP_UPDATE_OR_CREATE instead of ORing bits. Note this choice doesn't prevent adding bit-like flags in the future. Today I cannot think of any new flags for the update() command, but if somebody comes up with a new selector that can apply to all three combinations, we can add it as 3rd bit that can be ORed. Default will stay zero and 'if >' check in older kernels will seamlessly work with new userspace. I don't like holes in flags and combinatorial explosion of bits and checks for them unless absolutely necessary. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/05/2014 12:04 AM, Alexei Starovoitov wrote: > On Tue, Nov 4, 2014 at 1:25 AM, Daniel Borkmann <dborkman@redhat.com> wrote: >> On 11/04/2014 03:54 AM, Alexei Starovoitov wrote: >>> >>> the current meaning of BPF_MAP_UPDATE_ELEM syscall command is: >>> either update existing map element or create a new one. >>> Initially the plan was to add a new command to handle the case of >>> 'create new element if it didn't exist', but 'flags' style looks >>> cleaner and overall diff is much smaller (more code reused), so add 'flags' >>> attribute to BPF_MAP_UPDATE_ELEM command with the following meaning: >>> enum { >>> BPF_MAP_UPDATE_OR_CREATE = 0, /* add new element or update existing */ >>> BPF_MAP_CREATE_ONLY, /* add new element if it didn't exist */ >>> BPF_MAP_UPDATE_ONLY /* update existing element */ >>> }; >> >> From you commit message/code I currently don't see an explanation why >> it cannot be done in typical ``flags style'' as various syscalls do, >> i.e. BPF_MAP_UPDATE_OR_CREATE rather represented as ... >> >> BPF_MAP_CREATE | BPF_MAP_UPDATE >> >> Do you expect more than 64 different flags to be passed from user space >> for BPF_MAP? > > several reasons: > - preserve flags==0 as default behavior > - avoid holes and extra checks for invalid combinations, so > if (flags > BPF_MAP_UPDATE_ONLY) goto err, is enough. > - it looks much neater when user space uses > BPF_MAP_UPDATE_OR_CREATE instead of ORing bits. > > Note this choice doesn't prevent adding bit-like flags > in the future. Today I cannot think of any new flags > for the update() command, but if somebody comes up with > a new selector that can apply to all three combinations, > we can add it as 3rd bit that can be ORed. Hm, mixing enums together with bitfield-like flags seems kind of hacky ... :/ Or, do you mean to say you're adding a 2nd flag field, i.e. splitting the 64bits into a 32bit ``cmd enum'' and 32bit ``flag section''? I see the point with flags == 0 as default behavior though, but at this point in time we won't get burnt by it since the API is not yet in a usable state and defaults to be compiled-out. > Default will stay zero and 'if >' check in older > kernels will seamlessly work with new userspace. > I don't like holes in flags and combinatorial > explosion of bits and checks for them unless > absolutely necessary. Hm, my concern is that we start to add many *_OR_* enum elements once we find that a flag might be a useful in combination with many other flags ... even though if we only can think of 3 flags /right now/. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 5, 2014 at 6:57 AM, Daniel Borkmann <dborkman@redhat.com> wrote: > On 11/05/2014 12:04 AM, Alexei Starovoitov wrote: >> >> On Tue, Nov 4, 2014 at 1:25 AM, Daniel Borkmann <dborkman@redhat.com> >> wrote: >>> >>> On 11/04/2014 03:54 AM, Alexei Starovoitov wrote: >>>> >>>> >>>> the current meaning of BPF_MAP_UPDATE_ELEM syscall command is: >>>> either update existing map element or create a new one. >>>> Initially the plan was to add a new command to handle the case of >>>> 'create new element if it didn't exist', but 'flags' style looks >>>> cleaner and overall diff is much smaller (more code reused), so add >>>> 'flags' >>>> attribute to BPF_MAP_UPDATE_ELEM command with the following meaning: >>>> enum { >>>> BPF_MAP_UPDATE_OR_CREATE = 0, /* add new element or update existing >>>> */ >>>> BPF_MAP_CREATE_ONLY, /* add new element if it didn't exist >>>> */ >>>> BPF_MAP_UPDATE_ONLY /* update existing element */ >>>> }; >>> >>> >>> From you commit message/code I currently don't see an explanation why >>> it cannot be done in typical ``flags style'' as various syscalls do, >>> i.e. BPF_MAP_UPDATE_OR_CREATE rather represented as ... >>> >>> BPF_MAP_CREATE | BPF_MAP_UPDATE >>> >>> Do you expect more than 64 different flags to be passed from user space >>> for BPF_MAP? >> >> >> several reasons: >> - preserve flags==0 as default behavior >> - avoid holes and extra checks for invalid combinations, so >> if (flags > BPF_MAP_UPDATE_ONLY) goto err, is enough. >> - it looks much neater when user space uses >> BPF_MAP_UPDATE_OR_CREATE instead of ORing bits. >> >> Note this choice doesn't prevent adding bit-like flags >> in the future. Today I cannot think of any new flags >> for the update() command, but if somebody comes up with >> a new selector that can apply to all three combinations, >> we can add it as 3rd bit that can be ORed. > > > Hm, mixing enums together with bitfield-like flags seems > kind of hacky ... :/ Or, do you mean to say you're adding > a 2nd flag field, i.e. splitting the 64bits into a 32bit > ``cmd enum'' and 32bit ``flag section''? something like this. or splitting 64-bit into 2 and 62. We'll see. First two encode this 'type' of update and the rest - whatever else. > Hm, my concern is that we start to add many *_OR_* enum > elements once we find that a flag might be a useful in > combination with many other flags ... even though if we > only can think of 3 flags /right now/. Agree. Adding many *_OR_* would look bad, that's why I said that future additions can be bits. Like in paragraph above. Also, we don't have 3 flags now. In this patch I'm showing 3 types and you're suggesting to treat them as 2 flags. To me that's incorrect, since 'no flags' becomes invalid combination, which logically incorrect. Therefore I cannot see them as 'flags'. This is a 'type' or 'style' of update() command. I think it actually matches how open() defines things in similar situation: #define O_RDONLY 00000000 #define O_WRONLY 00000001 #define O_RDWR 00000002 We used to think of them as flags, but they're not bit flags, though the rest of open() flags are bit-like. If we apply your argument to open() then open() should have defined O_RD as 1 and OR_WR as 2 and force everyone to mix and match them, but then zero would be invalid. So I still think that what I have is a cleaner API :) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 3cf91754a957..51e9242e4803 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -22,7 +22,7 @@ struct bpf_map_ops { /* funcs callable from userspace and from eBPF programs */ void *(*map_lookup_elem)(struct bpf_map *map, void *key); - int (*map_update_elem)(struct bpf_map *map, void *key, void *value); + int (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64 flags); int (*map_delete_elem)(struct bpf_map *map, void *key); }; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index d18316f9e9c4..19c7ae4a4dd5 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -82,7 +82,7 @@ enum bpf_cmd { /* create or update key/value pair in a given map * err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size) - * Using attr->map_fd, attr->key, attr->value + * Using attr->map_fd, attr->key, attr->value, attr->flags * returns zero or negative error */ BPF_MAP_UPDATE_ELEM, @@ -117,6 +117,13 @@ enum bpf_prog_type { BPF_PROG_TYPE_UNSPEC, }; +/* flags for BPF_MAP_UPDATE_ELEM command */ +enum bpf_map_update_flags { + BPF_MAP_UPDATE_OR_CREATE = 0, /* add new element or update existing */ + BPF_MAP_CREATE_ONLY, /* add new element if it didn't exist */ + BPF_MAP_UPDATE_ONLY /* update existing element */ +}; + union bpf_attr { struct { /* anonymous struct used by BPF_MAP_CREATE command */ __u32 map_type; /* one of enum bpf_map_type */ @@ -132,6 +139,7 @@ union bpf_attr { __aligned_u64 value; __aligned_u64 next_key; }; + __u64 flags; }; struct { /* anonymous struct used by BPF_PROG_LOAD command */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index ba61c8c16032..c0d03bf317a2 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -190,7 +190,7 @@ err_put: return err; } -#define BPF_MAP_UPDATE_ELEM_LAST_FIELD value +#define BPF_MAP_UPDATE_ELEM_LAST_FIELD flags static int map_update_elem(union bpf_attr *attr) { @@ -231,7 +231,7 @@ static int map_update_elem(union bpf_attr *attr) * therefore all map accessors rely on this fact, so do the same here */ rcu_read_lock(); - err = map->ops->map_update_elem(map, key, value); + err = map->ops->map_update_elem(map, key, value, attr->flags); rcu_read_unlock(); free_value:
the current meaning of BPF_MAP_UPDATE_ELEM syscall command is: either update existing map element or create a new one. Initially the plan was to add a new command to handle the case of 'create new element if it didn't exist', but 'flags' style looks cleaner and overall diff is much smaller (more code reused), so add 'flags' attribute to BPF_MAP_UPDATE_ELEM command with the following meaning: enum { BPF_MAP_UPDATE_OR_CREATE = 0, /* add new element or update existing */ BPF_MAP_CREATE_ONLY, /* add new element if it didn't exist */ BPF_MAP_UPDATE_ONLY /* update existing element */ }; BPF_MAP_CREATE_ONLY can fail with EEXIST if element already exists. BPF_MAP_UPDATE_ONLY can fail with ENOENT if element doesn't exist. Userspace will call it as: int bpf_update_elem(int fd, void *key, void *value, __u64 flags) { union bpf_attr attr = { .map_fd = fd, .key = ptr_to_u64(key), .value = ptr_to_u64(value), .flags = flags; }; return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)); } Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> --- include/linux/bpf.h | 2 +- include/uapi/linux/bpf.h | 10 +++++++++- kernel/bpf/syscall.c | 4 ++-- 3 files changed, 12 insertions(+), 4 deletions(-)