
[RFC,bpf-next,0/3] bpf: adding map batch processing support

Message ID 20191107212023.171208-1-brianvv@google.com

Message

Brian Vazquez Nov. 7, 2019, 9:20 p.m. UTC
This is a follow-up in the effort to batch bpf map operations to reduce
the syscall overhead of the map_ops. I initially proposed the idea, and
the discussion is here:
https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/

Yonghong talked about this at LPC and also proposed an idea that
handles the special case of hashtables by traversing them using the
buckets as a reference instead of a key. The discussion is here:
https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/

This RFC proposes a way to extend batch operations to more data
structures by creating generic batch functions that can be used instead
of implementing the operations for each individual data structure,
reducing the amount of code that needs to be maintained. The series
contains the patches used in Yonghong's RFC plus the patch that adds
the generic implementation of the operations, along with some testing
with pcpu hashmaps and arrays. Note that the pcpu hashmap shouldn't use
the generic implementation; it should either have its own implementation
or share the one introduced by Yonghong. I added it just as an example
to show that the generic implementation can easily be added to a data
structure.
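To give an idea of the intended usage, a batched dump from user space
might look roughly like the sketch below. The wrapper name and prototype
are my assumptions for illustration (loosely modeled on the helpers this
series adds under tools/lib/bpf) and may not match what the patches
actually export:

  #include <errno.h>
  #include <linux/types.h>

  #define BATCH_SZ 64

  /* Assumed prototype, for illustration only: */
  int bpf_map_lookup_batch(int fd, void *in_batch, void *out_batch,
                           void *keys, void *values, __u32 *count,
                           __u64 elem_flags, __u64 flags);

  /* Drain a u32->u32 map, BATCH_SZ entries per syscall instead of one
   * bpf(BPF_MAP_GET_NEXT_KEY) + bpf(BPF_MAP_LOOKUP_ELEM) per entry. */
  static int dump_map(int map_fd)
  {
          __u32 keys[BATCH_SZ], vals[BATCH_SZ], count;
          __u64 out_batch = 0;
          void *in = NULL;         /* NULL: start from the beginning */
          int err;

          do {
                  count = BATCH_SZ;
                  err = bpf_map_lookup_batch(map_fd, in, &out_batch,
                                             keys, vals, &count, 0, 0);
                  if (err && errno != ENOENT)
                          return err;
                  /* consume 'count' key/value pairs here */
                  in = &out_batch; /* opaque token: resume where we stopped */
          } while (!err);          /* assume ENOENT means "no more entries" */

          return 0;
  }

The idea is that the kernel hands back an opaque token (out_batch) that
the next call resumes from, so a single syscall covers many entries.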

What I want to achieve with this RFC is to collect early feedback and
see if there are any major concerns before I move forward. I do plan to
split this into better-separated patches and explain them properly in
the commit messages.

The current known issues I would like to discuss are the following:

- Because Yonghong's UAPI definition was done specifically for
  iterating buckets, the batch field is a u64 and is treated as a u64
  instead of an opaque pointer. This won't work for other data
  structures that are going to use a key larger than 64 bits as the
  batch token (see the sketch after this list). That said, I think at
  this point the only key that couldn't be treated as a u64 is the key
  of a hashmap, and the hashmap won't use the generic interface.
- Not all data structures support delete (because it's not a valid
  operation), e.g. arrays. So maybe the lookup_and_delete_batch command
  is not needed and we can handle that operation with lookup_batch and
  a flag.
- For delete_batch (not just lookup_and_delete_batch): is this
  operation really needed? If so, wouldn't it be better if the
  behaviour were to delete the keys provided? I did that in my generic
  implementation, but Yonghong's delete_batch for a hashmap deletes
  buckets.
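
For the first item above, one possible shape for the uapi attr fields
would be to make both batch tokens opaque __aligned_u64 values, e.g.
the sketch below of the batch section of union bpf_attr (the field
names are illustrative, not necessarily what either RFC defines):

  struct { /* struct used by BPF_MAP_*_BATCH commands */
          __aligned_u64   in_batch;   /* opaque token from the previous call,
                                       * NULL to start from the beginning */
          __aligned_u64   out_batch;  /* opaque token to resume from next time */
          __aligned_u64   keys;
          __aligned_u64   values;
          __u32           count;      /* in: #entries provided by user space
                                       * out: #entries processed by the kernel */
          __u32           map_fd;
          __u64           elem_flags;
          __u64           flags;
  } batch;

Treating in_batch/out_batch as pointers to an opaque, map-defined token
(rather than a raw u64 bucket index) would cover both the bucket-based
hashtable walk and keys wider than 64 bits.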

Brian Vazquez (1):
  bpf: add generic batch support

Yonghong Song (2):
  bpf: adding map batch processing support
  tools/bpf: test bpf_map_lookup_and_delete_batch()

 include/linux/bpf.h                           |  21 +
 include/uapi/linux/bpf.h                      |  22 +
 kernel/bpf/arraymap.c                         |   4 +
 kernel/bpf/hashtab.c                          | 331 ++++++++++
 kernel/bpf/syscall.c                          | 573 ++++++++++++++----
 tools/include/uapi/linux/bpf.h                |  22 +
 tools/lib/bpf/bpf.c                           |  59 ++
 tools/lib/bpf/bpf.h                           |  13 +
 tools/lib/bpf/libbpf.map                      |   4 +
 .../map_tests/map_lookup_and_delete_batch.c   | 245 ++++++++
 .../map_lookup_and_delete_batch_array.c       | 118 ++++
 11 files changed, 1292 insertions(+), 120 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch.c
 create mode 100644 tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c

Comments

Yonghong Song Nov. 13, 2019, 2:34 a.m. UTC | #1
On 11/7/19 1:20 PM, Brian Vazquez wrote:
> This is a follow-up in the effort to batch bpf map operations to reduce
> the syscall overhead of the map_ops. I initially proposed the idea, and
> the discussion is here:
> https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/
> 
> Yonghong talked about this at LPC and also proposed an idea that
> handles the special case of hashtables by traversing them using the
> buckets as a reference instead of a key. The discussion is here:
> https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/
> 
> This RFC proposes a way to extend batch operations to more data
> structures by creating generic batch functions that can be used instead
> of implementing the operations for each individual data structure,
> reducing the amount of code that needs to be maintained. The series
> contains the patches used in Yonghong's RFC plus the patch that adds
> the generic implementation of the operations, along with some testing
> with pcpu hashmaps and arrays. Note that the pcpu hashmap shouldn't use
> the generic implementation; it should either have its own implementation
> or share the one introduced by Yonghong. I added it just as an example
> to show that the generic implementation can easily be added to a data
> structure.
> 
> What I want to achieve with this RFC is to collect early feedback and
> see if there are any major concerns before I move forward. I do plan to
> split this into better-separated patches and explain them properly in
> the commit messages.

Thanks, Brian, for working on this. The general approach described here
looks good to me. Having a generic implementation for the batch
operations makes sense for maps (though not for the hash table,
queue/stack, etc.).
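
For what it's worth, the way I read the series, a map type would opt
into the generic code by pointing its batch callbacks at the generic
helpers, roughly as sketched below; the callback and helper names are
my guesses based on the cover letter, not necessarily the names the
patches use:

  /* sketch only; callback/helper names assumed */
  const struct bpf_map_ops array_map_ops = {
          .map_alloc         = array_map_alloc,
          .map_free          = array_map_free,
          .map_lookup_elem   = array_map_lookup_elem,
          .map_update_elem   = array_map_update_elem,
          /* batch ops reuse the generic implementation */
          .map_lookup_batch  = generic_map_lookup_batch,
          .map_update_batch  = generic_map_update_batch,
  };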

> 
> The current known issues I would like to discuss are the following:
> 
> - Because Yonghong's UAPI definition was done specifically for
>    iterating buckets, the batch field is a u64 and is treated as a u64
>    instead of an opaque pointer. This won't work for other data
>    structures that are going to use a key larger than 64 bits as the
>    batch token. That said, I think at this point the only key that
>    couldn't be treated as a u64 is the key of a hashmap, and the
>    hashmap won't use the generic interface.

The u64 can be changed to a __aligned_u64 opaque value. This way,
it can represent either a pointer or a 64-bit value.

> - Not all data structures support delete (because it's not a valid
>    operation), e.g. arrays. So maybe the lookup_and_delete_batch
>    command is not needed and we can handle that operation with
>    lookup_batch and a flag.

This makes sense.

> - For delete_batch (not just lookup_and_delete_batch): is this
>    operation really needed? If so, wouldn't it be better if the
>    behaviour were to delete the keys provided? I did that in my generic
>    implementation, but Yonghong's delete_batch for a hashmap deletes
>    buckets.

We need batched delete in bcc. lookup_and_delete_batch is better as
it can preserve more new map entries. Alternatively, deleting all
entries after the lookup is another option, but that may remove more
new map entries. Statistically, this may or may not matter, though.
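
To make the trade-off concrete, the two approaches compare roughly as
below; the wrapper names and prototypes are assumed, as in the earlier
sketch, and are not necessarily what the series exports:

  /* Assumed prototypes, as in the earlier sketch: */
  int bpf_map_lookup_and_delete_batch(int fd, void *in_batch, void *out_batch,
                                      void *keys, void *values, __u32 *count,
                                      __u64 elem_flags, __u64 flags);
  int bpf_map_delete_batch(int fd, void *keys, __u32 *count,
                           __u64 elem_flags, __u64 flags);

  /* Pattern A: lookup+delete done per element inside the kernel, so
   * entries updated or inserted while draining are more likely to
   * survive and be reported on a later pass. */
  static int drain_atomic(int fd, void *in, void *out,
                          void *keys, void *vals, __u32 *count)
  {
          return bpf_map_lookup_and_delete_batch(fd, in, out, keys, vals,
                                                 count, 0, 0);
  }

  /* Pattern B: lookup everything, then delete the keys we read. Any
   * value written under an already-read key between the two calls is
   * deleted without ever being reported to user space. */
  static int drain_two_pass(int fd, void *in, void *out,
                            void *keys, void *vals, __u32 *count)
  {
          int err = bpf_map_lookup_batch(fd, in, out, keys, vals, count, 0, 0);

          if (err && errno != ENOENT)
                  return err;
          return bpf_map_delete_batch(fd, keys, count, 0, 0);
  }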

bcc does have a clear_table (clear_map) API, but it's not clear who is using it.

So I don't have a concrete use case for delete_batch yet.
I tend to think we should have delete_batch for API completeness,
but maybe other people can comment on this as well.

Maybe we can skip it in the initial patch set, but we should still
ensure the uapi data structures can handle batch delete if it is
needed later. The current data structures should handle this, as far
as I know.

Yonghong Song Nov. 13, 2019, 10:26 p.m. UTC | #2
On 11/13/19 2:07 PM, Brian Vazquez wrote:
> Hi Yonghong,
> 
> Thanks for reviewing it! I'm preparing the changes and will submit them
> soon.
> 
> As for the right way to manage authorship, does anyone know what the
> correct approach is? Should I use Yonghong's patches and add the
> extended support in separate patches (i.e. support for per-cpu maps,
> changing batch from u64 to __aligned_u64, etc.), or is it fine to apply
> the changes in place and include both sign-offs?

The logic flow of the patch set is what matters most.
You can add me as a co-signoff if you reuse a significant portion of my code.
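
For reference, the usual way to express that in the commit trailers
(per Documentation/process/submitting-patches.rst) would be something
like:

  Co-developed-by: Yonghong Song <yhs@fb.com>
  Signed-off-by: Yonghong Song <yhs@fb.com>
  Signed-off-by: Brian Vazquez <brianvv@google.com>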
