[bpf-next,v3,00/13] bpf: implement bpf iterator for map elements

Message ID 20200723061533.2099842-1-yhs@fb.com

Message

Yonghong Song July 23, 2020, 6:15 a.m. UTC
Bpf iterator has been implemented for task, task_file,
bpf_map, ipv6_route, netlink, tcp and udp so far.

For map elements, there are two ways to traverse all elements from
user space:
  1. using BPF_MAP_GET_NEXT_KEY bpf subcommand to get elements
     one by one.
  2. using BPF_MAP_LOOKUP_BATCH bpf subcommand to get a batch of
     elements.
Both these approaches need to copy data from kernel to user space
in order to do inspection.
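
As a rough illustration of why approach 1 is copy-heavy, here is a small
user-space model. It is not kernel or libbpf code: toy_map,
toy_get_next_key, toy_lookup and toy_sum_from_user are hypothetical
stand-ins for a map and for the get-next-key / lookup subcommands. The
only point is that every key and every value is copied out to the caller
before it can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 8

/* Hypothetical stand-in for a kernel map with fixed-size slots. */
struct toy_map {
	int keys[TOY_MAX];
	long vals[TOY_MAX];
	size_t n;      /* number of used slots */
	size_t copies; /* key/value copies handed out to "user space" */
};

/* Models get-next-key: a NULL prev yields the first key.
 * Returns 0 on success, -1 when there is no next key. */
static int toy_get_next_key(struct toy_map *m, const int *prev, int *next)
{
	size_t i = 0;

	assert(m->n <= TOY_MAX);
	if (prev) {
		for (i = 0; i < m->n; i++)
			if (m->keys[i] == *prev)
				break;
		/* Unknown prev restarts from the first key, else step forward. */
		i = (i == m->n) ? 0 : i + 1;
	}
	if (i >= m->n)
		return -1;
	memcpy(next, &m->keys[i], sizeof(*next));
	m->copies++;
	return 0;
}

/* Models lookup: copies the value for key out to the caller. */
static int toy_lookup(struct toy_map *m, const int *key, long *val)
{
	for (size_t i = 0; i < m->n; i++) {
		if (m->keys[i] == *key) {
			memcpy(val, &m->vals[i], sizeof(*val));
			m->copies++;
			return 0;
		}
	}
	return -1;
}

/* Sums all values the way a user-space tool would: key by key. */
static long toy_sum_from_user(struct toy_map *m)
{
	int key, next;
	const int *prev = NULL;
	long val, sum = 0;

	while (toy_get_next_key(m, prev, &next) == 0) {
		if (toy_lookup(m, &next, &val) == 0)
			sum += val;
		key = next;
		prev = &key;
	}
	return sum;
}
```

In this model, summing a map with N elements costs 2*N copies (one key
and one value per element), which is the overhead the map element
iterator is meant to avoid.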

This patch set implements a bpf iterator for map elements.
A user can have a bpf program run in the kernel on each map element
to check, filter, aggregate or modify values, without copying data
to user space.
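
The per-element model can be sketched in plain C. This is a user-space
illustration only, not a real bpf program: elem_ctx, elem_prog_t,
toy_iterate and sum_and_reset_even_keys are hypothetical names, and the
callback stands in for the bpf program that would be invoked once per
element. The sketch assumes the key is exposed read-only and the value
read-write.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context handed to the per-element program. */
struct elem_ctx {
	const int *key; /* read-only view of the key */
	long *value;    /* read-write view of the value */
};

typedef int (*elem_prog_t)(struct elem_ctx *ctx, void *priv);

/* Aggregation state kept across invocations of the program. */
struct agg {
	long sum;
	size_t matched;
};

/* Walks all elements and passes pointers into the map's own storage,
 * so nothing is copied out before the program inspects it. */
static void toy_iterate(const int *keys, long *vals, size_t n,
			elem_prog_t prog, void *priv)
{
	assert(prog != NULL);
	for (size_t i = 0; i < n; i++) {
		struct elem_ctx ctx = { .key = &keys[i], .value = &vals[i] };

		if (prog(&ctx, priv))
			break; /* in this model, nonzero stops the walk */
	}
}

/* Example program: filter on the key, aggregate, modify in place. */
static int sum_and_reset_even_keys(struct elem_ctx *ctx, void *priv)
{
	struct agg *a = priv;

	if (*ctx->key % 2 != 0)
		return 0;     /* filtering: skip odd keys */
	a->sum += *ctx->value; /* aggregation */
	a->matched++;
	*ctx->value = 0;       /* modifying the value in place */
	return 0;
}
```

Calling toy_iterate(keys, vals, n, sum_and_reset_even_keys, &a) with a
zero-initialized struct agg leaves the sum of the even-keyed values in
a.sum and resets those values to 0, all without handing any element to
the caller.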

Patches #1 and #2 are refactoring. Patch #3 implements readonly/readwrite
buffer support in the verifier. Patches #4 - #7 implement map element
support for hash, percpu hash, lru hash, lru percpu hash, array,
percpu array and sock local storage maps. Patches #8 - #9 add libbpf
and bpftool support. Patches #10 - #13 are selftests for the implemented
map element iterators.

Changelogs:
  v2 -> v3:
    . rebase on top of latest bpf-next
  v1 -> v2:
    . support to modify map element values. (Alexei)
    . map key/values can be used with helper arguments
      for those arguments with ARG_PTR_TO_MEM or
      ARG_PTR_TO_UNINIT_MEM argument type. (Alexei)
    . remove unused variable. (kernel test robot)

Yonghong Song (13):
  bpf: refactor bpf_iter_reg to have separate seq_info member
  bpf: refactor to provide aux info to bpf_iter_init_seq_priv_t
  bpf: support readonly/readwrite buffers in verifier
  bpf: implement bpf iterator for map elements
  bpf: implement bpf iterator for hash maps
  bpf: implement bpf iterator for array maps
  bpf: implement bpf iterator for sock local storage map
  tools/libbpf: add support for bpf map element iterator
  tools/bpftool: add bpftool support for bpf map element iterator
  selftests/bpf: add test for bpf hash map iterators
  selftests/bpf: add test for bpf array map iterators
  selftests/bpf: add a test for bpf sk_storage_map iterator
  selftests/bpf: add a test for out of bound rdonly buf access

 fs/proc/proc_net.c                            |   2 +-
 include/linux/bpf.h                           |  42 +-
 include/linux/proc_fs.h                       |   3 +-
 include/uapi/linux/bpf.h                      |   7 +
 kernel/bpf/arraymap.c                         | 138 ++++++
 kernel/bpf/bpf_iter.c                         |  89 +++-
 kernel/bpf/btf.c                              |  13 +
 kernel/bpf/hashtab.c                          | 194 ++++++++
 kernel/bpf/map_iter.c                         |  62 ++-
 kernel/bpf/prog_iter.c                        |   8 +-
 kernel/bpf/task_iter.c                        |  18 +-
 kernel/bpf/verifier.c                         |  91 +++-
 net/core/bpf_sk_storage.c                     | 206 ++++++++
 net/ipv4/tcp_ipv4.c                           |  12 +-
 net/ipv4/udp.c                                |  12 +-
 net/ipv6/route.c                              |   8 +-
 net/netlink/af_netlink.c                      |   8 +-
 .../bpftool/Documentation/bpftool-iter.rst    |  18 +-
 tools/bpf/bpftool/bash-completion/bpftool     |  18 +-
 tools/bpf/bpftool/iter.c                      |  33 +-
 tools/include/uapi/linux/bpf.h                |   7 +
 tools/lib/bpf/bpf.c                           |   1 +
 tools/lib/bpf/bpf.h                           |   3 +-
 tools/lib/bpf/libbpf.c                        |  10 +-
 tools/lib/bpf/libbpf.h                        |   3 +-
 .../selftests/bpf/prog_tests/bpf_iter.c       | 442 ++++++++++++++++++
 .../bpf/progs/bpf_iter_bpf_array_map.c        |  40 ++
 .../bpf/progs/bpf_iter_bpf_hash_map.c         | 100 ++++
 .../bpf/progs/bpf_iter_bpf_percpu_array_map.c |  46 ++
 .../bpf/progs/bpf_iter_bpf_percpu_hash_map.c  |  50 ++
 .../bpf/progs/bpf_iter_bpf_sk_storage_map.c   |  34 ++
 .../selftests/bpf/progs/bpf_iter_test_kern5.c |  35 ++
 32 files changed, 1688 insertions(+), 65 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_bpf_array_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_bpf_hash_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_array_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_hash_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_test_kern5.c

Comments

Alexei Starovoitov July 23, 2020, 6:53 a.m. UTC | #1
kasan is not happy:

[   16.896170] ==================================================================
[   16.896994] BUG: KASAN: use-after-free in __do_sys_bpf+0x34f3/0x3860
[   16.897657] Read of size 4 at addr ffff8881f105b208 by task test_progs/1958
[   16.898416]
[   16.898577] CPU: 0 PID: 1958 Comm: test_progs Not tainted 5.8.0-rc4-01920-g6276000cd38e #2828
[   16.899505] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[   16.900405] Call Trace:
[   16.900679]  dump_stack+0x7d/0xb0
[   16.901068]  print_address_description.constprop.0+0x3a/0x60
[   16.901689]  ? __do_sys_bpf+0x34f3/0x3860
[   16.902125]  kasan_report.cold+0x1f/0x37
[   16.902595]  ? __do_sys_bpf+0x34f3/0x3860
[   16.903029]  __do_sys_bpf+0x34f3/0x3860
[   16.903494]  ? bpf_trace_run2+0xd1/0x210
[   16.903971]  ? bpf_link_get_from_fd+0xe0/0xe0
[   16.907802]  do_syscall_64+0x38/0x60
[   16.908187]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   16.908730] RIP: 0033:0x7f014cdfe7f9
[   16.909148] Code: Bad RIP value.
[   16.909524] RSP: 002b:00007ffe1d1e8b28 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
[   16.910345] RAX: ffffffffffffffda RBX: 00007f014dd27690 RCX: 00007f014cdfe7f9
[   16.911058] RDX: 0000000000000078 RSI: 00007ffe1d1e8b60 RDI: 000000000000001e
[   16.911820] RBP: 00007ffe1d1e8b40 R08: 00007ffe1d1e8b40 R09: 00007ffe1d1e8b60
[   16.912575] R10: 0000000000000044 R11: 0000000000000206 R12: 0000000000000002
[   16.913304] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000002
[   16.914026]
[   16.914189] Allocated by task 1958:
[   16.914562]  save_stack+0x1b/0x40
[   16.914944]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[   16.915476]  bpf_iter_link_attach+0x235/0x4e0
[   16.915975]  __do_sys_bpf+0x1832/0x3860
[   16.916371]  do_syscall_64+0x38/0x60
[   16.916750]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   16.917338]
[   16.917524] Freed by task 1958:
[   16.917874]  save_stack+0x1b/0x40
[   16.918241]  __kasan_slab_free+0x12f/0x180
[   16.918681]  kfree+0xc6/0x280
[   16.919024]  bpf_iter_link_attach+0x3e3/0x4e0
[   16.919488]  __do_sys_bpf+0x1832/0x3860
[   16.919915]  do_syscall_64+0x38/0x60
[   16.920301]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

To reproduce:
./test_progs -n 5
#5 bpf_obj_id:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

./test_progs -n 4/18
#4/18 bpf_hash_map:OK
#4 bpf_iter:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

./test_progs -n 5
[   37.569154] ==================================================================
[   37.570020] BUG: KASAN: use-after-free in __do_sys_bpf+0x34f3/0x3860

Yonghong Song July 23, 2020, 6:32 p.m. UTC | #2
On 7/22/20 11:53 PM, Alexei Starovoitov wrote:
> 
> kasan is not happy:
> 

Thanks for reporting the bug. The gcc on my system is 8.2 and the
requirement for kasan support is gcc 8.3. Using clang, I am able
to see the issue. Will fix and re-submit. Thanks!
