diff mbox series

[bpf,3/5] bpf, sockmap: fix leakage of smap_psock_map_entry

Message ID 20180816194910.9040-4-daniel@iogearbox.net
State Accepted, archived
Delegated to: BPF Maintainers
Headers show
Series BPF sockmap and ulp fixes | expand

Commit Message

Daniel Borkmann Aug. 16, 2018, 7:49 p.m. UTC
While working on sockmap I noticed that we do not always kfree the
struct smap_psock_map_entry list elements which track psocks attached
to maps. In the case of sock_hash_ctx_update_elem(), these map entries
are allocated outside of __sock_map_ctx_update_elem() with their
linkage to the socket hash table filled. In the case of sock array,
the map entries are allocated inside of __sock_map_ctx_update_elem()
and added with their linkage to the psock->maps. Both additions are
under psock->maps_lock each.

Now, we drop these elements from their psock->maps list in a few
occasions: i) in sock array via smap_list_map_remove() when an entry
is either deleted from the map from user space, or updated via
user space or BPF program where we drop the old socket at that map
slot, or the sock array is freed via sock_map_free() and drops all
its elements; ii) for sock hash via smap_list_hash_remove() in exactly
the same occasions as just described for sock array; iii) in the
bpf_tcp_close() where we remove the elements from the list via
psock_map_pop() and iterate over them dropping themselves from either
sock array or sock hash; and last but not least iv) once again in
smap_gc_work() which is a callback for deferring the work once the
psock refcount hit zero and thus the socket is being destroyed.

Problem is that the only case where we kfree() the list entry is
in case iv), which at that point should have an empty list in
normal cases. So in cases from i) to iii) we unlink the elements
without freeing where they go out of reach from us. Hence fix is
to properly kfree() them as well to stop the leakage. Given these
are all handled under psock->maps_lock there is no need for deferred
RCU freeing.

I later also ran with kmemleak detector and it confirmed the finding
as well where in the state before the fix the object goes unreferenced
while after the patch no kmemleak report related to BPF showed up.

  [...]
  unreferenced object 0xffff880378eadae0 (size 64):
    comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
    hex dump (first 32 bytes):
      00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
      50 4d 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00  PMu]............
    backtrace:
      [<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210
      [<0000000045dd6d3c>] bpf_sock_map_update+0x29/0x60
      [<00000000877723aa>] ___bpf_prog_run+0x1e1f/0x4960
      [<000000002ef89e83>] 0xffffffffffffffff
  unreferenced object 0xffff880378ead240 (size 64):
    comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
    hex dump (first 32 bytes):
      00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
      00 44 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00  .Du]............
    backtrace:
      [<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210
      [<0000000030e37a3a>] sock_map_update_elem+0x125/0x240
      [<000000002e5ce36e>] map_update_elem+0x4eb/0x7b0
      [<00000000db453cc9>] __x64_sys_bpf+0x1f9/0x360
      [<0000000000763660>] do_syscall_64+0x9a/0x300
      [<00000000422a2bb2>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [<000000002ef89e83>] 0xffffffffffffffff
  [...]

Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close")
Fixes: 54fedb42c653 ("bpf: sockmap, fix smap_list_map_remove when psock is in many maps")
Fixes: 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Song Liu Aug. 16, 2018, 9:27 p.m. UTC | #1
On Thu, Aug 16, 2018 at 12:49 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> While working on sockmap I noticed that we do not always kfree the
> struct smap_psock_map_entry list elements which track psocks attached
> to maps. In the case of sock_hash_ctx_update_elem(), these map entries
> are allocated outside of __sock_map_ctx_update_elem() with their
> linkage to the socket hash table filled. In the case of sock array,
> the map entries are allocated inside of __sock_map_ctx_update_elem()
> and added with their linkage to the psock->maps. Both additions are
> under psock->maps_lock each.
>
> Now, we drop these elements from their psock->maps list in a few
> occasions: i) in sock array via smap_list_map_remove() when an entry
> is either deleted from the map from user space, or updated via
> user space or BPF program where we drop the old socket at that map
> slot, or the sock array is freed via sock_map_free() and drops all
> its elements; ii) for sock hash via smap_list_hash_remove() in exactly
> the same occasions as just described for sock array; iii) in the
> bpf_tcp_close() where we remove the elements from the list via
> psock_map_pop() and iterate over them dropping themselves from either
> sock array or sock hash; and last but not least iv) once again in
> smap_gc_work() which is a callback for deferring the work once the
> psock refcount hit zero and thus the socket is being destroyed.
>
> Problem is that the only case where we kfree() the list entry is
> in case iv), which at that point should have an empty list in
> normal cases. So in cases from i) to iii) we unlink the elements
> without freeing where they go out of reach from us. Hence fix is
> to properly kfree() them as well to stop the leakage. Given these
> are all handled under psock->maps_lock there is no need for deferred
> RCU freeing.
>
> I later also ran with kmemleak detector and it confirmed the finding
> as well where in the state before the fix the object goes unreferenced
> while after the patch no kmemleak report related to BPF showed up.
>
>   [...]
>   unreferenced object 0xffff880378eadae0 (size 64):
>     comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
>     hex dump (first 32 bytes):
>       00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
>       50 4d 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00  PMu]............
>     backtrace:
>       [<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210
>       [<0000000045dd6d3c>] bpf_sock_map_update+0x29/0x60
>       [<00000000877723aa>] ___bpf_prog_run+0x1e1f/0x4960
>       [<000000002ef89e83>] 0xffffffffffffffff
>   unreferenced object 0xffff880378ead240 (size 64):
>     comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
>     hex dump (first 32 bytes):
>       00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
>       00 44 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00  .Du]............
>     backtrace:
>       [<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210
>       [<0000000030e37a3a>] sock_map_update_elem+0x125/0x240
>       [<000000002e5ce36e>] map_update_elem+0x4eb/0x7b0
>       [<00000000db453cc9>] __x64_sys_bpf+0x1f9/0x360
>       [<0000000000763660>] do_syscall_64+0x9a/0x300
>       [<00000000422a2bb2>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>       [<000000002ef89e83>] 0xffffffffffffffff
>   [...]
>
> Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close")
> Fixes: 54fedb42c653 ("bpf: sockmap, fix smap_list_map_remove when psock is in many maps")
> Fixes: 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  kernel/bpf/sockmap.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
> index 0c1a696..94a324b 100644
> --- a/kernel/bpf/sockmap.c
> +++ b/kernel/bpf/sockmap.c
> @@ -370,6 +370,7 @@ static void bpf_tcp_close(struct sock *sk, long timeout)
>                         }
>                         raw_spin_unlock_bh(&b->lock);
>                 }
> +               kfree(e);
>                 e = psock_map_pop(sk, psock);
>         }
>         rcu_read_unlock();
> @@ -1675,8 +1676,10 @@ static void smap_list_map_remove(struct smap_psock *psock,
>
>         spin_lock_bh(&psock->maps_lock);
>         list_for_each_entry_safe(e, tmp, &psock->maps, list) {
> -               if (e->entry == entry)
> +               if (e->entry == entry) {
>                         list_del(&e->list);
> +                       kfree(e);
> +               }
>         }
>         spin_unlock_bh(&psock->maps_lock);
>  }
> @@ -1690,8 +1693,10 @@ static void smap_list_hash_remove(struct smap_psock *psock,
>         list_for_each_entry_safe(e, tmp, &psock->maps, list) {
>                 struct htab_elem *c = rcu_dereference(e->hash_link);
>
> -               if (c == hash_link)
> +               if (c == hash_link) {
>                         list_del(&e->list);
> +                       kfree(e);
> +               }
>         }
>         spin_unlock_bh(&psock->maps_lock);
>  }
> --
> 2.9.5
>
diff mbox series

Patch

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 0c1a696..94a324b 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -370,6 +370,7 @@  static void bpf_tcp_close(struct sock *sk, long timeout)
 			}
 			raw_spin_unlock_bh(&b->lock);
 		}
+		kfree(e);
 		e = psock_map_pop(sk, psock);
 	}
 	rcu_read_unlock();
@@ -1675,8 +1676,10 @@  static void smap_list_map_remove(struct smap_psock *psock,
 
 	spin_lock_bh(&psock->maps_lock);
 	list_for_each_entry_safe(e, tmp, &psock->maps, list) {
-		if (e->entry == entry)
+		if (e->entry == entry) {
 			list_del(&e->list);
+			kfree(e);
+		}
 	}
 	spin_unlock_bh(&psock->maps_lock);
 }
@@ -1690,8 +1693,10 @@  static void smap_list_hash_remove(struct smap_psock *psock,
 	list_for_each_entry_safe(e, tmp, &psock->maps, list) {
 		struct htab_elem *c = rcu_dereference(e->hash_link);
 
-		if (c == hash_link)
+		if (c == hash_link) {
 			list_del(&e->list);
+			kfree(e);
+		}
 	}
 	spin_unlock_bh(&psock->maps_lock);
 }