
[bpf-next] bpf: using rcu_read_lock for bpf_sk_storage_map iterator

Message ID 20200914184630.1048718-1-yhs@fb.com
State Changes Requested
Delegated to: BPF Maintainers

Commit Message

Yonghong Song Sept. 14, 2020, 6:46 p.m. UTC
Currently, we use bucket_lock when traversing bpf_sk_storage_map
elements. Since bpf_iter programs cannot use the bpf_sk_storage_get()
and bpf_sk_storage_delete() helpers, which may also grab the bucket
lock, we do not have the deadlock issue that exists for hashmap when
using bucket_lock ([1]).

If a bucket contains a lot of sockets, then while bpf_iter is
traversing that bucket, concurrent bpf_sk_storage_{get,delete}() calls
may experience some undesirable delays. Using rcu_read_lock() is a
reasonable compromise here. Although it may lose some precision, e.g.,
by accessing stale sockets, it will not hurt the performance of other
bpf programs.

[1] https://lore.kernel.org/bpf/20200902235341.2001534-1-yhs@fb.com

Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 net/core/bpf_sk_storage.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

Comments

Song Liu Sept. 14, 2020, 9:28 p.m. UTC | #1
On Mon, Sep 14, 2020 at 11:47 AM Yonghong Song <yhs@fb.com> wrote:
>
> Currently, we use bucket_lock when traversing bpf_sk_storage_map
> elements. Since bpf_iter programs cannot use bpf_sk_storage_get()
> and bpf_sk_storage_delete() helpers which may also grab bucket lock,
> we do not have a deadlock issue which exists for hashmap when
> using bucket_lock ([1]).

The paragraph above describes why we can use bucket_lock, which is more
or less irrelevant to this change. Also, I am not sure why we refer to [1] here.

>
> If a bucket contains a lot of sockets, during bpf_iter traversing
> a bucket, concurrent bpf_sk_storage_{get,delete}() may experience
> some undesirable delays. Using rcu_read_lock() is a reasonable

It will be great to include some performance comparison.

> compromise here. Although it may lose some precision, e.g.,
> access stale sockets, but it will not hurt performance of other
> bpf programs.
>
> [1] https://lore.kernel.org/bpf/20200902235341.2001534-1-yhs@fb.com
>
> Cc: Martin KaFai Lau <kafai@fb.com>
> Signed-off-by: Yonghong Song <yhs@fb.com>

Other than these,

Acked-by: Song Liu <songliubraving@fb.com>

[...]
Yonghong Song Sept. 15, 2020, 5:25 a.m. UTC | #2
On 9/14/20 2:28 PM, Song Liu wrote:
> On Mon, Sep 14, 2020 at 11:47 AM Yonghong Song <yhs@fb.com> wrote:
>>
>> Currently, we use bucket_lock when traversing bpf_sk_storage_map
>> elements. Since bpf_iter programs cannot use bpf_sk_storage_get()
>> and bpf_sk_storage_delete() helpers which may also grab bucket lock,
>> we do not have a deadlock issue which exists for hashmap when
>> using bucket_lock ([1]).
> 
> The paragraph above describes why we can use bucket_lock, which is more
> or less irrelevant to this change. Also, I am not sure why we refer to [1] here.

What I tried to clarify here is that, unlike [1], we do not have a bug
in the current code. But I guess the absence of bugs is implicit, so I
probably do not need to say anything. I will skip the above paragraph
in the next revision.

> 
>>
>> If a bucket contains a lot of sockets, during bpf_iter traversing
>> a bucket, concurrent bpf_sk_storage_{get,delete}() may experience
>> some undesirable delays. Using rcu_read_lock() is a reasonable
> 
> It will be great to include some performance comparison.

Sure. Let me give it a try and collect some statistics.

> 
>> compromise here. Although it may lose some precision, e.g.,
>> access stale sockets, but it will not hurt performance of other
>> bpf programs.
>>
>> [1] https://lore.kernel.org/bpf/20200902235341.2001534-1-yhs@fb.com
>>
>> Cc: Martin KaFai Lau <kafai@fb.com>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
> 
> Other than these,
> 
> Acked-by: Song Liu <songliubraving@fb.com>
> 
> [...]
>
Jakub Kicinski Sept. 15, 2020, 3:33 p.m. UTC | #3
On Mon, 14 Sep 2020 11:46:30 -0700 Yonghong Song wrote:
> [...]

Sparse is not happy about it. Could you add some annotations, perhaps?

include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_find_next' - unexpected unlock
include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_stop' - unexpected unlock
Yonghong Song Sept. 15, 2020, 5:35 p.m. UTC | #4
On 9/15/20 8:33 AM, Jakub Kicinski wrote:
> On Mon, 14 Sep 2020 11:46:30 -0700 Yonghong Song wrote:
>> [...]
> 
> Sparse is not happy about it. Could you add some annotations, perhaps?
> 
> include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_find_next' - unexpected unlock
> include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_stop' - unexpected unlock

Okay, I will try.

On my system, sparse is unhappy and core dumped....

/data/users/yhs/work/net-next/include/linux/string.h:12:38: error: too 
many errors
/bin/sh: line 1: 2710132 Segmentation fault      (core dumped) sparse 
-D__linux__ -Dlinux -D__STDC__ -Dunix
-D__unix__ -Wbitwise -Wno-return-void -Wno-unknown-attribute 
-D__x86_64__ --arch=x86 -mlittle-endian -m64 -W
p,-MMD,net/core/.bpf_sk_storage.o.d -nostdinc -isystem
...
/data/users/yhs/work/net-next/net/core/bpf_sk_storage.c
make[3]: *** [net/core/bpf_sk_storage.o] Error 139
make[3]: *** Deleting file `net/core/bpf_sk_storage.o'

-bash-4.4$ rpm -qf /bin/sparse
sparse-0.5.2-1.el7.x86_64
-bash-4.4$
Jakub Kicinski Sept. 15, 2020, 5:40 p.m. UTC | #5
On Tue, 15 Sep 2020 10:35:50 -0700 Yonghong Song wrote:
> On 9/15/20 8:33 AM, Jakub Kicinski wrote:
> > On Mon, 14 Sep 2020 11:46:30 -0700 Yonghong Song wrote:  
> >> [...]
> > 
> > Sparse is not happy about it. Could you add some annotations, perhaps?
> > 
> > include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_find_next' - unexpected unlock
> > include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_stop' - unexpected unlock  
> 
> Okay, I will try.
> 
> On my system, sparse is unhappy and core dumped....
> 
> /data/users/yhs/work/net-next/include/linux/string.h:12:38: error: too 
> many errors
> /bin/sh: line 1: 2710132 Segmentation fault      (core dumped) sparse 
> -D__linux__ -Dlinux -D__STDC__ -Dunix
> -D__unix__ -Wbitwise -Wno-return-void -Wno-unknown-attribute 
> -D__x86_64__ --arch=x86 -mlittle-endian -m64 -W
> p,-MMD,net/core/.bpf_sk_storage.o.d -nostdinc -isystem
> ...
> /data/users/yhs/work/net-next/net/core/bpf_sk_storage.c
> make[3]: *** [net/core/bpf_sk_storage.o] Error 139
> make[3]: *** Deleting file `net/core/bpf_sk_storage.o'
> 
> -bash-4.4$ rpm -qf /bin/sparse
> sparse-0.5.2-1.el7.x86_64
> -bash-4.4$

I think you need to build from source, sadly :(

https://git.kernel.org/pub/scm//devel/sparse/sparse.git
Yonghong Song Sept. 15, 2020, 6:56 p.m. UTC | #6
On 9/15/20 10:40 AM, Jakub Kicinski wrote:
> On Tue, 15 Sep 2020 10:35:50 -0700 Yonghong Song wrote:
>> On 9/15/20 8:33 AM, Jakub Kicinski wrote:
>>> On Mon, 14 Sep 2020 11:46:30 -0700 Yonghong Song wrote:
>>>> [...]
>>>
>>> Sparse is not happy about it. Could you add some annotations, perhaps?
>>>
>>> include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_find_next' - unexpected unlock
>>> include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_stop' - unexpected unlock
>>
>> Okay, I will try.
>>
>> On my system, sparse is unhappy and core dumped....
>>
>> [...]
> 
> I think you need to build from source, sadly :(
> 
> https://git.kernel.org/pub/scm//devel/sparse/sparse.git

Indeed, building sparse from source works. After adding some
__releases(RCU) and __acquires(RCU) annotations, I now have:
   context imbalance in 'bpf_sk_storage_map_seq_find_next' - different
lock contexts for basic block
I may need to restructure the code to please sparse...

>
Alexei Starovoitov Sept. 15, 2020, 7:03 p.m. UTC | #7
On Tue, Sep 15, 2020 at 11:56 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 9/15/20 10:40 AM, Jakub Kicinski wrote:
> > On Tue, 15 Sep 2020 10:35:50 -0700 Yonghong Song wrote:
> >> On 9/15/20 8:33 AM, Jakub Kicinski wrote:
> >>> On Mon, 14 Sep 2020 11:46:30 -0700 Yonghong Song wrote:
> >>>> [...]
> >>>
> >>> Sparse is not happy about it. Could you add some annotations, perhaps?
> >>>
> >>> include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_find_next' - unexpected unlock
> >>> include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_stop' - unexpected unlock
> >>
> >> Okay, I will try.
> >>
> >> On my system, sparse is unhappy and core dumped....
> >>
> >> [...]
> >
> > I think you need to build from source, sadly :(
> >
> > https://git.kernel.org/pub/scm//devel/sparse/sparse.git
>
> Indeed, building sparse from source works. After adding some
> __releases(RCU) and __acquires(RCU), I now have:
>    context imbalance in 'bpf_sk_storage_map_seq_find_next' - different
> lock contexts for basic block
> I may need to restructure code to please sparse...

I don't think sparse can handle such things even with all annotations.
I wouldn't spend too much time on it.

Patch

diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 4a86ea34f29e..a1db5e988d19 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -701,7 +701,7 @@  bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info,
 		if (!selem) {
 			/* not found, unlock and go to the next bucket */
 			b = &smap->buckets[bucket_id++];
-			raw_spin_unlock_bh(&b->lock);
+			rcu_read_unlock();
 			skip_elems = 0;
 			break;
 		}
@@ -715,7 +715,7 @@  bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info,
 
 	for (i = bucket_id; i < (1U << smap->bucket_log); i++) {
 		b = &smap->buckets[i];
-		raw_spin_lock_bh(&b->lock);
+		rcu_read_lock();
 		count = 0;
 		hlist_for_each_entry(selem, &b->list, map_node) {
 			sk_storage = rcu_dereference_raw(selem->local_storage);
@@ -726,7 +726,7 @@  bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info,
 			}
 			count++;
 		}
-		raw_spin_unlock_bh(&b->lock);
+		rcu_read_unlock();
 		skip_elems = 0;
 	}
 
@@ -806,13 +806,10 @@  static void bpf_sk_storage_map_seq_stop(struct seq_file *seq, void *v)
 	struct bpf_local_storage_map *smap;
 	struct bpf_local_storage_map_bucket *b;
 
-	if (!v) {
+	if (!v)
 		(void)__bpf_sk_storage_map_seq_show(seq, v);
-	} else {
-		smap = (struct bpf_local_storage_map *)info->map;
-		b = &smap->buckets[info->bucket_id];
-		raw_spin_unlock_bh(&b->lock);
-	}
+	else
+		rcu_read_unlock();
 }
 
 static int bpf_iter_init_sk_storage_map(void *priv_data,