diff mbox series

Incorrect dependency handling with delayed ipset destroy ipset 7.21

Message ID CALFUNymhWkcy2p9hqt7eO4H4Hm5t70Y02=XodnpH1zgAZ0cVSw@mail.gmail.com
State New
Headers show
Series Incorrect dependency handling with delayed ipset destroy ipset 7.21 | expand

Commit Message

Alexander Maltsev April 13, 2024, 4:19 a.m. UTC
Hello.
I have a problem with recent kernels. Due to delayed ipset destroy I'm
unable to destroy ipset that was recently in use by another
(destroyed) ipset. It is demonstrated by this example:

#!/bin/bash
set -x

ipset create qwe1 list:set
ipset create asd1 hash:net
ipset add qwe1 asd1
ipset add asd1 1.1.1.1

ipset destroy qwe1
ipset list asd1 -t
ipset destroy asd1

Second ipset destroy reports an error "ipset v7.21: Set cannot be
destroyed: it is in use by a kernel component".
If this command is repeated after a short delay, it deletes ipset
without any problems.

It seems it could be fixed with that kernel module patch:


If you have any suggestions on how this problem should be approached
please let me know. Thanks.

Comments

Jozsef Kadlecsik April 13, 2024, 2:02 p.m. UTC | #1
On Sat, 13 Apr 2024, keltargw wrote:

> I have a problem with recent kernels. Due to delayed ipset destroy I'm 
> unable to destroy ipset that was recently in use by another (destroyed) 
> ipset. It is demonstrated by this example:
> 
> #!/bin/bash
> set -x
> 
> ipset create qwe1 list:set
> ipset create asd1 hash:net
> ipset add qwe1 asd1
> ipset add asd1 1.1.1.1
> 
> ipset destroy qwe1
> ipset list asd1 -t
> ipset destroy asd1
> 
> Second ipset destroy reports an error "ipset v7.21: Set cannot be
> destroyed: it is in use by a kernel component".
> If this command is repeated after a short delay, it deletes ipset
> without any problems.
> 
> It seems it could be fixed with that kernel module patch:
> 
> Index: linux-6.7.9/net/netfilter/ipset/ip_set_core.c
> ===================================================================
> --- linux-6.7.9.orig/net/netfilter/ipset/ip_set_core.c
> +++ linux-6.7.9/net/netfilter/ipset/ip_set_core.c
> @@ -1241,6 +1241,9 @@ static int ip_set_destroy(struct sk_buff
>   u32 flags = flag_exist(info->nlh);
>   u16 features = 0;
> 
> + /* Wait for flush to ensure references are cleared */
> + rcu_barrier();
> +
>   read_lock_bh(&ip_set_ref_lock);
>   s = find_set_and_id(inst, nla_data(attr[IPSET_ATTR_SETNAME]),
>      &i);
> 
> If you have any suggestions on how this problem should be approached
> please let me know.

I'd better solve it in the list type itself: your patch unnecessarily 
slows down all set destroy operations.

Best regards,
Jozsef
Alexander Maltsev April 14, 2024, 5:16 a.m. UTC | #2
Thanks for the suggestion. I'm not that familiar with ipset source
code, do you mean something like issuing a second rcu_barrier between
call_rcu and returning result code back to netlink (and only doing
that for list type)?

As I understand it there isn't much that could be done in e.g.
list_set_destroy as it might not be called yet, sitting in the rcu
wait queue.


On Sat, 13 Apr 2024 at 19:02, Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
>
> On Sat, 13 Apr 2024, keltargw wrote:
>
> > I have a problem with recent kernels. Due to delayed ipset destroy I'm
> > unable to destroy ipset that was recently in use by another (destroyed)
> > ipset. It is demonstrated by this example:
> >
> > #!/bin/bash
> > set -x
> >
> > ipset create qwe1 list:set
> > ipset create asd1 hash:net
> > ipset add qwe1 asd1
> > ipset add asd1 1.1.1.1
> >
> > ipset destroy qwe1
> > ipset list asd1 -t
> > ipset destroy asd1
> >
> > Second ipset destroy reports an error "ipset v7.21: Set cannot be
> > destroyed: it is in use by a kernel component".
> > If this command is repeated after a short delay, it deletes ipset
> > without any problems.
> >
> > It seems it could be fixed with that kernel module patch:
> >
> > Index: linux-6.7.9/net/netfilter/ipset/ip_set_core.c
> > ===================================================================
> > --- linux-6.7.9.orig/net/netfilter/ipset/ip_set_core.c
> > +++ linux-6.7.9/net/netfilter/ipset/ip_set_core.c
> > @@ -1241,6 +1241,9 @@ static int ip_set_destroy(struct sk_buff
> >   u32 flags = flag_exist(info->nlh);
> >   u16 features = 0;
> >
> > + /* Wait for flush to ensure references are cleared */
> > + rcu_barrier();
> > +
> >   read_lock_bh(&ip_set_ref_lock);
> >   s = find_set_and_id(inst, nla_data(attr[IPSET_ATTR_SETNAME]),
> >      &i);
> >
> > If you have any suggestions on how this problem should be approached
> > please let me know.
>
> I'd better solve it in the list type itself: your patch unnecessarily
> slows down all set destroy operations.
>
> Best regards,
> Jozsef
> --
> E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
> PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
> Address : Wigner Research Centre for Physics
>           H-1525 Budapest 114, POB. 49, Hungary
Jozsef Kadlecsik April 15, 2024, 9:28 a.m. UTC | #3
On Sun, 14 Apr 2024, keltargw wrote:

> Thanks for the suggestion. I'm not that familiar with ipset source code, 
> do you mean something like issuing a second rcu_barrier between call_rcu 
> and returning result code back to netlink (and only doing that for list 
> type)?
> 
> As I understand it there isn't much that could be done in e.g. 
> list_set_destroy as it might not be called yet, sitting in the rcu wait 
> queue.

No, I meant release the reference counter of the element sets immediately 
when destroying a list type of set. Something like moving just the 
ip_set_put_byindex() call

        list_for_each_entry_safe(e, n, &map->members, list) {
		...
		ip_set_put_byindex(map->net, e->id);
		...
	}

from list_set_destroy() into list_set_cancel_gc(). That way the member 
sets can be destroyed without waiting for anything.

Best regards,
Jozsef 

> On Sat, 13 Apr 2024 at 19:02, Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> >
> > On Sat, 13 Apr 2024, keltargw wrote:
> >
> > > I have a problem with recent kernels. Due to delayed ipset destroy I'm
> > > unable to destroy ipset that was recently in use by another (destroyed)
> > > ipset. It is demonstrated by this example:
> > >
> > > #!/bin/bash
> > > set -x
> > >
> > > ipset create qwe1 list:set
> > > ipset create asd1 hash:net
> > > ipset add qwe1 asd1
> > > ipset add asd1 1.1.1.1
> > >
> > > ipset destroy qwe1
> > > ipset list asd1 -t
> > > ipset destroy asd1
> > >
> > > Second ipset destroy reports an error "ipset v7.21: Set cannot be
> > > destroyed: it is in use by a kernel component".
> > > If this command is repeated after a short delay, it deletes ipset
> > > without any problems.
> > >
> > > It seems it could be fixed with that kernel module patch:
> > >
> > > Index: linux-6.7.9/net/netfilter/ipset/ip_set_core.c
> > > ===================================================================
> > > --- linux-6.7.9.orig/net/netfilter/ipset/ip_set_core.c
> > > +++ linux-6.7.9/net/netfilter/ipset/ip_set_core.c
> > > @@ -1241,6 +1241,9 @@ static int ip_set_destroy(struct sk_buff
> > >   u32 flags = flag_exist(info->nlh);
> > >   u16 features = 0;
> > >
> > > + /* Wait for flush to ensure references are cleared */
> > > + rcu_barrier();
> > > +
> > >   read_lock_bh(&ip_set_ref_lock);
> > >   s = find_set_and_id(inst, nla_data(attr[IPSET_ATTR_SETNAME]),
> > >      &i);
> > >
> > > If you have any suggestions on how this problem should be approached
> > > please let me know.
> >
> > I'd better solve it in the list type itself: your patch unnecessarily
> > slows down all set destroy operations.
> >
> > Best regards,
> > Jozsef
> > --
> > E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
> > PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
> > Address : Wigner Research Centre for Physics
> >           H-1525 Budapest 114, POB. 49, Hungary
>
Alexander Maltsev April 15, 2024, 5:18 p.m. UTC | #4
Thank you. That seems to be working fine, but I'm not sure about how
readable it becomes, as destroy logic becomes splitted in two
inseparable parts.
How about adding list_set_flush() at the end of list_set_cancel_gc()
instead, would that be too heavy? E.g.

diff --git a/kernel/net/netfilter/ipset/ip_set_list_set.c
b/kernel/net/netfilter/ipset/ip_set_list_set.c
index cc2e5b9..1cdb68e 100644
--- a/kernel/net/netfilter/ipset/ip_set_list_set.c
+++ b/kernel/net/netfilter/ipset/ip_set_list_set.c
@@ -552,6 +552,8 @@ list_set_cancel_gc(struct ip_set *set)

       if (SET_WITH_TIMEOUT(set))
               timer_shutdown_sync(&map->gc);
+
+       list_set_flush(set);
}

static const struct ip_set_type_variant set_variant = {

On Mon, 15 Apr 2024 at 14:28, Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
>
> On Sun, 14 Apr 2024, keltargw wrote:
>
> > Thanks for the suggestion. I'm not that familiar with ipset source code,
> > do you mean something like issuing a second rcu_barrier between call_rcu
> > and returning result code back to netlink (and only doing that for list
> > type)?
> >
> > As I understand it there isn't much that could be done in e.g.
> > list_set_destroy as it might not be called yet, sitting in the rcu wait
> > queue.
>
> No, I meant release the reference counter of the element sets immediately
> when destroying a list type of set. Something like moving just the
> ip_set_put_byindex() call
>
>         list_for_each_entry_safe(e, n, &map->members, list) {
>                 ...
>                 ip_set_put_byindex(map->net, e->id);
>                 ...
>         }
>
> from list_set_destroy() into list_set_cancel_gc(). That way the member
> sets can be destroyed without waiting for anything.
>
> Best regards,
> Jozsef
>
> > On Sat, 13 Apr 2024 at 19:02, Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> > >
> > > On Sat, 13 Apr 2024, keltargw wrote:
> > >
> > > > I have a problem with recent kernels. Due to delayed ipset destroy I'm
> > > > unable to destroy ipset that was recently in use by another (destroyed)
> > > > ipset. It is demonstrated by this example:
> > > >
> > > > #!/bin/bash
> > > > set -x
> > > >
> > > > ipset create qwe1 list:set
> > > > ipset create asd1 hash:net
> > > > ipset add qwe1 asd1
> > > > ipset add asd1 1.1.1.1
> > > >
> > > > ipset destroy qwe1
> > > > ipset list asd1 -t
> > > > ipset destroy asd1
> > > >
> > > > Second ipset destroy reports an error "ipset v7.21: Set cannot be
> > > > destroyed: it is in use by a kernel component".
> > > > If this command is repeated after a short delay, it deletes ipset
> > > > without any problems.
> > > >
> > > > It seems it could be fixed with that kernel module patch:
> > > >
> > > > Index: linux-6.7.9/net/netfilter/ipset/ip_set_core.c
> > > > ===================================================================
> > > > --- linux-6.7.9.orig/net/netfilter/ipset/ip_set_core.c
> > > > +++ linux-6.7.9/net/netfilter/ipset/ip_set_core.c
> > > > @@ -1241,6 +1241,9 @@ static int ip_set_destroy(struct sk_buff
> > > >   u32 flags = flag_exist(info->nlh);
> > > >   u16 features = 0;
> > > >
> > > > + /* Wait for flush to ensure references are cleared */
> > > > + rcu_barrier();
> > > > +
> > > >   read_lock_bh(&ip_set_ref_lock);
> > > >   s = find_set_and_id(inst, nla_data(attr[IPSET_ATTR_SETNAME]),
> > > >      &i);
> > > >
> > > > If you have any suggestions on how this problem should be approached
> > > > please let me know.
> > >
> > > I'd better solve it in the list type itself: your patch unnecessarily
> > > slows down all set destroy operations.
> > >
> > > Best regards,
> > > Jozsef
> > > --
> > > E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
> > > PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
> > > Address : Wigner Research Centre for Physics
> > >           H-1525 Budapest 114, POB. 49, Hungary
> >
>
> --
> E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
> PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
> Address : Wigner Research Centre for Physics
>           H-1525 Budapest 114, POB. 49, Hungary
Jozsef Kadlecsik April 17, 2024, 10:26 a.m. UTC | #5
On Mon, 15 Apr 2024, keltargw wrote:

> Thank you. That seems to be working fine, but I'm not sure about how 
> readable it becomes, as destroy logic becomes splitted in two 
> inseparable parts.

Yes, that'd make reading the code harder but would be fast :-).

> How about adding list_set_flush() at the end of list_set_cancel_gc()
> instead, would that be too heavy? E.g.

Again yes and it'd result in a slower destroy operation for list type of 
sets. But the list type should be used very lightly (if at all in 
production) so I don't regard it as a performance critical area in ipset.

Your patch is fine for me. Please submit it in a form that can be applied 
in git with proper commit text part and such. Thanks!

Best regards,
Jozsef
 
> diff --git a/kernel/net/netfilter/ipset/ip_set_list_set.c
> b/kernel/net/netfilter/ipset/ip_set_list_set.c
> index cc2e5b9..1cdb68e 100644
> --- a/kernel/net/netfilter/ipset/ip_set_list_set.c
> +++ b/kernel/net/netfilter/ipset/ip_set_list_set.c
> @@ -552,6 +552,8 @@ list_set_cancel_gc(struct ip_set *set)
> 
>        if (SET_WITH_TIMEOUT(set))
>                timer_shutdown_sync(&map->gc);
> +
> +       list_set_flush(set);
> }
> 
> static const struct ip_set_type_variant set_variant = {
> 
> On Mon, 15 Apr 2024 at 14:28, Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> >
> > On Sun, 14 Apr 2024, keltargw wrote:
> >
> > > Thanks for the suggestion. I'm not that familiar with ipset source code,
> > > do you mean something like issuing a second rcu_barrier between call_rcu
> > > and returning result code back to netlink (and only doing that for list
> > > type)?
> > >
> > > As I understand it there isn't much that could be done in e.g.
> > > list_set_destroy as it might not be called yet, sitting in the rcu wait
> > > queue.
> >
> > No, I meant release the reference counter of the element sets immediately
> > when destroying a list type of set. Something like moving just the
> > ip_set_put_byindex() call
> >
> >         list_for_each_entry_safe(e, n, &map->members, list) {
> >                 ...
> >                 ip_set_put_byindex(map->net, e->id);
> >                 ...
> >         }
> >
> > from list_set_destroy() into list_set_cancel_gc(). That way the member
> > sets can be destroyed without waiting for anything.
> >
> > Best regards,
> > Jozsef
> >
> > > On Sat, 13 Apr 2024 at 19:02, Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> > > >
> > > > On Sat, 13 Apr 2024, keltargw wrote:
> > > >
> > > > > I have a problem with recent kernels. Due to delayed ipset destroy I'm
> > > > > unable to destroy ipset that was recently in use by another (destroyed)
> > > > > ipset. It is demonstrated by this example:
> > > > >
> > > > > #!/bin/bash
> > > > > set -x
> > > > >
> > > > > ipset create qwe1 list:set
> > > > > ipset create asd1 hash:net
> > > > > ipset add qwe1 asd1
> > > > > ipset add asd1 1.1.1.1
> > > > >
> > > > > ipset destroy qwe1
> > > > > ipset list asd1 -t
> > > > > ipset destroy asd1
> > > > >
> > > > > Second ipset destroy reports an error "ipset v7.21: Set cannot be
> > > > > destroyed: it is in use by a kernel component".
> > > > > If this command is repeated after a short delay, it deletes ipset
> > > > > without any problems.
> > > > >
> > > > > It seems it could be fixed with that kernel module patch:
> > > > >
> > > > > Index: linux-6.7.9/net/netfilter/ipset/ip_set_core.c
> > > > > ===================================================================
> > > > > --- linux-6.7.9.orig/net/netfilter/ipset/ip_set_core.c
> > > > > +++ linux-6.7.9/net/netfilter/ipset/ip_set_core.c
> > > > > @@ -1241,6 +1241,9 @@ static int ip_set_destroy(struct sk_buff
> > > > >   u32 flags = flag_exist(info->nlh);
> > > > >   u16 features = 0;
> > > > >
> > > > > + /* Wait for flush to ensure references are cleared */
> > > > > + rcu_barrier();
> > > > > +
> > > > >   read_lock_bh(&ip_set_ref_lock);
> > > > >   s = find_set_and_id(inst, nla_data(attr[IPSET_ATTR_SETNAME]),
> > > > >      &i);
> > > > >
> > > > > If you have any suggestions on how this problem should be approached
> > > > > please let me know.
> > > >
> > > > I'd better solve it in the list type itself: your patch unnecessarily
> > > > slows down all set destroy operations.
> > > >
> > > > Best regards,
> > > > Jozsef
> > > > --
> > > > E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
> > > > PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
> > > > Address : Wigner Research Centre for Physics
> > > >           H-1525 Budapest 114, POB. 49, Hungary
> > >
> >
> > --
> > E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
> > PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
> > Address : Wigner Research Centre for Physics
> >           H-1525 Budapest 114, POB. 49, Hungary
>
diff mbox series

Patch

Index: linux-6.7.9/net/netfilter/ipset/ip_set_core.c
===================================================================
--- linux-6.7.9.orig/net/netfilter/ipset/ip_set_core.c
+++ linux-6.7.9/net/netfilter/ipset/ip_set_core.c
@@ -1241,6 +1241,9 @@  static int ip_set_destroy(struct sk_buff
  u32 flags = flag_exist(info->nlh);
  u16 features = 0;

+ /* Wait for flush to ensure references are cleared */
+ rcu_barrier();
+
  read_lock_bh(&ip_set_ref_lock);
  s = find_set_and_id(inst, nla_data(attr[IPSET_ATTR_SETNAME]),
     &i);