[net-next] ipv4: initialize ra_mutex in inet_init_net()

Message ID 20180914203242.2712-1-xiyou.wangcong@gmail.com
State Changes Requested
Delegated to: David Miller
Series
  • [net-next] ipv4: initialize ra_mutex in inet_init_net()

Commit Message

Cong Wang Sept. 14, 2018, 8:32 p.m.
ra_mutex is an IPv4-specific mutex: it lives inside struct netns_ipv4,
but it is initialized in the generic netns code, setup_net().

Move its initialization to the IPv4-specific net init code, inet_init_net().

Fixes: d9ff3049739e ("net: Replace ip_ra_lock with per-net mutex")
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/core/net_namespace.c | 1 -
 net/ipv4/af_inet.c       | 2 ++
 2 files changed, 2 insertions(+), 1 deletion(-)

Comments

Kirill Tkhai Sept. 17, 2018, 7:25 a.m. | #1
On 14.09.2018 23:32, Cong Wang wrote:
> ra_mutex is a IPv4 specific mutex, it is inside struct netns_ipv4,
> but its initialization is in the generic netns code, setup_net().
> 
> Move it to IPv4 specific net init code, inet_init_net().
> 
> Fixes: d9ff3049739e ("net: Replace ip_ra_lock with per-net mutex")
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  net/core/net_namespace.c | 1 -
>  net/ipv4/af_inet.c       | 2 ++
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 670c84b1bfc2..b272ccfcbf63 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -308,7 +308,6 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
>  	net->user_ns = user_ns;
>  	idr_init(&net->netns_ids);
>  	spin_lock_init(&net->nsid_lock);
> -	mutex_init(&net->ipv4.ra_mutex);
>  
>  	list_for_each_entry(ops, &pernet_list, list) {
>  		error = ops_init(ops, net);
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 20fda8fb8ffd..57b7bffb93e5 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1817,6 +1817,8 @@ static __net_init int inet_init_net(struct net *net)
>  	net->ipv4.sysctl_igmp_llm_reports = 1;
>  	net->ipv4.sysctl_igmp_qrv = 2;
>  
> +	mutex_init(&net->ipv4.ra_mutex);
> +

In inet_init() the order of registration is:

	ip_mr_init();
	init_inet_pernet_ops();

This means the ipmr_net_ops pernet operations come before af_inet_ops
in pernet_list. So there is a theoretical possibility that, at some point
in the future, we will hit a problem when net initialization fails.

Say,

setup_net():
	ipmr_net_ops->init() returns 0
	xxx->init()          returns error
and then we do:
	ipmr_net_ops->exit(),

which could touch ra_mutex (theoretically).

Your patch is OK, but since you are making this change, it would be
better to also swap the registration order of ipmr_net_ops and af_inet_ops.

Kirill
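
The failure scenario above can be modelled in user space. The following is a minimal sketch, not kernel code: every name in it (net_sketch, pernet_ops_sketch, setup_net_sketch and the hook functions) is hypothetical, ra_mutex is reduced to a boolean flag, and the ipmr exit hook is assumed to touch the mutex, which the current code does not do. It only illustrates how rollback in reverse registration order can run an exit hook before a later-registered init hook has ever run.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct net_sketch {
	bool ra_mutex_ready;    /* stands in for mutex_init(&net->ipv4.ra_mutex) */
	bool used_unready;      /* set when an exit hook finds the mutex uninitialized */
};

struct pernet_ops_sketch {
	int  (*init)(struct net_sketch *net);
	void (*exit)(struct net_sketch *net);
};

static int ipmr_init(struct net_sketch *net) { (void)net; return 0; }

/* Assumption: a future exit hook that would take ra_mutex. */
static void ipmr_exit(struct net_sketch *net)
{
	if (!net->ra_mutex_ready)
		net->used_unready = true;
}

/* Models inet_init_net() after the patch: the mutex is set up here. */
static int inet_init(struct net_sketch *net)
{
	net->ra_mutex_ready = true;
	return 0;
}

static void inet_exit(struct net_sketch *net) { (void)net; }

/* "xxx" from the scenario: some init hook that fails. */
static int xxx_init(struct net_sketch *net) { (void)net; return -1; }
static void xxx_exit(struct net_sketch *net) { (void)net; }

static const struct pernet_ops_sketch ipmr_ops = { ipmr_init, ipmr_exit };
static const struct pernet_ops_sketch inet_ops = { inet_init, inet_exit };
static const struct pernet_ops_sketch xxx_ops  = { xxx_init, xxx_exit };

/* Run init hooks in list order; on failure, run the exit hooks of the
 * entries that were already initialized, in reverse order. */
int setup_net_sketch(struct net_sketch *net,
		     const struct pernet_ops_sketch *const *list, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int err = list[i]->init(net);

		if (err) {
			while (i-- > 0)
				list[i]->exit(net);
			return err;
		}
	}
	return 0;
}

/* Registration order as described above: ipmr first, inet later. */
bool unready_use_with_current_order(void)
{
	const struct pernet_ops_sketch *list[] = { &ipmr_ops, &xxx_ops, &inet_ops };
	struct net_sketch net = { false, false };

	setup_net_sketch(&net, list, 3);
	return net.used_unready;
}

/* Suggested order: inet first, so the mutex is ready before ipmr runs. */
bool unready_use_with_swapped_order(void)
{
	const struct pernet_ops_sketch *list[] = { &inet_ops, &ipmr_ops, &xxx_ops };
	struct net_sketch net = { false, false };

	setup_net_sketch(&net, list, 3);
	return net.used_unready;
}
```

Calling the two helpers from a small main() shows the difference: with ipmr registered first, the modelled exit hook runs while the flag is still unset; with inet registered first, the flag is always set by the time any ipmr hook runs.
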
David Miller Sept. 17, 2018, 3:02 p.m. | #2
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Fri, 14 Sep 2018 13:32:42 -0700

> ra_mutex is a IPv4 specific mutex, it is inside struct netns_ipv4,
> but its initialization is in the generic netns code, setup_net().
> 
> Move it to IPv4 specific net init code, inet_init_net().
> 
> Fixes: d9ff3049739e ("net: Replace ip_ra_lock with per-net mutex")
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Please take into consideration Kirill's feedback.

Thank you.
Cong Wang Sept. 18, 2018, 8:17 p.m. | #3
On Mon, Sep 17, 2018 at 12:25 AM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> In inet_init() the order of registration is:
>
>         ip_mr_init();
>         init_inet_pernet_ops();
>
> This means, ipmr_net_ops pernet operations are before af_inet_ops
> in pernet_list. So, there is a theoretical probability, sometimes
> in the future, we will have a problem during a fail of net initialization.
>
> Say,
>
> setup_net():
>         ipmr_net_ops->init() returns 0
>         xxx->init()          returns error
> and then we do:
>         ipmr_net_ops->exit(),
>
> which could touch ra_mutex (theoretically).

How could ra_mutex be touched in this scenario?

ra_mutex is only used in ip_ra_control(), which is called
only by {get,set}sockopt(). I don't see anything related
to the netns exit() path here.
Kirill Tkhai Sept. 19, 2018, 8:25 a.m. | #4
On 18.09.2018 23:17, Cong Wang wrote:
> On Mon, Sep 17, 2018 at 12:25 AM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>> In inet_init() the order of registration is:
>>
>>         ip_mr_init();
>>         init_inet_pernet_ops();
>>
>> This means, ipmr_net_ops pernet operations are before af_inet_ops
>> in pernet_list. So, there is a theoretical probability, sometimes
>> in the future, we will have a problem during a fail of net initialization.
>>
>> Say,
>>
>> setup_net():
>>         ipmr_net_ops->init() returns 0
>>         xxx->init()          returns error
>> and then we do:
>>         ipmr_net_ops->exit(),
>>
>> which could touch ra_mutex (theoretically).
> 
> How could ra_mutex be touched in this scenario?
> 
> ra_mutex is only used in ip_ra_control() which is called
> only by {get,set}sockopt(). I don't see anything related
> to netns exit() path here.

Currently, it is not touched. But it is common practice to
close sockets in pernet ->exit methods. For example, we close
the percpu icmp sockets in icmp_sk_exit(); they are also of
RAW type, and ip_ra_control() is also called for them. Yes,
they differ by protocol: icmp sockets use IPPROTO_ICMP, while
ip_ra_control() acts on IPPROTO_RAW sockets, but it's not good
anyway. This does not look reliable for the future. If someone
changes something here, we may not notice it for a long time,
while some users run into bugs on their side.

Problems on error paths are not easy to detect in testing,
yet users may hit them. We had an issue of the same type with
an uninitialized xfrm_policy_lock. It was introduced in 2013,
but the problem was found only in 2017:

	introduced by 283bc9f35bbb
	fixed      by c282222a45cb

(Last week I hit it on an RH7 kernel, which still has no fix.
 But this discussion is not about distribution kernels, just
 about the approach.)

I just want to say that if someone builds something on top
of this code, it would be friendlier of us not to force that
person to think about such non-obvious details, and instead
to implement a nice architecture right now.

Thanks,
Kirill
Cong Wang Sept. 19, 2018, 9:28 p.m. | #5
On Wed, Sep 19, 2018 at 1:25 AM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>
> On 18.09.2018 23:17, Cong Wang wrote:
> > On Mon, Sep 17, 2018 at 12:25 AM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> >> In inet_init() the order of registration is:
> >>
> >>         ip_mr_init();
> >>         init_inet_pernet_ops();
> >>
> >> This means, ipmr_net_ops pernet operations are before af_inet_ops
> >> in pernet_list. So, there is a theoretical probability, sometimes
> >> in the future, we will have a problem during a fail of net initialization.
> >>
> >> Say,
> >>
> >> setup_net():
> >>         ipmr_net_ops->init() returns 0
> >>         xxx->init()          returns error
> >> and then we do:
> >>         ipmr_net_ops->exit(),
> >>
> >> which could touch ra_mutex (theoretically).
> >
> > How could ra_mutex be touched in this scenario?
> >
> > ra_mutex is only used in ip_ra_control() which is called
> > only by {get,set}sockopt(). I don't see anything related
> > to netns exit() path here.
>
> Currently, it is not touched. But it's an ordinary practice,
> someone closes sockets in pernet ->exit methods. For example,
> we close percpu icmp sockets in icmp_sk_exit(), which are
> also of RAW type, and there is also called ip_ra_control()
> for them. Yes, they differ by their protocol; icmp sockets
> are of IPPROTO_ICMP protocol, while ip_ra_control() acts
> on IPPROTO_RAW sockets, but it's not good anyway. This does
> not look reliable for the future. In case of someone changes
> something here, we may do not notice this for the long time,
> while some users will meet bugs on their places.

First of all, we only consider the current code base. Even if you
really planned to change this in the future, it would still be your
responsibility to take care of it. Why? Simple: FIFO. My patch
comes ahead of any future changes here, obviously.

Secondly, it is certainly not hard to notice. I am pretty sure
you would get a warning/crash if this bug triggered.
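
Whether such a bug is easy to notice depends on the lock carrying an "initialized" marker that is checked on use, and on the failing path actually being exercised. Below is a minimal user-space sketch of that kind of check. The checked_lock type and its functions are hypothetical, it is not the kernel's struct mutex, and it does no real locking; it only shows how a self-pointer marker lets an acquire on a never-initialized (zeroed) lock be detected.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock type; the self-pointer acts as an "initialized" marker. */
struct checked_lock {
	const void *magic;      /* points at the lock itself once initialized */
	bool held;
};

void checked_lock_init(struct checked_lock *l)
{
	l->magic = l;
	l->held = false;
}

/* Returns true on success, false if the lock was never initialized. */
bool checked_lock_acquire(struct checked_lock *l)
{
	if (l->magic != l)
		return false;   /* a debug build could emit a warning here */
	l->held = true;
	return true;
}

void checked_lock_release(struct checked_lock *l)
{
	l->held = false;
}
```

The check only fires when the code path that uses the uninitialized lock is executed, so a rarely taken error path can still go unnoticed until something triggers it.
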



>
> Problems on error paths is not easy to detect on testing,
> while user may meet them. We had issue of same type with
> uninitialized xfrm_policy_lock. It was introduced in 2013,
> while the problem was found only in 2017:

Been there, done that; I've fixed multiple IPv6 init code
error paths. This is not where we disagree, by the way.


>
>         introduced by 283bc9f35bbb
>         fixed      by c282222a45cb
>
> (Last week I met it on RH7 kernel, which still has no a fix.
>  But this talk is not about distribution kernels, just about
>  the way).
>
> I just want to say if someone makes some creativity on top
> of this code, it will be to more friendly from us to him/her
> to not force this person to think about such not obvious details,
> but just to implement nice architecture right now.

You keep saying nice architecture, how did you architect
ipv4.ra_mutex into net/core/net_namespace.c? It is apparently
not nice.

Good luck with your future.

I am tired, let's just drop it. I have no interest in fixing your mess.

Thanks!
Kirill Tkhai Sept. 20, 2018, 9:04 a.m. | #6
On 20.09.2018 0:28, Cong Wang wrote:
> On Wed, Sep 19, 2018 at 1:25 AM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>>
>> On 18.09.2018 23:17, Cong Wang wrote:
>>> On Mon, Sep 17, 2018 at 12:25 AM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>>>> In inet_init() the order of registration is:
>>>>
>>>>         ip_mr_init();
>>>>         init_inet_pernet_ops();
>>>>
>>>> This means, ipmr_net_ops pernet operations are before af_inet_ops
>>>> in pernet_list. So, there is a theoretical probability, sometimes
>>>> in the future, we will have a problem during a fail of net initialization.
>>>>
>>>> Say,
>>>>
>>>> setup_net():
>>>>         ipmr_net_ops->init() returns 0
>>>>         xxx->init()          returns error
>>>> and then we do:
>>>>         ipmr_net_ops->exit(),
>>>>
>>>> which could touch ra_mutex (theoretically).
>>>
>>> How could ra_mutex be touched in this scenario?
>>>
>>> ra_mutex is only used in ip_ra_control() which is called
>>> only by {get,set}sockopt(). I don't see anything related
>>> to netns exit() path here.
>>
>> Currently, it is not touched. But it's an ordinary practice,
>> someone closes sockets in pernet ->exit methods. For example,
>> we close percpu icmp sockets in icmp_sk_exit(), which are
>> also of RAW type, and there is also called ip_ra_control()
>> for them. Yes, they differ by their protocol; icmp sockets
>> are of IPPROTO_ICMP protocol, while ip_ra_control() acts
>> on IPPROTO_RAW sockets, but it's not good anyway. This does
>> not look reliable for the future. In case of someone changes
>> something here, we may do not notice this for the long time,
>> while some users will meet bugs on their places.
> 
> First of all, we only consider current code base. Even if you
> really planned to changed this in the future, it would be still your
> responsibility to take care of it. Why? Simple, FIFO. My patch
> comes ahead of any future changes here, obviously.
> 
> Secondly, it is certainly not hard to notice. I am pretty sure
> you would get a warning/crash if this bug triggered.
> 
> 
> 
>>
>> Problems on error paths is not easy to detect on testing,
>> while user may meet them. We had issue of same type with
>> uninitialized xfrm_policy_lock. It was introduced in 2013,
>> while the problem was found only in 2017:
> 
> Been there, done that, I've fixed multiple IPv6 init code
> error path. This is not where we disagree, by the way.
> 
> 
>>
>>         introduced by 283bc9f35bbb
>>         fixed      by c282222a45cb
>>
>> (Last week I met it on RH7 kernel, which still has no a fix.
>>  But this talk is not about distribution kernels, just about
>>  the way).
>>
>> I just want to say if someone makes some creativity on top
>> of this code, it will be to more friendly from us to him/her
>> to not force this person to think about such not obvious details,
>> but just to implement nice architecture right now.
> 
> You keep saying nice architecture, how did you architect
> ipv4.ra_mutex into net/core/net_namespace.c? It is apparently
> not nice.
> 
> Good luck with your future.
> 
> I am tired, let's just drop it. I have no interest to fix your mess.

You added me to CC, so you probably want to know my opinion about this.
Since this is not a fix for a real problem but just a refactoring, I am
giving you my opinion on how the refactoring could be made better. If you
don't want to know my opinion, you may consider not CCing me.

That's all; it is no reason to take offense.

Thanks,
Kirill
Cong Wang Sept. 20, 2018, 5:05 p.m. | #7
On Thu, Sep 20, 2018 at 2:04 AM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>
> On 20.09.2018 0:28, Cong Wang wrote:
> You added me to CC, so you probably want to know my opinion about this.
> Since it's not a real problem fix, but just a refactoring, I say you
> my opinion, how this refactoring may be made better. If you don't want
> to know my opinion, you may consider not to CC me.

Sure, with respect to your opinions, I am happy to drop it.

Thanks.

Patch

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 670c84b1bfc2..b272ccfcbf63 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -308,7 +308,6 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	net->user_ns = user_ns;
 	idr_init(&net->netns_ids);
 	spin_lock_init(&net->nsid_lock);
-	mutex_init(&net->ipv4.ra_mutex);
 
 	list_for_each_entry(ops, &pernet_list, list) {
 		error = ops_init(ops, net);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 20fda8fb8ffd..57b7bffb93e5 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1817,6 +1817,8 @@ static __net_init int inet_init_net(struct net *net)
 	net->ipv4.sysctl_igmp_llm_reports = 1;
 	net->ipv4.sysctl_igmp_qrv = 2;
 
+	mutex_init(&net->ipv4.ra_mutex);
+
 	return 0;
 }