[net] net: try harder to not reuse ifindex when moving interfaces

Message ID: 294c7a9df554506e684adbeb9bbed070e6fed260.1444993627.git.jbenc@redhat.com
State: Rejected, archived
Delegated to: David Miller

Commit Message

Jiri Benc Oct. 16, 2015, 11:07 a.m. UTC
When moving interfaces to a different netns, the ifindex of the interface is
kept if possible. However, this is not kept in sync with allocation of new
interfaces in the target netns. While the ifindex is skipped when creating
a new interface in the netns as long as the moved interface still exists, it
is reused once the moved interface has disappeared.

This causes races for GUI tools in situations like this:

1. create netns 'new_netns'
2. in root netns, move the interface with ifindex 2 to new_netns
3. in new_netns, delete the interface with ifindex 2
4. in new_netns, create an interface - it will get ifindex 2

Ensure that newly allocated interfaces in a netns get ifindex higher than
any interface that has appeared in the netns. This of course does not fix
the reuse problem for the applications; it just makes it less likely to be
hit in common usage patterns.

Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 net/core/dev.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
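
The four steps in the commit message can be replayed in a small user-space model. This is only a sketch, not kernel code: `toy_net`, `toy_new_index()`, `toy_move_in()` and `toy_scenario()` are hypothetical names, the allocator is modelled loosely on the behavior described for dev_new_index() (increment, wrap to 1, skip indexes that are in use), and the model assumes the new netns starts with a single loopback device at ifindex 1.

```c
#include <limits.h>
#include <stdbool.h>

#define TOY_MAX_DEVS 8

/* Toy stand-in for struct net: the allocation counter and the live indexes. */
struct toy_net {
	int ifindex;			/* last index handed out by toy_new_index() */
	int devs[TOY_MAX_DEVS];		/* ifindexes of live devices, 0 = empty slot */
};

static bool toy_in_use(const struct toy_net *net, int ifindex)
{
	for (int i = 0; i < TOY_MAX_DEVS; i++)
		if (net->devs[i] == ifindex)
			return true;
	return false;
}

/* Replace the first slot holding 'from' with 'to' (0 means empty). */
static void toy_set_slot(struct toy_net *net, int from, int to)
{
	for (int i = 0; i < TOY_MAX_DEVS; i++) {
		if (net->devs[i] == from) {
			net->devs[i] = to;
			return;
		}
	}
}

/* Assumed allocator shape: increment, wrap to 1, skip live indexes. */
static int toy_new_index(struct toy_net *net)
{
	int ifindex = net->ifindex;

	for (;;) {
		ifindex = (ifindex == INT_MAX) ? 1 : ifindex + 1;
		if (!toy_in_use(net, ifindex))
			return net->ifindex = ifindex;
	}
}

/* Move a device in, keeping its ifindex if it is free.
 * 'patched' applies the counter update proposed by the patch. */
static int toy_move_in(struct toy_net *net, int ifindex, bool patched)
{
	if (toy_in_use(net, ifindex))
		ifindex = toy_new_index(net);
	else if (patched && ifindex > net->ifindex)
		net->ifindex = ifindex;
	toy_set_slot(net, 0, ifindex);
	return ifindex;
}

/* Replays steps 2-4; returns the ifindex given to the newly created device. */
static int toy_scenario(bool patched)
{
	struct toy_net new_netns = { 0 };
	int created;

	toy_set_slot(&new_netns, 0, toy_new_index(&new_netns)); /* loopback */
	toy_move_in(&new_netns, 2, patched);	/* step 2: ifindex 2 is kept */
	toy_set_slot(&new_netns, 2, 0);		/* step 3: delete it */
	created = toy_new_index(&new_netns);	/* step 4: create a device */
	toy_set_slot(&new_netns, 0, created);
	return created;
}
```

In this model `toy_scenario(false)` hands out ifindex 2 again, while `toy_scenario(true)` hands out 3, which is the effect the patch aims for.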

Comments

Alexei Starovoitov Oct. 18, 2015, 3:11 p.m. UTC | #1
On Fri, Oct 16, 2015 at 01:07:59PM +0200, Jiri Benc wrote:
> When moving interfaces to a different netns, the ifindex of the interface is
> kept if possible. However, this is not kept in sync with allocation of new
> interfaces in the target netns. While the ifindex is skipped when creating
> a new interface in the netns as long as the moved interface still exists, it
> is reused once the moved interface has disappeared.
> 
> This causes races for GUI tools in situations like this:
> 
> 1. create netns 'new_netns'
> 2. in root netns, move the interface with ifindex 2 to new_netns
> 3. in new_netns, delete the interface with ifindex 2
> 4. in new_netns, create an interface - it will get ifindex 2
> 
> Ensure that newly allocated interfaces in a netns get ifindex higher than
> any interface that has appeared in the netns. This of course does not fix
> the reuse problem for the applications; it just makes it less likely to be
> hit in common usage patterns.
> 
> Reported-by: Thomas Haller <thaller@redhat.com>
> Signed-off-by: Jiri Benc <jbenc@redhat.com>
> ---
>  net/core/dev.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 6bb6470f5b7b..e3d05c20f0ef 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6137,6 +6137,23 @@ static int dev_new_index(struct net *net)
>  	}
>  }
>  
> +/**
> + *	dev_update_index - update the ifindex used for allocation
> + *	@net: the applicable net namespace
> + *	@ifindex: the assigned ifindex
> + *
> + *	Updates the notion of currently allocated maximal ifindex to
> + *	decrease likelihood of ifindex reuse when the ifindex was assigned
> + *	by other means than calling dev_new_index (e.g. when moving
> + *	interface across net namespaces).  The caller must hold the rtnl
> + *	semaphore or the dev_base_lock.
> + */
> +static void dev_update_index(struct net *net, int ifindex)
> +{
> +	if (ifindex > net->ifindex)
> +		net->ifindex = ifindex;
> +}
> +

It looks dangerous.
Does it mean that 'for (4B) { create new dev; free old dev; }'
will keep incrementing that max index and DoS it eventually?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Benc Oct. 19, 2015, 9:06 a.m. UTC | #2
On Sun, 18 Oct 2015 08:11:58 -0700, Alexei Starovoitov wrote:
> It looks dangerous.
> Does it mean that 'for (4B) { create new dev; free old dev; }'
> will keep incrementing that max index and DoS it eventually?

This is not changed by this patch in any way. As for the current
behavior (with or without my patch), by creating and deleting an
interface, the max index indeed keeps incrementing. There's no DoS,
however, as the index simply wraps to 1 when reaching maxint. See
dev_new_index(). This is something I count on in this patch.

 Jiri
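
The wrap-around described in this reply can be illustrated with a user-space sketch. It is not the kernel code: `toy_net`, `toy_new_index()` and `toy_churn()` are hypothetical, the model tracks only one long-lived index (standing in for loopback at ifindex 1), and the wrap is written out explicitly to avoid signed overflow in portable C.

```c
#include <limits.h>

/* Toy namespace: the allocation counter plus one index that stays in use. */
struct toy_net {
	int ifindex;	/* last index handed out */
	int busy;	/* a single long-lived index, e.g. loopback */
};

/* Increment, wrap to 1 after INT_MAX, and skip the index that is in use. */
static int toy_new_index(struct toy_net *net)
{
	int ifindex = net->ifindex;

	for (;;) {
		ifindex = (ifindex == INT_MAX) ? 1 : ifindex + 1;
		if (ifindex != net->busy)
			return net->ifindex = ifindex;
	}
}

/* Create and immediately delete a device 'rounds' times, starting with the
 * counter at 'last'; returns the index used in the final round. */
static int toy_churn(int last, int rounds)
{
	struct toy_net net = { .ifindex = last, .busy = 1 };
	int idx = 0;

	for (int i = 0; i < rounds; i++)
		idx = toy_new_index(&net);	/* the device is freed right away */
	return idx;
}
```

Starting two below INT_MAX, the second round uses INT_MAX and the third wraps, skips the busy index 1 and uses 2, so the loop keeps allocating instead of running out.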
Alexei Starovoitov Oct. 19, 2015, 3:36 p.m. UTC | #3
On Mon, Oct 19, 2015 at 11:06:49AM +0200, Jiri Benc wrote:
> On Sun, 18 Oct 2015 08:11:58 -0700, Alexei Starovoitov wrote:
> > It looks dangerous.
> > Does it mean that 'for (4B) { create new dev; free old dev; }'
> > will keep incrementing that max index and DoS it eventually?
> 
> This is not changed by this patch in any way. As for the current
> behavior (with or without my patch), by creating and deleting an
> interface, the max index indeed keeps incrementing. There's no DoS,
> however, as the index simply wraps to 1 when reaching maxint. See
> dev_new_index(). This is something I count on in this patch.

makes sense. thanks for explaining.

David Miller Oct. 21, 2015, 2:43 p.m. UTC | #4
From: Jiri Benc <jbenc@redhat.com>
Date: Fri, 16 Oct 2015 13:07:59 +0200

> This of course does not fix the reuse problem for the applications;
> it just makes it less likely to be hit in common usage patterns.

Not only does this not fix the problem, it makes the incentive to fix
that problem much smaller.

Therefore I am not applying this patch, sorry.

Fix the real problem, then come talk to us.
Jiri Benc Oct. 21, 2015, 2:46 p.m. UTC | #5
On Wed, 21 Oct 2015 07:43:32 -0700 (PDT), David Miller wrote:
> Fix the real problem, then come talk to us.

I don't think the real problem is fixable, given that any kind of
unique non-settable identifier would break CRIU. And anything settable
will have the exact same problem. All we can do is narrow the race
window.

For example, we could always alloc a new ifindex when moving interfaces
between name spaces. That would be probably the tiniest race window we
could get to (still not zero!) but I guess it would break apps that
assume that ifindex doesn't change when moving interfaces between name
spaces (which is not true, such apps are already broken, they just
happen to work in 99% of cases). The second best solution that doesn't
break those apps at the cost of leaving the race window wider, is this
patch.

But whatever, I don't care enough about this.

 Jiri
Jiri Benc Oct. 21, 2015, 3:25 p.m. UTC | #6
On Wed, 21 Oct 2015 08:32:14 -0700 (PDT), David Miller wrote:
> As you say the apps are broken, so file a bug and have them fixed.
> 
> The assumption is clearly invalid, so apps cannot make such an
> assumption.

Does it mean you would be okay with a patch that always allocates and
assigns a new ifindex in the target netns when interface is moved
between name spaces?

 Jiri
David Miller Oct. 21, 2015, 3:32 p.m. UTC | #7
From: Jiri Benc <jbenc@redhat.com>
Date: Wed, 21 Oct 2015 16:46:13 +0200

> For example, we could always alloc a new ifindex when moving interfaces
> between name spaces. That would be probably the tiniest race window we
> could get to (still not zero!) but I guess it would break apps that
> assume that ifindex doesn't change when moving interfaces between name
> spaces (which is not true, such apps are already broken, they just
> happen to work in 99% of cases). The second best solution that doesn't
> break those apps at the cost of leaving the race window wider, is this
> patch.

As you say the apps are broken, so file a bug and have them fixed.

The assumption is clearly invalid, so apps cannot make such an
assumption.
David Miller Oct. 21, 2015, 3:56 p.m. UTC | #8
From: Jiri Benc <jbenc@redhat.com>
Date: Wed, 21 Oct 2015 17:25:02 +0200

> On Wed, 21 Oct 2015 08:32:14 -0700 (PDT), David Miller wrote:
>> As you say the apps are broken, so file a bug and have them fixed.
>> 
>> The assumption is clearly invalid, so apps cannot make such an
>> assumption.
> 
> Does it mean you would be okay with a patch that always allocates and
> assigns a new ifindex in the target netns when interface is moved
> between name spaces?

I think you're misunderstanding me if you're still recommending
kernel changes.

I'm plainly saying to remove the assumption in the apps.

If you don't show me exactly how some kernel change can lead to
the apps implementing things properly, without the invalid
assumptions, then I can only assume you didn't hear what I
said.
Hannes Frederic Sowa Oct. 21, 2015, 5:12 p.m. UTC | #9
Hello,

On Wed, Oct 21, 2015, at 17:56, David Miller wrote:
> From: Jiri Benc <jbenc@redhat.com>
> Date: Wed, 21 Oct 2015 17:25:02 +0200
> 
> > On Wed, 21 Oct 2015 08:32:14 -0700 (PDT), David Miller wrote:
> >> As you say the apps are broken, so file a bug and have them fixed.
> >> 
> >> The assumption is clearly invalid, so apps cannot make such an
> >> assumption.
> > 
> > Does it mean you would be okay with a patch that always allocates and
> > assigns a new ifindex in the target netns when interface is moved
> > between name spaces?
> 
> I think you're misunderstanding me if you're still recommending
> kernel changes.
> 
> I'm plainly saying to remove the assumption in the apps.
> 
> If you don't show me exactly how some kernel change can lead to
> the apps implementing things properly, without the invalid
> assumptions, then I can only assume you didn't hear what I
> said.

I think the reason why ifindexes exist as ints is that we want to have a
lightweight way to refer to interfaces without taking references or
timestamps or generation ids, which would completely remove the possibility
of races. But the racy nature of ifindexes is something we actually want;
otherwise a user space program acquiring an ifindex would need to take a
reference on the device and release it either on socket close or on program
termination, which would be very costly.

This patch reduces the race quite a lot, from something we could
actually see in everyday container creation to something probably only
some users will experience when depleting the ifindex pool or playing
around with CRIU.

We could come up with more heavy machinery to close the race further for
CRIU by keeping track of "poisoned" ifindexes, which would need a
hashmap which could become pretty big and we could recycle when ifindex
wraps around, but this seems too heavy weight to me.

I am in favor of a solution to minimize this race in the kernel even
though we cannot ever close it completely.

Bye,
Hannes
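
The "poisoned" ifindex idea mentioned in this message could look roughly like the following user-space sketch. Everything here is an assumption for illustration: `toy_net`, `toy_new_index()`, `toy_move_in()`, `toy_del()` and `toy_poison_demo()` are hypothetical, a fixed array over a tiny index space replaces the hashmap, and callers are assumed to pass indexes between 1 and `TOY_MAX_INDEX`.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_INDEX 64	/* tiny index space so a wrap is easy to reach */

struct toy_net {
	int ifindex;				/* last index handed out */
	bool in_use[TOY_MAX_INDEX + 1];		/* live devices */
	bool poisoned[TOY_MAX_INDEX + 1];	/* freed since the last wrap */
};

/* Allocate an index that is neither live nor poisoned and mark it live.
 * Assumes at least one index is not live, otherwise this loops forever. */
static int toy_new_index(struct toy_net *net)
{
	int ifindex = net->ifindex;

	for (;;) {
		if (++ifindex > TOY_MAX_INDEX) {
			ifindex = 1;
			/* Recycle: forget poisoned indexes on wrap. */
			memset(net->poisoned, 0, sizeof(net->poisoned));
		}
		if (!net->in_use[ifindex] && !net->poisoned[ifindex]) {
			net->in_use[ifindex] = true;
			return net->ifindex = ifindex;
		}
	}
}

/* Device moved in from another netns: keep its index unless it is live. */
static int toy_move_in(struct toy_net *net, int ifindex)
{
	if (net->in_use[ifindex])
		return toy_new_index(net);
	net->in_use[ifindex] = true;
	return ifindex;
}

/* Deleting a device frees its index but poisons it until the next wrap. */
static void toy_del(struct toy_net *net, int ifindex)
{
	net->in_use[ifindex] = false;
	net->poisoned[ifindex] = true;
}

/* Move index 3 in, delete it, then allocate twice. */
static int toy_poison_demo(void)
{
	struct toy_net net = { 0 };

	toy_new_index(&net);		/* loopback takes 1 */
	toy_move_in(&net, 3);		/* counter stays at 1 */
	toy_del(&net, 3);		/* 3 is now poisoned */
	toy_new_index(&net);		/* takes 2 */
	return toy_new_index(&net);	/* skips the poisoned 3 */
}
```

The demo ends up with ifindex 4 because the poisoned index 3 is skipped. The cost noted in the message shows up as the second array, which in a real index space would have to be a much larger structure.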
Nicolas Dichtel Oct. 22, 2015, 2:52 p.m. UTC | #10
Le 21/10/2015 19:12, Hannes Frederic Sowa a écrit :
> Hello,
>
> On Wed, Oct 21, 2015, at 17:56, David Miller wrote:
>> From: Jiri Benc <jbenc@redhat.com>
>> Date: Wed, 21 Oct 2015 17:25:02 +0200
>>
>>> On Wed, 21 Oct 2015 08:32:14 -0700 (PDT), David Miller wrote:
>>>> As you say the apps are broken, so file a bug and have them fixed.
>>>>
>>>> The assumption is clearly invalid, so apps cannot make such an
>>>> assumption.
>>>
>>> Does it mean you would be okay with a patch that always allocates and
>>> assigns a new ifindex in the target netns when interface is moved
>>> between name spaces?
>>
>> I think you're misunderstanding me if you're still recommending
>> kernel changes.
>>
>> I'm plainly saying to remove the assumption in the apps.
>>
>> If you don't show me exactly how some kernel change can lead to
>> the apps implementing things properly, without the invalid
>> assumptions, then I can only assume you didn't hear what I
>> said.
>
> I think the reason why ifindexes exist as ints is that we want to have a
> lightweight way to refer to interfaces without taking references or
> timestamps or generation ids, which would completely remove the possibility
> of races. But the racy nature of ifindexes is something we actually want;
> otherwise a user space program acquiring an ifindex would need to take a
> reference on the device and release it either on socket close or on program
> termination, which would be very costly.
>
> This patch reduces the race quite a lot, from something we could
> actually see in everyday container creation to something probably only
> some users will experience when depleting the ifindex pool or playing
> around with CRIU.
>
> We could come up with more heavy machinery to close the race further for
> CRIU by keeping track of "poisoned" ifindexes, which would need a
> hashmap which could become pretty big and we could recycle when ifindex
> wraps around, but this seems too heavy weight to me.
>
> I am in favor of a solution to minimize this race in the kernel even
> though we cannot ever close it completely.
I'm probably missing something, but if the app listens to netlink, I don't see
how such an app may have a race window.

With the proposed scenario:
1. create netns 'new_netns'
2. in root netns, move the interface with ifindex 2 to new_netns
3. in new_netns, delete the interface with ifindex 2
4. in new_netns, create an interface - it will get ifindex 2

Operation 2 and 4 are done by dev_change_net_namespace() under rtnl_lock().
RTM_DELLINK(root netns) and RTM_NEWLINK(new_netns) are sent by this function.
It means that operation 3 has been done before and that RTM_DELLINK(new_netns)
has been sent before.

Regards,
Nicolas
Jiri Benc Oct. 22, 2015, 3 p.m. UTC | #11
On Thu, 22 Oct 2015 16:52:13 +0200, Nicolas Dichtel wrote:
> With the proposed scenario:
> 1. create netns 'new_netns'
> 2. in root netns, move the interface with ifindex 2 to new_netns
> 3. in new_netns, delete the interface with ifindex 2
> 4. in new_netns, create an interface - it will get ifindex 2
> 
> Operation 2 and 4 are done by dev_change_net_namespace() under rtnl_lock().
> RTM_DELLINK(root netns) and RTM_NEWLINK(new_netns) are sent by this function.
> It means that operation 3 has been done before and that RTM_DELLINK(new_netns)
> has been sent before.

Imagine the application trying to configure the interface with ifindex 2
after your step 2. It constructs a netlink message and sends it to the
kernel; but while doing so, steps 3 and 4 happen. Now the application
ends up configuring a different interface than it intended to. After
that, it polls the netlink socket and receives the notifications about
interface disappearing and a new one appearing.

I don't see any way the user space application can prevent this. There
will always be a race between receiving netlink notifications and
sending config requests.

I guess Thomas Haller can elaborate more as he ran into this.

 Jiri
Hannes Frederic Sowa Oct. 22, 2015, 3:10 p.m. UTC | #12
Hello,

On Thu, Oct 22, 2015, at 17:00, Jiri Benc wrote:
> On Thu, 22 Oct 2015 16:52:13 +0200, Nicolas Dichtel wrote:
> > With the proposed scenario:
> > 1. create netns 'new_netns'
> > 2. in root netns, move the interface with ifindex 2 to new_netns
> > 3. in new_netns, delete the interface with ifindex 2
> > 4. in new_netns, create an interface - it will get ifindex 2
> > 
> > Operation 2 and 4 are done by dev_change_net_namespace() under rtnl_lock().
> > RTM_DELLINK(root netns) and RTM_NEWLINK(new_netns) are sent by this function.
> > It means that operation 3 has been done before and that RTM_DELLINK(new_netns)
> > has been sent before.
> 
> Imagine the application trying to configure the interface with ifindex 2
> after your step 2. It constructs a netlink message and sends it to the
> kernel; but while doing so, steps 3 and 4 happen. Now the application
> ends up configuring a different interface than it intended to. After
> that, it polls the netlink socket and receives the notifications about
> interface disappearing and a new one appearing.
> 
> I don't see any way the user space application can prevent this. There
> will always be a race between receiving netlink notifications and
> sending config requests.

ifindexes are not only used with netlink monitor but also normal socket
api, so it would make sense to try to not reassign any ifindex before
the overflow of the net->ifindex as they don't listen for notifications
of interface removals or creations.

Thanks,
Hannes
Thomas Haller Oct. 22, 2015, 3:20 p.m. UTC | #13
On Thu, 2015-10-22 at 17:00 +0200, Jiri Benc wrote:
> On Thu, 22 Oct 2015 16:52:13 +0200, Nicolas Dichtel wrote:
> > With the proposed scenario:
> > 1. create netns 'new_netns'
> > 2. in root netns, move the interface with ifindex 2 to new_netns
> > 3. in new_netns, delete the interface with ifindex 2
> > 4. in new_netns, create an interface - it will get ifindex 2
> > 
> > Operation 2 and 4 are done by dev_change_net_namespace() under
> > rtnl_lock().
> > RTM_DELLINK(root netns) and RTM_NEWLINK(new_netns) are sent by this
> > function.
> > It means that operation 3 has been done before and that
> > RTM_DELLINK(new_netns)
> > has been sent before.
> 
> Imagine the application trying to configure the interface with
> ifindex 2
> after your step 2. It constructs a netlink message and sends it to
> the
> kernel; but while doing so, steps 3 and 4 happen. Now the application
> ends up configuring a different interface than it intended to. After
> that, it polls the netlink socket and receives the notifications
> about
> interface disappearing and a new one appearing.
> 
> I don't see any way the user space application can prevent this.
> There
> will always be a race between receiving netlink notifications and
> sending config requests.
> 
> I guess Thomas Haller can elaborate more as he ran into this.

Jiri,

It's really just what you said.

Whatever action the application wants to perform when using the
ifindex, there is a time-window between learning about the ifindex
and using it.

There is nothing userspace can do except trying to hurry and hoping for
the best.


Thomas
Thomas Haller Oct. 22, 2015, 3:21 p.m. UTC | #14
On Thu, 2015-10-22 at 16:52 +0200, Nicolas Dichtel wrote:
> Le 21/10/2015 19:12, Hannes Frederic Sowa a écrit :
> > Hello,
> > 
> > On Wed, Oct 21, 2015, at 17:56, David Miller wrote:
> > > From: Jiri Benc <jbenc@redhat.com>
> > > Date: Wed, 21 Oct 2015 17:25:02 +0200
> > > 
> > > > On Wed, 21 Oct 2015 08:32:14 -0700 (PDT), David Miller wrote:
> > > > > As you say the apps are broken, so file a bug and have them
> > > > > fixed.
> > > > > 
> > > > > The assumption is clearly invalid, so apps cannot make such
> > > > > an
> > > > > assumption.
> > > > 
> > > > Does it mean you would be okay with a patch that always
> > > > allocates and
> > > > assigns a new ifindex in the target netns when interface is
> > > > moved
> > > > between name spaces?
> > > 
> > > I think you're misunderstanding me if you're still recommending
> > > kernel changes.
> > > 
> > > I'm plainly saying to remove the assumption in the apps.
> > > 
> > > If you don't show me exactly how some kernel change can lead to
> > > the apps implementing things properly, without the invalid
> > > assumptions, then I can only assume you didn't hear what I
> > > said.
> > 
> > I think the reason why ifindexes exist as ints is that we want to
> > have a lightweight way to refer to interfaces without taking
> > references or timestamps or generation ids, which would completely
> > remove the possibility of races. But the racy nature of ifindexes
> > is something we actually want; otherwise a user space program
> > acquiring an ifindex would need to take a reference on the device
> > and release it either on socket close or on program termination,
> > which would be very costly.
> > 
> > This patch reduces the race quite a lot, from something we could
> > actually see in everyday container creation to something probably
> > only some users will experience when depleting the ifindex pool or
> > playing around with CRIU.
> > 
> > We could come up with more heavy machinery to close the race
> > further for
> > CRIU by keeping track of "poisoned" ifindexes, which would need a
> > hashmap which could become pretty big and we could recycle when
> > ifindex
> > wraps around, but this seems too heavy weight to me.
> > 
> > I am in favor of a solution to minimize this race in the kernel
> > even
> > though we cannot ever close it completely.
> I'm probably missing something, but if the app listens to netlink, I
> don't see how such an app may have a race window.


Userspace uses the ifindex as an identifier for the interface, for example
when changing a link or adding an IP address.

(1) Say the application listens on netlink and learns about an
interface (and its ifindex).

(2) Immediately it does something with the ifindex (e.g. sets IFF_UP on
the interface).


Between (1) and (2) there is a possibility of a race that the application
cannot avoid. It can only hurry and hope for the best.



Thomas
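
The window between learning an ifindex and using it can be made concrete with a short sketch. It uses the standard if_indextoname() call from <net/if.h>; the helper names `ifindex_still_named()` and `act_if_unchanged()` and the `act` callback are hypothetical. Re-checking the name right before acting only narrows the window, and an interface name can be reused just like an ifindex, so this is not a fix.

```c
#include <net/if.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if 'ifindex' currently resolves to the name 'expected'. */
static bool ifindex_still_named(unsigned int ifindex, const char *expected)
{
	char name[IF_NAMESIZE];

	if (if_indextoname(ifindex, name) == NULL)
		return false;	/* no interface has this index right now */
	return strcmp(name, expected) == 0;
}

/* Re-check the index, then run 'act' on it. Returns -1 if the check fails,
 * otherwise whatever 'act' returns. */
static int act_if_unchanged(unsigned int ifindex, const char *expected,
			    int (*act)(unsigned int ifindex))
{
	if (!ifindex_still_named(ifindex, expected))
		return -1;
	/*
	 * Race window: the interface can still be deleted and the index
	 * handed to a different device before 'act' reaches the kernel.
	 */
	return act(ifindex);
}
```

The check and the action are two separate steps, so nothing stops the index from changing owner in between, which is the point made in the message above.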
Nicolas Dichtel Oct. 22, 2015, 3:23 p.m. UTC | #15
Le 22/10/2015 17:00, Jiri Benc a écrit :
> On Thu, 22 Oct 2015 16:52:13 +0200, Nicolas Dichtel wrote:
>> With the proposed scenario:
>> 1. create netns 'new_netns'
>> 2. in root netns, move the interface with ifindex 2 to new_netns
>> 3. in new_netns, delete the interface with ifindex 2
>> 4. in new_netns, create an interface - it will get ifindex 2
>>
>> Operation 2 and 4 are done by dev_change_net_namespace() under rtnl_lock().
>> RTM_DELLINK(root netns) and RTM_NEWLINK(new_netns) are sent by this function.
>> It means that operation 3 has been done before and that RTM_DELLINK(new_netns)
>> has been sent before.
>
> Imagine the application trying to configure the interface with ifindex 2
> after your step 2. It constructs a netlink message and sends it to the
> kernel; but while doing so, steps 3 and 4 happen. Now the application
> ends up configuring a different interface than it intended to. After
> that, it polls the netlink socket and receives the notifications about
> interface disappearing and a new one appearing.
Understood.

>
> I don't see any way the user space application can prevent this. There
> will always be a race between receiving netlink notifications and
> sending config requests.
Yeah, I'm also starting to think that reducing this window is the way to go.


Thank you,
Nicolas
Thomas Graf Oct. 22, 2015, 4:45 p.m. UTC | #16
On 10/22/15 at 05:00pm, Jiri Benc wrote:
> On Thu, 22 Oct 2015 16:52:13 +0200, Nicolas Dichtel wrote:
> > With the proposed scenario:
> > 1. create netns 'new_netns'
> > 2. in root netns, move the interface with ifindex 2 to new_netns
> > 3. in new_netns, delete the interface with ifindex 2
> > 4. in new_netns, create an interface - it will get ifindex 2
> > 
> > Operation 2 and 4 are done by dev_change_net_namespace() under rtnl_lock().
> > RTM_DELLINK(root netns) and RTM_NEWLINK(new_netns) are sent by this function.
> > It means that operation 3 has been done before and that RTM_DELLINK(new_netns)
> > has been sent before.
> 
> Imagine the application trying to configure the interface with ifindex 2
> after your step 2. It constructs a netlink message and sends it to the
> kernel; but while doing so, steps 3 and 4 happen. Now the application
> ends up configuring a different interface than it intended to. After
> that, it polls the netlink socket and receives the notifications about
> interface disappearing and a new one appearing.
> 
> I don't see any way the user space application can prevent this. There
> will always be a race between receiving netlink notifications and
> sending config requests.
> 
> I guess Thomas Haller can elaborate more as he ran into this.

I understand the race but when does it occur? Whoever creates
the original interface owns it and is responsible for its
lifecycle. *Iff* for some reason multiple entities manipulate
the interface, then it's probably a lot safer to just use flock
or something similar to serialize access entirely in user space.
Hannes Frederic Sowa Oct. 22, 2015, 5:21 p.m. UTC | #17
Hi Thomas,

On Thu, Oct 22, 2015, at 18:45, Thomas Graf wrote:
> On 10/22/15 at 05:00pm, Jiri Benc wrote:
> > On Thu, 22 Oct 2015 16:52:13 +0200, Nicolas Dichtel wrote:
> > > With the proposed scenario:
> > > 1. create netns 'new_netns'
> > > 2. in root netns, move the interface with ifindex 2 to new_netns
> > > 3. in new_netns, delete the interface with ifindex 2
> > > 4. in new_netns, create an interface - it will get ifindex 2
> > > 
> > > Operation 2 and 4 are done by dev_change_net_namespace() under rtnl_lock().
> > > RTM_DELLINK(root netns) and RTM_NEWLINK(new_netns) are sent by this function.
> > > It means that operation 3 has been done before and that RTM_DELLINK(new_netns)
> > > has been sent before.
> > 
> > Imagine the application trying to configure the interface with ifindex 2
> > after your step 2. It constructs a netlink message and sends it to the
> > kernel; but while doing so, steps 3 and 4 happen. Now the application
> > ends up configuring a different interface than it intended to. After
> > that, it polls the netlink socket and receives the notifications about
> > interface disappearing and a new one appearing.
> > 
> > I don't see any way the user space application can prevent this. There
> > will always be a race between receiving netlink notifications and
> > sending config requests.
> > 
> > I guess Thomas Haller can elaborate more as he ran into this.
> 
> I understand the race but when does it occur? Whoever creates
> the original interface owns it and is responsible for its
> lifecycle. *Iff* for some reason multiple entities manipulate
> the interface, then it's probably a lot safer to just use flock
> or something similar to serialize access entirely in user space.

This only works if all networking configuration programs would
standardize on the same flock. Also, under memory pressure we lose
netlink monitor messages, so we need to deal with timeouts and retries
and manual sync up on the networking configuration, which makes this
scheme a lot harder. For normal socket io, where we specify e.g. ifindex
in sin6_scope_id, this is not really usable at all.

Bye,
Hannes
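
For the plain socket API case, the interface index travels in the sin6_scope_id field of struct sockaddr_in6. A minimal sketch, assuming a link-local destination; `make_scoped_dest()` is a hypothetical helper and the address, index and port in the usage note are made-up values:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Fill 'out' with an IPv6 destination scoped to the interface 'ifindex'.
 * Returns 0 on success, -1 if 'addr' is not a valid IPv6 address. */
static int make_scoped_dest(const char *addr, unsigned int ifindex,
			    uint16_t port, struct sockaddr_in6 *out)
{
	memset(out, 0, sizeof(*out));
	out->sin6_family = AF_INET6;
	out->sin6_port = htons(port);
	/* A bare number: no reference on the device and no notification. */
	out->sin6_scope_id = ifindex;
	if (inet_pton(AF_INET6, addr, &out->sin6_addr) != 1)
		return -1;
	return 0;
}
```

A caller would do something like `make_scoped_dest("fe80::1", 2, 5000, &sa)` and pass `sa` to connect() or sendto(). If ifindex 2 has been reassigned in the meantime, the socket call has no way to notice.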
Thomas Graf Oct. 22, 2015, 6:56 p.m. UTC | #18
On 10/22/15 at 07:21pm, Hannes Frederic Sowa wrote:
> Hi Thomas,
> 
> On Thu, Oct 22, 2015, at 18:45, Thomas Graf wrote:
> > I understand the race but when does it occur? Whoever creates
> > the original interface owns it and is responsible for its
> > lifecycle. *Iff* for some reason multiple entities manipulate
> > the interface, then it's probably a lot safer to just use flock
> > or something similar to serialize access entirely in user space.
> 
> This only works if all networking configuration programs would
> standardize on the same flock. Also, under memory pressure we lose
> netlink monitor messages, so we need to deal with timeouts and retries
> and manual sync up on the networking configuration, which makes this
> scheme a lot harder. For normal socket io, where we specify e.g. ifindex
> in sin6_scope_id, this is not really usable at all.

Again, what is the scenario where this happens? Is this being
hit or are we talking theoretical races? I'd like to understand
the background of this.
Thomas Haller Oct. 23, 2015, 10:40 a.m. UTC | #19
On Thu, 2015-10-22 at 20:56 +0200, Thomas Graf wrote:
> On 10/22/15 at 07:21pm, Hannes Frederic Sowa wrote:
> > Hi Thomas,
> > 
> > On Thu, Oct 22, 2015, at 18:45, Thomas Graf wrote:
> > > I understand the race but when does it occur? Whoever creates
> > > the original interface owns it and is responsible for its
> > > lifecycle. *Iff* for some reason multiple entities manipulate
> > > the interface, then it's probably a lot safer to just use flock
> > > or something similar to serialize access entirely in user space.
> > 
> > This only works if all networking configuration programs would
> > standardize on the same flock. Also, under memory pressure we lose
> > netlink monitor messages, so we need to deal with timeouts and
> > retries
> > and manual sync up on the networking configuration, which makes
> > this
> > scheme a lot harder. For normal socket io, where we specify e.g.
> > ifindex in sin6_scope_id, this is not really usable at all.
> 
> Again, what is the scenario where this happens? Is this being
> hit or are we talking theoretical races? I'd like to understand
> the background of this.

  ip netns add N1
  ip netns add N2

  ip netns exec N1 ip link add type dummy
  ip netns exec N2 ip link add type dummy

  ip netns exec N1 ip monitor &

  ip netns exec N1 ip link delete dummy0
  ip netns exec N2 ip link set dummy0 netns N1


Honestly, I didn't experience a concrete bug due to this.

But it's common to treat the ifindex as a unique identifier.
By reusing the ifindex immediately as in the example above,
interfaces could get mixed up.


Thomas
Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 6bb6470f5b7b..e3d05c20f0ef 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6137,6 +6137,23 @@  static int dev_new_index(struct net *net)
 	}
 }
 
+/**
+ *	dev_update_index - update the ifindex used for allocation
+ *	@net: the applicable net namespace
+ *	@ifindex: the assigned ifindex
+ *
+ *	Updates the notion of currently allocated maximal ifindex to
+ *	decrease likelihood of ifindex reuse when the ifindex was assigned
+ *	by other means than calling dev_new_index (e.g. when moving
+ *	interface across net namespaces).  The caller must hold the rtnl
+ *	semaphore or the dev_base_lock.
+ */
+static void dev_update_index(struct net *net, int ifindex)
+{
+	if (ifindex > net->ifindex)
+		net->ifindex = ifindex;
+}
+
 /* Delayed registration/unregisteration */
 static LIST_HEAD(net_todo_list);
 DECLARE_WAIT_QUEUE_HEAD(netdev_unregistering_wq);
@@ -7262,6 +7279,8 @@  int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	/* If there is an ifindex conflict assign a new one */
 	if (__dev_get_by_index(net, dev->ifindex))
 		dev->ifindex = dev_new_index(net);
+	else
+		dev_update_index(net, dev->ifindex);
 
 	/* Send a netdev-add uevent to the new namespace */
 	kobject_uevent(&dev->dev.kobj, KOBJ_ADD);