Scalability of interface creation and deletion

Message ID	1304783684.9216.2.camel@edumazet-laptop
State	RFC, archived
Delegated to:	David Miller

Eric Dumazet May 7, 2011, 3:54 p.m. UTC

On Saturday, 07 May 2011 at 16:26 +0100, Alex Bligh wrote:
> Well, I patched it (patch attached for what it's worth) and it made
> no difference in this case. I would suggest however that it might
> be the right thing to do anyway.
> 

As I said, this code should not be entered in normal situations.

You are not the first to suggest a change, but it won't help you at all.




> On the current 8 core box I am testing, I see 280ms per interface
> delete **even with only 10 interfaces**. I see 260ms with one
> interface. I know doing lots of rcu sync stuff can be slow, but
> 260ms to remove one veth pair sounds like more than rcu sync going
> on. It sounds like a sleep (though I may not have found the
> right one). I see no CPU load.
> 
> Equally, with one interface (remember I'm doing this in unshare -n
> so there is only a loopback interface there), this bit surely
> can't be sysfs.
> 

synchronize_rcu() calls do not consume CPU; they just _wait_ for an
RCU grace period.

I suggest you read Documentation/RCU files if you really want to :)

If you want to check how expensive it is, it's quite easy:
add a trace in synchronize_net() 
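
The difference between waiting and consuming CPU can also be seen from userspace. The following is a minimal sketch, not kernel code: `blocking_wait()` is a hypothetical stand-in that simply sleeps, and `measure_wait()` (also a made-up name) compares wall-clock time with process CPU time around it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Wall-clock and CPU time spent in one call, both in milliseconds. */
struct cost {
	double wall_ms;
	double cpu_ms;
};

static double ts_to_ms(const struct timespec *ts)
{
	return ts->tv_sec * 1000.0 + ts->tv_nsec / 1e6;
}

/* Hypothetical stand-in for a blocking wait: sleeps, does no work. */
void blocking_wait(long wait_ms)
{
	struct timespec req;

	assert(wait_ms >= 0);
	req.tv_sec = wait_ms / 1000;
	req.tv_nsec = (wait_ms % 1000) * 1000000L;
	/* Restart with the remaining time if a signal interrupts us. */
	while (nanosleep(&req, &req) == -1 && errno == EINTR)
		;
}

/* Measure elapsed wall time and consumed CPU time around the wait. */
struct cost measure_wait(long wait_ms)
{
	struct timespec w0, w1, c0, c1;
	struct cost c;

	clock_gettime(CLOCK_MONOTONIC, &w0);
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);
	blocking_wait(wait_ms);
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);
	clock_gettime(CLOCK_MONOTONIC, &w1);

	c.wall_ms = ts_to_ms(&w1) - ts_to_ms(&w0);
	c.cpu_ms = ts_to_ms(&c1) - ts_to_ms(&c0);
	return c;
}
```

A call such as measure_wait(260) should report about 260 ms of wall time and almost no CPU time, which matches the "260ms, no CPU load" pattern described above.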






--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ben Greear May 7, 2011, 4:23 p.m. UTC | #1

On 05/07/2011 08:54 AM, Eric Dumazet wrote:
> On Saturday, 07 May 2011 at 16:26 +0100, Alex Bligh wrote:
>> Well, I patched it (patch attached for what it's worth) and it made
>> no difference in this case. I would suggest however that it might
>> be the right thing to do anyway.
>>
>
> As I said, this code should not be entered in normal situations.
>
> You are not the first to suggest a change, but it won't help you at all.
>
>
>
>
>> On the current 8 core box I am testing, I see 280ms per interface
>> delete **even with only 10 interfaces**. I see 260ms with one
>> interface. I know doing lots of rcu sync stuff can be slow, but
>> 260ms to remove one veth pair sounds like more than rcu sync going
>> on. It sounds like a sleep (though I may not have found the
>> right one). I see no CPU load.
>>
>> Equally, with one interface (remember I'm doing this in unshare -n
>> so there is only a loopback interface there), this bit surely
>> can't be sysfs.
>>
>
> synchronize_rcu() calls do not consume CPU; they just _wait_ for an
> RCU grace period.
>
> I suggest you read Documentation/RCU files if you really want to :)
>
> If you want to check how expensive it is, it's quite easy:
> add a trace in synchronize_net()
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 856b6ee..70f3c46 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5915,8 +5915,10 @@ EXPORT_SYMBOL(free_netdev);
>    */
>   void synchronize_net(void)
>   {
> +	pr_err("begin synchronize_net()\n");
>   	might_sleep();
>   	synchronize_rcu();
> +	pr_err("end synchronize_net()\n");
>   }
>   EXPORT_SYMBOL(synchronize_net);

I wonder if it would be worth having a 'delete me soon'
method to delete interfaces that would not block on the
RCU code.

The controlling programs could use netlink messages to
know exactly when an interface was truly gone.

That should allow some batching in the sync-net logic
too, if user-space code deletes 1000 interfaces very
quickly, for instance...
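
The listening side of that idea could look roughly like this. It is a minimal sketch assuming Linux rtnetlink and its RTM_DELLINK notification; `open_link_monitor()` and `saw_dellink()` are hypothetical helper names, and the sketch says nothing about when the kernel would send the message relative to the RCU wait.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Open an rtnetlink socket subscribed to link notifications.
 * Returns the descriptor, or -1 on error. */
int open_link_monitor(void)
{
	struct sockaddr_nl addr;
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = RTMGRP_LINK;	/* link add/delete/change events */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Scan one received datagram of netlink messages.
 * Returns 1 if it contains RTM_DELLINK for ifindex, else 0. */
int saw_dellink(void *buf, size_t len, int ifindex)
{
	struct nlmsghdr *nh;
	int remaining = (int)len;

	assert(buf != NULL);
	for (nh = buf; NLMSG_OK(nh, remaining); nh = NLMSG_NEXT(nh, remaining)) {
		struct ifinfomsg *ifi;

		if (nh->nlmsg_type != RTM_DELLINK)
			continue;
		/* Skip messages too short to carry an ifinfomsg payload. */
		if (nh->nlmsg_len < NLMSG_LENGTH(sizeof(*ifi)))
			continue;
		ifi = NLMSG_DATA(nh);
		if (ifi->ifi_index == ifindex)
			return 1;
	}
	return 0;
}
```

A controlling program would recv() datagrams from the descriptor returned by open_link_monitor() and pass each one to saw_dellink() until the interface it deleted shows up.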

Thanks,
Ben


Eric Dumazet May 7, 2011, 4:37 p.m. UTC | #2

On Saturday, 07 May 2011 at 09:23 -0700, Ben Greear wrote:

> I wonder if it would be worth having a 'delete me soon'
> method to delete interfaces that would not block on the
> RCU code.
> 
> The controlling programs could use netlink messages to
> know exactly when an interface was truly gone.
> 
> That should allow some batching in the sync-net logic
> too, if user-space code deletes 1000 interfaces very
> quickly, for instance...
> 

In the past I suggested extending the batch capabilities, so
that one kthread could have 3 separate lists of devices being destroyed
in parallel.

This daemon would basically loop on one call to synchronize_rcu(), and
transfer list3 to deletion, list2 to list3, list1 to list2, loop,
possibly releasing RTNL while blocked in synchronize_rcu().

This would need to allow, as you suggest, an asynchronous deletion method,
or use a callback to wake the process blocked on device delete.

Right now, we hold RTNL for the whole three-step process, so we cannot use
any parallelism.
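
The three-list rotation can be modelled in a few lines. This is only a userspace sketch under loud assumptions: stages are plain counters rather than lists of struct net_device, the grace period is a counter increment standing in for synchronize_rcu(), and all names (`unreg_pipeline`, `pipeline_step()`, ...) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each stage counts devices waiting for the
 * next grace period. Real code would keep lists of devices. */
struct unreg_pipeline {
	size_t list1;		/* just queued: 3 grace periods to go */
	size_t list2;		/* 2 grace periods to go */
	size_t list3;		/* 1 grace period to go */
	size_t freed;		/* devices fully deleted */
	size_t grace_periods;	/* number of simulated waits */
};

/* Queue ndev devices for asynchronous deletion. */
void pipeline_queue(struct unreg_pipeline *p, size_t ndev)
{
	assert(p != NULL);
	p->list1 += ndev;
}

/* One loop iteration of the daemon: wait for a single grace period,
 * then move every list one stage further. */
void pipeline_step(struct unreg_pipeline *p)
{
	assert(p != NULL);
	p->grace_periods++;	/* stands in for synchronize_rcu() */
	p->freed += p->list3;	/* list3 -> deletion */
	p->list3 = p->list2;	/* list2 -> list3 */
	p->list2 = p->list1;	/* list1 -> list2 */
	p->list1 = 0;
}

/* Run until nothing is pending; returns the grace periods used. */
size_t pipeline_drain(struct unreg_pipeline *p)
{
	size_t start = p->grace_periods;

	while (p->list1 || p->list2 || p->list3)
		pipeline_step(p);
	return p->grace_periods - start;
}
```

In this model, 1000 devices queued together drain in three simulated grace periods instead of three per device, and devices queued while earlier ones are in flight share the same waits.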




Ben Greear May 7, 2011, 4:44 p.m. UTC | #3

On 05/07/2011 09:37 AM, Eric Dumazet wrote:
> On Saturday, 07 May 2011 at 09:23 -0700, Ben Greear wrote:
>
>> I wonder if it would be worth having a 'delete me soon'
>> method to delete interfaces that would not block on the
>> RCU code.
>>
>> The controlling programs could use netlink messages to
>> know exactly when an interface was truly gone.
>>
>> That should allow some batching in the sync-net logic
>> too, if user-space code deletes 1000 interfaces very
>> quickly, for instance...
>>
>
> In the past I suggested extending the batch capabilities, so
> that one kthread could have 3 separate lists of devices being destroyed
> in parallel.
>
> This daemon would basically loop on one call to synchronize_rcu(), and
> transfer list3 to deletion, list2 to list3, list1 to list2, loop,
> possibly releasing RTNL while blocked in synchronize_rcu().
>
> This would need to allow, as you suggest, an asynchronous deletion method,
> or use a callback to wake the process blocked on device delete.

I'd want to at least have the option to not block the calling
process...otherwise, it would be a lot more difficult to
quickly delete 1000 interfaces.  You'd need 1000 threads, or
sockets, or something to parallelize it otherwise, eh?

Thanks,
Ben

Eric Dumazet May 7, 2011, 4:51 p.m. UTC | #4

On Saturday, 07 May 2011 at 09:44 -0700, Ben Greear wrote:
> On 05/07/2011 09:37 AM, Eric Dumazet wrote:
> > On Saturday, 07 May 2011 at 09:23 -0700, Ben Greear wrote:
> >
> >> I wonder if it would be worth having a 'delete me soon'
> >> method to delete interfaces that would not block on the
> >> RCU code.
> >>
> >> The controlling programs could use netlink messages to
> >> know exactly when an interface was truly gone.
> >>
> >> That should allow some batching in the sync-net logic
> >> too, if user-space code deletes 1000 interfaces very
> >> quickly, for instance...
> >>
> >
> > In the past I suggested extending the batch capabilities, so
> > that one kthread could have 3 separate lists of devices being destroyed
> > in parallel.
> >
> > This daemon would basically loop on one call to synchronize_rcu(), and
> > transfer list3 to deletion, list2 to list3, list1 to list2, loop,
> > possibly releasing RTNL while blocked in synchronize_rcu().
> >
> > This would need to allow, as you suggest, an asynchronous deletion method,
> > or use a callback to wake the process blocked on device delete.
> 
> I'd want to at least have the option to not block the calling
> process...otherwise, it would be a lot more difficult to
> quickly delete 1000 interfaces.  You'd need 1000 threads, or
> sockets, or something to parallelize it otherwise, eh?

Yes, if you can afford not to receive a final notification of the device
being fully freed, it should be possible.



Ben Greear May 8, 2011, 3:45 a.m. UTC | #5

On 05/07/2011 09:51 AM, Eric Dumazet wrote:
> On Saturday, 07 May 2011 at 09:44 -0700, Ben Greear wrote:
>> On 05/07/2011 09:37 AM, Eric Dumazet wrote:
>>> On Saturday, 07 May 2011 at 09:23 -0700, Ben Greear wrote:
>>>
>>>> I wonder if it would be worth having a 'delete me soon'
>>>> method to delete interfaces that would not block on the
>>>> RCU code.
>>>>
>>>> The controlling programs could use netlink messages to
>>>> know exactly when an interface was truly gone.
>>>>
>>>> That should allow some batching in the sync-net logic
>>>> too, if user-space code deletes 1000 interfaces very
>>>> quickly, for instance...
>>>>
>>>
>>> In the past I suggested extending the batch capabilities, so
>>> that one kthread could have 3 separate lists of devices being destroyed
>>> in parallel.
>>>
>>> This daemon would basically loop on one call to synchronize_rcu(), and
>>> transfer list3 to deletion, list2 to list3, list1 to list2, loop,
>>> possibly releasing RTNL while blocked in synchronize_rcu().
>>>
>>> This would need to allow, as you suggest, an asynchronous deletion method,
>>> or use a callback to wake the process blocked on device delete.
>>
>> I'd want to at least have the option to not block the calling
>> process...otherwise, it would be a lot more difficult to
>> quickly delete 1000 interfaces.  You'd need 1000 threads, or
>> sockets, or something to parallelize it otherwise, eh?
>
> Yes, if you can afford not to receive a final notification of the device
> being fully freed, it should be possible.

Well, I'd hope to get a netlink message about the device being deleted, and
after that, be able to create another one with the same name, etc.

Whether the memory is actually freed in the kernel or not wouldn't matter
to me...

Thanks,
Ben

Alex Bligh May 8, 2011, 8:08 a.m. UTC | #6

--On 7 May 2011 20:45:07 -0700 Ben Greear <greearb@candelatech.com> wrote:

> Well, I'd hope to get a netlink message about the device being deleted, and
> after that, be able to create another one with the same name, etc.
>
> Whether the memory is actually freed in the kernel or not wouldn't matter
> to me...

Provided the former para is always done, I can't actually think of a case
where the caller would /ever/ care about the latter (save perhaps
a final shutdown of the whole net subsystem).

Octavian Purdila May 9, 2011, 9:46 p.m. UTC | #7

On Sat, May 7, 2011 at 6:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:

>
> synchronize_rcu() calls do not consume CPU; they just _wait_ for an
> RCU grace period.
>
> I suggest you read Documentation/RCU files if you really want to :)
>
> If you want to check how expensive it is, it's quite easy:
> add a trace in synchronize_net()
>
<snip>

I proposed adding a "wait" software counter to perf [1] a while ago,
which would allow people to identify synchronize_rcu() hotspots:

[1] http://marc.info/?l=linux-kernel&m=129188584110162

I don't know how much visibility it got, so given this context, I
thought of bringing it up again :)
