diff mbox series

mlx5: Remove call to ida_pre_get

Message ID 20180315025724.GB9973@bombadil.infradead.org
State Accepted, archived
Delegated to: David Miller

Commit Message

Matthew Wilcox March 15, 2018, 2:57 a.m. UTC
From: Matthew Wilcox <mawilcox@microsoft.com>

The mlx5 driver calls ida_pre_get() in a loop for no readily apparent
reason.  The driver uses ida_simple_get() which will call ida_pre_get()
by itself and there's no need to use ida_pre_get() unless using
ida_get_new().

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>

Comments

Saeed Mahameed March 15, 2018, 11:58 p.m. UTC | #1
On Wed, 2018-03-14 at 19:57 -0700, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> The mlx5 driver calls ida_pre_get() in a loop for no readily apparent
> reason.  The driver uses ida_simple_get() which will call ida_pre_get()
> by itself and there's no need to use ida_pre_get() unless using
> ida_get_new().
> 

Hi Matthew,

Is this causing any issues, or is it just a simple cleanup?

Adding Maor, the author of this change,

I believe the idea is to speed up insert_fte (which calls
ida_simple_get), since insert_fte runs under the FTE write semaphore.
If ida_pre_get succeeds for all the FTE nodes in the loop before the
semaphore is taken, this is a huge win for ida_simple_get, which will
immediately return success without even trying to allocate.

So it is a best-effort attempt to speed up the critical path.

Maor, if this is really the case and this is not causing any issues,
then we need to consider adding a comment.


> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> index 10e16381f20a..3ba07c7096ef 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> @@ -1647,7 +1647,6 @@ try_add_to_existing_fg(struct mlx5_flow_table *ft,
>  
>  	list_for_each_entry(iter, match_head, list) {
>  		nested_down_read_ref_node(&iter->g->node, FS_LOCK_PARENT);
> -		ida_pre_get(&iter->g->fte_allocator, GFP_KERNEL);
>  	}
>  
>  search_again_locked:
>
Matthew Wilcox March 16, 2018, 1:30 a.m. UTC | #2
On Thu, Mar 15, 2018 at 11:58:07PM +0000, Saeed Mahameed wrote:
> On Wed, 2018-03-14 at 19:57 -0700, Matthew Wilcox wrote:
> > From: Matthew Wilcox <mawilcox@microsoft.com>
> > 
> > The mlx5 driver calls ida_pre_get() in a loop for no readily apparent
> > reason.  The driver uses ida_simple_get() which will call
> > ida_pre_get()
> > by itself and there's no need to use ida_pre_get() unless using
> > ida_get_new().
> > 
> 
> Hi Matthew,
> 
> Is this is causing any issues ? or just a simple cleanup ?

I'm removing the API.  At the end of this cleanup, there will be no more
preallocation; instead we will rely on the slab allocator not sucking.

> Adding Maor, the author of this change,
> 
> I believe the idea is to speed up insert_fte (which calls
> ida_simple_get) since insert_fte runs under the FTE write semaphore,
> in this case if ida_pre_get was successful before taking the semaphore
> for all the FTE nodes in the loop, this will be a huge win for
> ida_simple_get which will immediately return success without even
> trying to allocate.

I think that's misguided.  The IDA allocator is only going to allocate
memory once in every 1024 allocations.  Also, it does try to allocate,
even if there are preallocated nodes.  So you're just wasting time,
unfortunately.
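
[Editor's illustration, not part of the original message.] The "once in every 1024 allocations" figure can be pictured with a small userspace toy model. This is only a sketch under stated assumptions: the names toy_ida, toy_ida_get and toy_chunks_for are hypothetical, IDs are handed out sequentially and never freed, and each bitmap chunk is assumed to cover 1024 IDs as stated above. It does not reproduce the kernel's IDA implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, NOT kernel code: one bitmap chunk is assumed to track
 * 1024 IDs, matching the figure quoted in the thread. */
#define TOY_BITMAP_BITS 1024

struct toy_ida {
	unsigned int next_id;       /* next ID to hand out (never freed here) */
	unsigned int bitmap_allocs; /* number of bitmap chunks allocated so far */
};

/* Hand out the next ID; a new chunk is needed only at a chunk boundary. */
static unsigned int toy_ida_get(struct toy_ida *ida)
{
	assert(ida != NULL);
	if (ida->next_id % TOY_BITMAP_BITS == 0)
		ida->bitmap_allocs++;
	return ida->next_id++;
}

/* Allocate n IDs from a fresh toy_ida; return how many chunks that took. */
static unsigned int toy_chunks_for(unsigned int n)
{
	struct toy_ida ida = { 0, 0 };

	for (unsigned int i = 0; i < n; i++)
		toy_ida_get(&ida);
	return ida.bitmap_allocs;
}
```

In this model, toy_chunks_for(3000) needs only 3 chunk allocations, so almost every call takes the path that allocates nothing, which is the point being made about preallocation buying very little.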
Saeed Mahameed March 20, 2018, 3:29 a.m. UTC | #3
On Thu, 2018-03-15 at 18:30 -0700, Matthew Wilcox wrote:
> On Thu, Mar 15, 2018 at 11:58:07PM +0000, Saeed Mahameed wrote:
> > On Wed, 2018-03-14 at 19:57 -0700, Matthew Wilcox wrote:
> > > From: Matthew Wilcox <mawilcox@microsoft.com>
> > > 
> > > The mlx5 driver calls ida_pre_get() in a loop for no readily apparent
> > > reason.  The driver uses ida_simple_get() which will call
> > > ida_pre_get()
> > > by itself and there's no need to use ida_pre_get() unless using
> > > ida_get_new().
> > > 
> > 
> > Hi Matthew,
> > 
> > Is this is causing any issues ? or just a simple cleanup ?
> 
> I'm removing the API.  At the end of this cleanup, there will be no more
> preallocation; instead we will rely on the slab allocator not sucking.
> 


OK, seems reasonable. I am OK with this.

> > Adding Maor, the author of this change,
> > 
> > I believe the idea is to speed up insert_fte (which calls
> > ida_simple_get) since insert_fte runs under the FTE write semaphore,
> > in this case if ida_pre_get was successful before taking the semaphore
> > for all the FTE nodes in the loop, this will be a huge win for
> > ida_simple_get which will immediately return success without even
> > trying to allocate.
> 
> I think that's misguided.  The IDA allocator is only going to allocate
> memory once in every 1024 allocations.  Also, it does try to allocate,
> even if there are preallocated nodes.  So you're just wasting time,
> unfortunately.
> 


Well, just by looking at the code you can tell for sure that two
consecutive calls to ida_pre_get will result in only one allocation,
due to "if (!this_cpu_read(ida_bitmap))".

But I didn't dig into the details and didn't go through the whole of
ida_get_new_above, so I will rely on your judgment here.

Still, I would like to wait for input from Maor, the author.
I will ping him today.

Thanks,
Saeed.
Maor Gottlieb March 20, 2018, 12:41 p.m. UTC | #4
On 3/20/2018 5:29 AM, Saeed Mahameed wrote:
> On Thu, 2018-03-15 at 18:30 -0700, Matthew Wilcox wrote:
>> On Thu, Mar 15, 2018 at 11:58:07PM +0000, Saeed Mahameed wrote:
>>> On Wed, 2018-03-14 at 19:57 -0700, Matthew Wilcox wrote:
>>>> From: Matthew Wilcox <mawilcox@microsoft.com>
>>>>
>>>> The mlx5 driver calls ida_pre_get() in a loop for no readily
>>>> apparent
>>>> reason.  The driver uses ida_simple_get() which will call
>>>> ida_pre_get()
>>>> by itself and there's no need to use ida_pre_get() unless using
>>>> ida_get_new().
>>>>
>>> Hi Matthew,
>>>
>>> Is this is causing any issues ? or just a simple cleanup ?
>> I'm removing the API.  At the end of this cleanup, there will be no
>> more
>> preallocation; instead we will rely on the slab allocator not
>> sucking.
>>
> Ok, Seems reasonable, I am ok with this.
>
>>> Adding Maor, the author of this change,
>>>
>>> I believe the idea is to speed up insert_fte (which calls
>>> ida_simple_get) since insert_fte runs under the FTE write
>>> semaphore,
>>> in this case if ida_pre_get was successful before taking the
>>> semaphore
>>> for all the FTE nodes in the loop, this will be a huge win for
>>> ida_simple_get which will immediately return success without even
>>> trying to allocate.
>> I think that's misguided.  The IDA allocator is only going to
>> allocate
>> memory once in every 1024 allocations.  Also, it does try to
>> allocate,
>> even if there are preallocated nodes.  So you're just wasting time,
>> unfortunately.
>>
> Well just by looking at the code you can tell for sure that
> two consecutive calls to ida_pre_get will result in one allocation
> only.
> due to "if (!this_cpu_read(ida_bitmap))"
>
> but i didn't dig into details and didn't go through the whole
> ida_get_new_above, so i will count on your judgment here.
>
> Still i would like to wait for Maor's input here, the author..
> I Will ping him today.
>
> Thanks,
> Saeed.

Saeed, Matan and I are okay with this fix as well; it looks like it
shouldn't impact the insertion rate.
David Miller March 20, 2018, 2:46 p.m. UTC | #5
From: Maor Gottlieb <maorg@mellanox.com>
Date: Tue, 20 Mar 2018 14:41:49 +0200

> Saeed, Matan and I okay with this fix as well, it looks like it
> shouldn't impact on the insertion rate.

I've applied this to net-next, thanks everyone.
Matthew Wilcox March 20, 2018, 3:20 p.m. UTC | #6
On Tue, Mar 20, 2018 at 10:46:20AM -0400, David Miller wrote:
> From: Maor Gottlieb <maorg@mellanox.com>
> Date: Tue, 20 Mar 2018 14:41:49 +0200
> 
> > Saeed, Matan and I okay with this fix as well, it looks like it
> > shouldn't impact on the insertion rate.
> 
> I've applied this to net-next, thanks everyone.

Thanks, Dave.

I realised why this made sense when it was originally written.  Before
December 2016 (commit 7ad3d4d85c7a), ida_pre_get used to allocate one
bitmap per ida.  I moved it to a percpu variable, and at that point this
stopped making sense.
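
[Editor's illustration, not part of the original message.] The difference between "one bitmap per ida" and "a percpu variable" can be sketched with a userspace toy model. Everything here is an assumption made for illustration: the toy_* names are hypothetical, the per-CPU slot is modelled as a single static variable (that is, all calls are assumed to run on one CPU with no migration), and the real kernel functions do more than this.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_IDAS 4 /* hypothetical number of flow groups in the loop */

/* Old scheme (toy): every ida carries its own spare-bitmap slot. */
struct toy_ida_old {
	int has_spare;
};

/* New scheme (toy): a single spare slot shared by every ida on this CPU. */
static int toy_percpu_spare;

/* Return 1 if this call had to allocate a spare, 0 if one was cached. */
static int toy_pre_get_old(struct toy_ida_old *ida)
{
	assert(ida != NULL);
	if (ida->has_spare)
		return 0;
	ida->has_spare = 1;
	return 1;
}

/* Same contract, but the cache is the shared per-CPU slot. */
static int toy_pre_get_percpu(void)
{
	if (toy_percpu_spare)
		return 0;
	toy_percpu_spare = 1;
	return 1;
}

/* Call pre_get once per ida, as the removed loop did; count spares made. */
static int toy_loop_old(void)
{
	struct toy_ida_old idas[TOY_NR_IDAS] = { { 0 } };
	int spares = 0;

	for (int i = 0; i < TOY_NR_IDAS; i++)
		spares += toy_pre_get_old(&idas[i]);
	return spares;
}

static int toy_loop_percpu(void)
{
	int spares = 0;

	toy_percpu_spare = 0; /* start with an empty per-CPU slot */
	for (int i = 0; i < TOY_NR_IDAS; i++)
		spares += toy_pre_get_percpu();
	return spares;
}
```

Under these assumptions, toy_loop_old() leaves one spare per ida (4 here), while toy_loop_percpu() leaves a single spare no matter how many times the loop runs, so the later calls in the loop accomplish nothing.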

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 10e16381f20a..3ba07c7096ef 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1647,7 +1647,6 @@  try_add_to_existing_fg(struct mlx5_flow_table *ft,
 
 	list_for_each_entry(iter, match_head, list) {
 		nested_down_read_ref_node(&iter->g->node, FS_LOCK_PARENT);
-		ida_pre_get(&iter->g->fte_allocator, GFP_KERNEL);
 	}
 
 search_again_locked: