Message ID | 20180315025724.GB9973@bombadil.infradead.org |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
Series | mlx5: Remove call to ida_pre_get | expand |
On Wed, 2018-03-14 at 19:57 -0700, Matthew Wilcox wrote: > From: Matthew Wilcox <mawilcox@microsoft.com> > > The mlx5 driver calls ida_pre_get() in a loop for no readily apparent > reason. The driver uses ida_simple_get() which will call > ida_pre_get() > by itself and there's no need to use ida_pre_get() unless using > ida_get_new(). > Hi Matthew, Is this is causing any issues ? or just a simple cleanup ? Adding Maor, the author of this change, I believe the idea is to speed up insert_fte (which calls ida_simple_get) since insert_fte runs under the FTE write semaphore, in this case if ida_pre_get was successful before taking the semaphore for all the FTE nodes in the loop, this will be a huge win for ida_simple_get which will immediately return success without even trying to allocate. so it is a best effort to speed up critical path. Maor, if this is really the case and this is not causing any issues, then we need to consider adding a comment. > Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c > b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c > index 10e16381f20a..3ba07c7096ef 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c > @@ -1647,7 +1647,6 @@ try_add_to_existing_fg(struct mlx5_flow_table > *ft, > > list_for_each_entry(iter, match_head, list) { > nested_down_read_ref_node(&iter->g->node, > FS_LOCK_PARENT); > - ida_pre_get(&iter->g->fte_allocator, GFP_KERNEL); > } > > search_again_locked: >
On Thu, Mar 15, 2018 at 11:58:07PM +0000, Saeed Mahameed wrote: > On Wed, 2018-03-14 at 19:57 -0700, Matthew Wilcox wrote: > > From: Matthew Wilcox <mawilcox@microsoft.com> > > > > The mlx5 driver calls ida_pre_get() in a loop for no readily apparent > > reason. The driver uses ida_simple_get() which will call > > ida_pre_get() > > by itself and there's no need to use ida_pre_get() unless using > > ida_get_new(). > > > > Hi Matthew, > > Is this is causing any issues ? or just a simple cleanup ? I'm removing the API. At the end of this cleanup, there will be no more preallocation; instead we will rely on the slab allocator not sucking. > Adding Maor, the author of this change, > > I believe the idea is to speed up insert_fte (which calls > ida_simple_get) since insert_fte runs under the FTE write semaphore, > in this case if ida_pre_get was successful before taking the semaphore > for all the FTE nodes in the loop, this will be a huge win for > ida_simple_get which will immediately return success without even > trying to allocate. I think that's misguided. The IDA allocator is only going to allocate memory once in every 1024 allocations. Also, it does try to allocate, even if there are preallocated nodes. So you're just wasting time, unfortunately.
On Thu, 2018-03-15 at 18:30 -0700, Matthew Wilcox wrote: > On Thu, Mar 15, 2018 at 11:58:07PM +0000, Saeed Mahameed wrote: > > On Wed, 2018-03-14 at 19:57 -0700, Matthew Wilcox wrote: > > > From: Matthew Wilcox <mawilcox@microsoft.com> > > > > > > The mlx5 driver calls ida_pre_get() in a loop for no readily > > > apparent > > > reason. The driver uses ida_simple_get() which will call > > > ida_pre_get() > > > by itself and there's no need to use ida_pre_get() unless using > > > ida_get_new(). > > > > > > > Hi Matthew, > > > > Is this is causing any issues ? or just a simple cleanup ? > > I'm removing the API. At the end of this cleanup, there will be no > more > preallocation; instead we will rely on the slab allocator not > sucking. > Ok, Seems reasonable, I am ok with this. > > Adding Maor, the author of this change, > > > > I believe the idea is to speed up insert_fte (which calls > > ida_simple_get) since insert_fte runs under the FTE write > > semaphore, > > in this case if ida_pre_get was successful before taking the > > semaphore > > for all the FTE nodes in the loop, this will be a huge win for > > ida_simple_get which will immediately return success without even > > trying to allocate. > > I think that's misguided. The IDA allocator is only going to > allocate > memory once in every 1024 allocations. Also, it does try to > allocate, > even if there are preallocated nodes. So you're just wasting time, > unfortunately. > Well just by looking at the code you can tell for sure that two consecutive calls to ida_pre_get will result in one allocation only. due to "if (!this_cpu_read(ida_bitmap))" but i didn't dig into details and didn't go through the whole ida_get_new_above, so i will count on your judgment here. Still i would like to wait for Maor's input here, the author.. I Will ping him today. Thanks, Saeed.
On 3/20/2018 5:29 AM, Saeed Mahameed wrote: > On Thu, 2018-03-15 at 18:30 -0700, Matthew Wilcox wrote: >> On Thu, Mar 15, 2018 at 11:58:07PM +0000, Saeed Mahameed wrote: >>> On Wed, 2018-03-14 at 19:57 -0700, Matthew Wilcox wrote: >>>> From: Matthew Wilcox <mawilcox@microsoft.com> >>>> >>>> The mlx5 driver calls ida_pre_get() in a loop for no readily >>>> apparent >>>> reason. The driver uses ida_simple_get() which will call >>>> ida_pre_get() >>>> by itself and there's no need to use ida_pre_get() unless using >>>> ida_get_new(). >>>> >>> Hi Matthew, >>> >>> Is this is causing any issues ? or just a simple cleanup ? >> I'm removing the API. At the end of this cleanup, there will be no >> more >> preallocation; instead we will rely on the slab allocator not >> sucking. >> > Ok, Seems reasonable, I am ok with this. > >>> Adding Maor, the author of this change, >>> >>> I believe the idea is to speed up insert_fte (which calls >>> ida_simple_get) since insert_fte runs under the FTE write >>> semaphore, >>> in this case if ida_pre_get was successful before taking the >>> semaphore >>> for all the FTE nodes in the loop, this will be a huge win for >>> ida_simple_get which will immediately return success without even >>> trying to allocate. >> I think that's misguided. The IDA allocator is only going to >> allocate >> memory once in every 1024 allocations. Also, it does try to >> allocate, >> even if there are preallocated nodes. So you're just wasting time, >> unfortunately. >> > Well just by looking at the code you can tell for sure that > two consecutive calls to ida_pre_get will result in one allocation > only. > due to "if (!this_cpu_read(ida_bitmap))" > > but i didn't dig into details and didn't go through the whole > ida_get_new_above, so i will count on your judgment here. > > Still i would like to wait for Maor's input here, the author.. > I Will ping him today. > > Thanks, > Saeed. Saeed, Matan and I okay with this fix as well, it looks like it shouldn't impact on the insertion rate.
From: Maor Gottlieb <maorg@mellanox.com> Date: Tue, 20 Mar 2018 14:41:49 +0200 > Saeed, Matan and I okay with this fix as well, it looks like it > shouldn't impact on the insertion rate. I've applied this to net-next, thanks everyone.
On Tue, Mar 20, 2018 at 10:46:20AM -0400, David Miller wrote: > From: Maor Gottlieb <maorg@mellanox.com> > Date: Tue, 20 Mar 2018 14:41:49 +0200 > > > Saeed, Matan and I okay with this fix as well, it looks like it > > shouldn't impact on the insertion rate. > > I've applied this to net-next, thanks everyone. Thanks, Dave. I realised why this made sense when it was originally written. Before December 2016 (commit 7ad3d4d85c7a), ida_pre_get used to allocate one bitmap per ida. I moved it to a percpu variable, and at that point this stopped making sense.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index 10e16381f20a..3ba07c7096ef 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -1647,7 +1647,6 @@ try_add_to_existing_fg(struct mlx5_flow_table *ft, list_for_each_entry(iter, match_head, list) { nested_down_read_ref_node(&iter->g->node, FS_LOCK_PARENT); - ida_pre_get(&iter->g->fte_allocator, GFP_KERNEL); } search_again_locked: