
[net-next] sfc: allocate rx pages on the same node as the interrupt

Message ID 1462892223.23934.78.camel@edumazet-glaptop3.roam.corp.google.com
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet May 10, 2016, 2:57 p.m. UTC
On Tue, 2016-05-10 at 07:55 -0700, Eric Dumazet wrote:
> On Tue, 2016-05-10 at 15:21 +0100, Edward Cree wrote:
> > From: Daniel Pieczko <dpieczko@solarflare.com>
> > 
> > When the interrupt servicing a channel is on a NUMA node that is
> > not local to the device, performance is improved by allocating
> > rx pages on the node local to the interrupt (remote to the device)
> > 
> > The performance-optimal case, where interrupts and applications
> > are pinned to CPUs on the same node as the device, is not altered
> > by this change.
> > 
> > This change gave a 1% improvement in transaction rate using Nginx
> > with all interrupts and Nginx threads on the node remote to the
> > device. It also gave a small reduction in round-trip latency,
> > again with the interrupt and application on a different node to
> > the device.
> 
> Yes, I advocated for such changes in the past for mlx4 NIC.
> 
> But your patch makes no sense to me.
> 
> alloc_pages() by default would run on the cpu servicing the IRQ, so
> would automatically provide pages on the local node.
> 
> If you care only about the initial pages allocated with GFP_KERNEL at
> device start, really that is a small detail as they should be consumed
> and replaced quite fast.
> 
> If you worry that "wrong" pages would be reused over and over,
> you could make sure that efx_reuse_page() won't reuse a page on the wrong
> node.
> 
> page_to_nid(page) != numa_mem_id() 
> 
> Note that this could happen even if IRQs are not changed since
> alloc_pages() could in stress situations give you a page from a remote
> node.

Something like this (untested) ?

Comments

Eric Dumazet May 10, 2016, 2:59 p.m. UTC | #1
On Tue, 2016-05-10 at 07:57 -0700, Eric Dumazet wrote:
> On Tue, 2016-05-10 at 07:55 -0700, Eric Dumazet wrote:
> > On Tue, 2016-05-10 at 15:21 +0100, Edward Cree wrote:
> > > From: Daniel Pieczko <dpieczko@solarflare.com>
> > > 
> > > When the interrupt servicing a channel is on a NUMA node that is
> > > not local to the device, performance is improved by allocating
> > > rx pages on the node local to the interrupt (remote to the device)
> > > 
> > > The performance-optimal case, where interrupts and applications
> > > are pinned to CPUs on the same node as the device, is not altered
> > > by this change.
> > > 
> > > This change gave a 1% improvement in transaction rate using Nginx
> > > with all interrupts and Nginx threads on the node remote to the
> > > device. It also gave a small reduction in round-trip latency,
> > > again with the interrupt and application on a different node to
> > > the device.
> > 
> > Yes, I advocated for such changes in the past for mlx4 NIC.
> > 
> > But your patch makes no sense to me.
> > 
> > alloc_pages() by default would run on the cpu servicing the IRQ, so
> > would automatically provide pages on the local node.
> > 
> > If you care only about the initial pages allocated with GFP_KERNEL at
> > device start, really that is a small detail as they should be consumed
> > and replaced quite fast.
> > 
> > If you worry that "wrong" pages would be reused over and over,
> > you could make sure that efx_reuse_page() won't reuse a page on the wrong
> > node.
> > 
> > page_to_nid(page) != numa_mem_id() 
> > 
> > Note that this could happen even if IRQs are not changed since
> > alloc_pages() could in stress situations give you a page from a remote
> > node.
> 
> Something like this (untested) ?
> 
> diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
> index 8956995b2fe713b377f205a55fada36c8a04253a..90665a8992e29cceb29dd465b52f934dff3b3d4e 100644
> --- a/drivers/net/ethernet/sfc/rx.c
> +++ b/drivers/net/ethernet/sfc/rx.c
> @@ -124,7 +124,7 @@ static struct page *efx_reuse_page(struct efx_rx_queue *rx_queue)
>  		++rx_queue->page_remove;
>  
>  	/* If page_count is 1 then we hold the only reference to this page. */
> -	if (page_count(page) == 1) {
> +	if (page_count(page) == 1 && page_to_nid(page) == numa_mem_id()) {
>  		++rx_queue->page_recycle_count;
>  		return page;
>  	} else {
> 

You could also factor out the code from the Intel drivers into a common helper:

static inline bool igb_page_is_reserved(struct page *page)
{
        return (page_to_nid(page) != numa_mem_id()) || page_is_pfmemalloc(page);
}
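
If such a helper were factored out into shared code, it might look roughly
like this (a sketch only: the name dev_rx_page_is_reusable is hypothetical,
and no such common helper existed at the time):

static inline bool dev_rx_page_is_reusable(struct page *page)
{
        /* A recycled RX page is only worth keeping if it sits on the
         * local NUMA node and did not come from the pfmemalloc
         * emergency reserves.
         */
        return page_to_nid(page) == numa_mem_id() &&
               !page_is_pfmemalloc(page);
}

The pfmemalloc test matters for much the same reason as the NUMA check:
pages handed out from the emergency reserves should go back to the allocator
promptly rather than sit in a driver recycle ring.
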
Daniel Pieczko (dpieczko) May 12, 2016, 10:35 a.m. UTC | #2
On 10/05/16 15:57, Eric Dumazet wrote:
> On Tue, 2016-05-10 at 07:55 -0700, Eric Dumazet wrote:
>> If you care only about the initial pages allocated with GFP_KERNEL at
>> device start, really that is a small detail as they should be consumed
>> and replaced quite fast.

This patch was addressing just that situation.

>> If you worry that "wrong" pages would be reused over and over,
>> you could make sure that efx_reuse_page() won't reuse a page on the wrong
>> node.
>>
>> page_to_nid(page) != numa_mem_id()

Yes, that would solve the general problem and your patch looks correct
for the code as it is right now.

However, we have some pending changes to our rx recycle ring that
haven't been submitted upstream yet, which will alter the way that pages
are reused, so the solution may no longer be so simple. I think we
avoided making the simple change of checking page_to_nid() right now
because of these future changes to rx page recycling.

So I think that the original patch in this submission is valuable in
addressing the initial startup case - that won't be changed by the other
rx page recycling changes.
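
For illustration, the start-of-day allocation that the original patch targets
could be sketched along these lines (illustrative only, not the patch under
discussion; rx_alloc_startup_page is a hypothetical name, and the node and
order arguments stand in for whatever the driver derives from the channel's
IRQ affinity and its RX buffer order):

static struct page *rx_alloc_startup_page(int node, unsigned int order)
{
        /* Request the page explicitly from the node local to the
         * queue's interrupt instead of the node the setup thread
         * happens to run on.
         */
        return alloc_pages_node(node, GFP_KERNEL | __GFP_COMP, order);
}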

I have no preference on whether we take Eric's follow-up patch to check
page_to_nid() right now or wait until the other rx changes are submitted
and then fix it up as required.


Daniel

Patch

diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 8956995b2fe713b377f205a55fada36c8a04253a..90665a8992e29cceb29dd465b52f934dff3b3d4e 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -124,7 +124,7 @@ static struct page *efx_reuse_page(struct efx_rx_queue *rx_queue)
 		++rx_queue->page_remove;
 
 	/* If page_count is 1 then we hold the only reference to this page. */
-	if (page_count(page) == 1) {
+	if (page_count(page) == 1 && page_to_nid(page) == numa_mem_id()) {
 		++rx_queue->page_recycle_count;
 		return page;
 	} else {
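
The practical effect of the extra condition is simply that a page sitting on a
remote node is not handed back for reuse; the refill path then falls back to a
fresh allocation, which runs on the CPU servicing the interrupt and therefore
normally returns local-node memory. A simplified sketch of that fallback (not
the exact sfc refill code; the function name rx_get_page is hypothetical):

static struct page *rx_get_page(struct efx_rx_queue *rx_queue,
                                unsigned int order, bool atomic)
{
        /* Reuse a recycled page only if efx_reuse_page() accepts it;
         * otherwise allocate a fresh one, which in NAPI context comes
         * from the node local to this CPU.
         */
        struct page *page = efx_reuse_page(rx_queue);

        if (!page)
                page = alloc_pages(__GFP_COMP |
                                   (atomic ? GFP_ATOMIC : GFP_KERNEL),
                                   order);
        return page;
}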