From patchwork Mon Dec 16 04:02:04 2019
X-Patchwork-Submitter: Li RongQing
X-Patchwork-Id: 1210066
X-Patchwork-Delegate: dsahern@gmail.com
From: "Li,Rongqing"
To: Yunsheng Lin, Jesper Dangaard Brouer
Cc: Saeed Mahameed, ilias.apalodimas@linaro.org, jonathan.lemon@gmail.com,
    netdev@vger.kernel.org, mhocko@kernel.org, peterz@infradead.org,
    Greg Kroah-Hartman, bhelgaas@google.com, linux-kernel@vger.kernel.org,
    Björn Töpel
Subject: Re: [PATCH][v2] page_pool: handle page recycle for NUMA_NO_NODE condition
Date: Mon, 16 Dec 2019 04:02:04 +0000
In-Reply-To: <15be326d-1811-329c-424c-6dd22b0604a8@huawei.com>
X-Mailing-List: netdev@vger.kernel.org

> -----Original Message-----
> From: Yunsheng Lin [mailto:linyunsheng@huawei.com]
> Sent: December 16, 2019 9:51
> To: Jesper Dangaard Brouer
> Cc: Li,Rongqing; Saeed Mahameed;
>     ilias.apalodimas@linaro.org; jonathan.lemon@gmail.com;
>     netdev@vger.kernel.org; mhocko@kernel.org; peterz@infradead.org;
>     Greg Kroah-Hartman; bhelgaas@google.com; linux-kernel@vger.kernel.org;
>     Björn Töpel
> Subject: Re: [PATCH][v2] page_pool: handle page recycle for NUMA_NO_NODE
> condition
>
> On 2019/12/13 16:48, Jesper Dangaard Brouer wrote:
> > You are basically saying that the NUMA check should be moved to
> > allocation time, as it is running on the RX-CPU (NAPI). And eventually,
> > after some time, the pages will come from the correct NUMA node.
> >
> > I think we can do that, and only affect the semi-fast-path.
> > We just need to handle that pages in the ptr_ring that are recycled
> > can be from the wrong NUMA node. In __page_pool_get_cached(), when
> > consuming pages from the ptr_ring (__ptr_ring_consume_batched), we
> > can evict pages from the wrong NUMA node.
>
> Yes, that's workable.
>
> > For the pool->alloc.cache we either accept that it will eventually be
> > emptied (it is only in a 100% XDP_DROP workload that it will continue
> > to reuse the same pages), or we simply clear the pool->alloc.cache
> > when calling page_pool_update_nid().
>
> Simply clearing the pool->alloc.cache when calling page_pool_update_nid()
> seems better.
>

How about the code below? The driver can configure p.nid to any node; it
will be adjusted during NAPI polling, so IRQ migration will not be a
problem, but it does add a check to the hot path.

Thanks

-Li

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index a6aefe989043..4374a6239d17 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -108,6 +108,10 @@ static struct page *__page_pool_get_cached(struct page_pool *pool)
 		if (likely(pool->alloc.count)) {
 			/* Fast-path */
 			page = pool->alloc.cache[--pool->alloc.count];
+
+			if (unlikely(READ_ONCE(pool->p.nid) != numa_mem_id()))
+				WRITE_ONCE(pool->p.nid, numa_mem_id());
+
 			return page;
 		}
 		refill = true;
@@ -155,6 +159,10 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 	if (pool->p.order)
 		gfp |= __GFP_COMP;
+
+	if (unlikely(READ_ONCE(pool->p.nid) != numa_mem_id()))
+		WRITE_ONCE(pool->p.nid, numa_mem_id());
+
 	/* FUTURE development:
 	 *
 	 * Current slow-path essentially falls back to single page
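
[Editorial note: for comparison, the other option discussed above, clearing
pool->alloc.cache from page_pool_update_nid(), could look roughly like the
sketch below. This is only an illustration and not part of the patch in this
mail: it assumes the code sits in net/core/page_pool.c, that
page_pool_update_nid() is called from the same softirq/NAPI context that owns
the unlocked pool->alloc cache, and that the existing static
__page_pool_return_page() helper is available to hand pages back to the page
allocator.]

/* Illustrative sketch only, not the patch above: flush the (unlocked)
 * alloc cache when the pool's preferred NUMA node changes, so pages
 * cached for the old node are released instead of being reused.
 * Assumes the caller runs in the softirq/NAPI context that owns
 * pool->alloc, as required for every other user of that cache.
 */
void page_pool_update_nid(struct page_pool *pool, int new_nid)
{
	struct page *page;

	WRITE_ONCE(pool->p.nid, new_nid);

	/* Return cached pages to the page allocator; the next slow-path
	 * refill will allocate fresh pages from new_nid.
	 */
	while (pool->alloc.count) {
		page = pool->alloc.cache[--pool->alloc.count];
		__page_pool_return_page(pool, page);
	}
}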