From patchwork Thu Feb 9 13:58:32 2017
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 726124
X-Patchwork-Delegate: davem@davemloft.net
From: Eric Dumazet
To: "David S . Miller"
Cc: netdev, Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
    Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov,
    Eric Dumazet, Eric Dumazet
Subject: [PATCH v2 net-next 08/14] mlx4: use order-0 pages for RX
Date: Thu, 9 Feb 2017 05:58:32 -0800
Message-Id: <20170209135838.16487-9-edumazet@google.com>
X-Mailer: git-send-email 2.11.0.483.g087da7b7c-goog
In-Reply-To: <20170209135838.16487-1-edumazet@google.com>
References: <20170209135838.16487-1-edumazet@google.com>
List-ID: X-Mailing-List: netdev@vger.kernel.org

Use of order-3 pages is problematic in some cases.

This patch might add three kinds of regressions:

1) a CPU performance regression, but page recycling will be added
   later and performance should come back.
2) The TCP receiver could grow its receive window slightly more
   slowly, because the skb->len/skb->truesize ratio will decrease.
   This is mostly OK: we prefer being conservative rather than risking
   OOM, and TCP can be tuned better in the future. This is consistent
   with other drivers using a truesize of 2048 bytes per Ethernet
   frame (a standalone sketch of this arithmetic appears after the
   diff).

3) Because we allocate one page per RX slot, we consume more memory
   for the ring buffers. XDP already had this constraint anyway.

Signed-off-by: Eric Dumazet
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 72 +++++++++++++---------------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |  4 --
 2 files changed, 33 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 0c61c1200f2aec4c74d7403d9ec3ed06821c..069ea09185fb0669d5c1f9b1b88f517b2d69 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -53,38 +53,26 @@
 static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
                             struct mlx4_en_rx_alloc *page_alloc,
                             const struct mlx4_en_frag_info *frag_info,
-                            gfp_t _gfp)
+                            gfp_t gfp)
 {
-        int order;
         struct page *page;
         dma_addr_t dma;
 
-        for (order = priv->rx_page_order; ;) {
-                gfp_t gfp = _gfp;
-
-                if (order)
-                        gfp |= __GFP_COMP | __GFP_NOWARN | __GFP_NOMEMALLOC;
-                page = alloc_pages(gfp, order);
-                if (likely(page))
-                        break;
-                if (--order < 0 ||
-                    ((PAGE_SIZE << order) < frag_info->frag_size))
-                        return -ENOMEM;
-        }
-        dma = dma_map_page(priv->ddev, page, 0, PAGE_SIZE << order,
-                           priv->dma_dir);
+        page = alloc_page(gfp);
+        if (unlikely(!page))
+                return -ENOMEM;
+        dma = dma_map_page(priv->ddev, page, 0, PAGE_SIZE, priv->dma_dir);
         if (unlikely(dma_mapping_error(priv->ddev, dma))) {
                 put_page(page);
                 return -ENOMEM;
         }
-        page_alloc->page_size = PAGE_SIZE << order;
         page_alloc->page = page;
         page_alloc->dma = dma;
         page_alloc->page_offset = 0;
         /* Not doing get_page() for each frag is a big win
          * on asymetric workloads. Note we can not use atomic_set().
          */
-        page_ref_add(page, page_alloc->page_size / frag_info->frag_stride - 1);
+        page_ref_add(page, PAGE_SIZE / frag_info->frag_stride - 1);
         return 0;
 }
 
@@ -105,7 +93,7 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
                 page_alloc[i].page_offset += frag_info->frag_stride;
 
                 if (page_alloc[i].page_offset + frag_info->frag_stride <=
-                    ring_alloc[i].page_size)
+                    PAGE_SIZE)
                         continue;
 
                 if (unlikely(mlx4_alloc_pages(priv, &page_alloc[i],
@@ -127,11 +115,10 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
         while (i--) {
                 if (page_alloc[i].page != ring_alloc[i].page) {
                         dma_unmap_page(priv->ddev, page_alloc[i].dma,
-                                       page_alloc[i].page_size,
-                                       priv->dma_dir);
+                                       PAGE_SIZE, priv->dma_dir);
                         page = page_alloc[i].page;
                         /* Revert changes done by mlx4_alloc_pages */
-                        page_ref_sub(page, page_alloc[i].page_size /
+                        page_ref_sub(page, PAGE_SIZE /
                                      priv->frag_info[i].frag_stride - 1);
                         put_page(page);
                 }
@@ -147,8 +134,8 @@ static void mlx4_en_free_frag(struct mlx4_en_priv *priv,
         u32 next_frag_end = frags[i].page_offset + 2 * frag_info->frag_stride;
 
-        if (next_frag_end > frags[i].page_size)
-                dma_unmap_page(priv->ddev, frags[i].dma, frags[i].page_size,
+        if (next_frag_end > PAGE_SIZE)
+                dma_unmap_page(priv->ddev, frags[i].dma, PAGE_SIZE,
                                priv->dma_dir);
 
         if (frags[i].page)
@@ -168,9 +155,8 @@ static int mlx4_en_init_allocator(struct mlx4_en_priv *priv,
                                      frag_info, GFP_KERNEL | __GFP_COLD))
                         goto out;
 
-                en_dbg(DRV, priv, " frag %d allocator: - size:%d frags:%d\n",
-                       i, ring->page_alloc[i].page_size,
-                       page_ref_count(ring->page_alloc[i].page));
+                en_dbg(DRV, priv, " frag %d allocator: - frags:%d\n",
+                       i, page_ref_count(ring->page_alloc[i].page));
         }
         return 0;
 
@@ -180,11 +166,10 @@ static int mlx4_en_init_allocator(struct mlx4_en_priv *priv,
                 page_alloc = &ring->page_alloc[i];
                 dma_unmap_page(priv->ddev, page_alloc->dma,
-                               page_alloc->page_size,
-                               priv->dma_dir);
+                               PAGE_SIZE, priv->dma_dir);
                 page = page_alloc->page;
                 /* Revert changes done by mlx4_alloc_pages */
-                page_ref_sub(page, page_alloc->page_size /
+                page_ref_sub(page, PAGE_SIZE /
                              priv->frag_info[i].frag_stride - 1);
                 put_page(page);
                 page_alloc->page = NULL;
@@ -206,9 +191,9 @@ static void mlx4_en_destroy_allocator(struct mlx4_en_priv *priv,
                        i, page_count(page_alloc->page));
                 dma_unmap_page(priv->ddev, page_alloc->dma,
-                               page_alloc->page_size, priv->dma_dir);
+                               PAGE_SIZE, priv->dma_dir);
                 while (page_alloc->page_offset + frag_info->frag_stride <
-                       page_alloc->page_size) {
+                       PAGE_SIZE) {
                         put_page(page_alloc->page);
                         page_alloc->page_offset += frag_info->frag_stride;
                 }
@@ -1191,7 +1176,6 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
          * This only works when num_frags == 1.
          */
         if (priv->tx_ring_num[TX_XDP]) {
-                priv->rx_page_order = 0;
                 priv->frag_info[0].frag_size = eff_mtu;
                 /* This will gain efficient xdp frame recycling at the
                  * expense of more costly truesize accounting
@@ -1201,22 +1185,32 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
                 priv->rx_headroom = XDP_PACKET_HEADROOM;
                 i = 1;
         } else {
-                int buf_size = 0;
+                int frag_size_max = 2048, buf_size = 0;
+
+                /* should not happen, right ?
+                 */
+                if (eff_mtu > PAGE_SIZE + (MLX4_EN_MAX_RX_FRAGS - 1) * 2048)
+                        frag_size_max = PAGE_SIZE;
 
                 while (buf_size < eff_mtu) {
-                        int frag_size = eff_mtu - buf_size;
+                        int frag_stride, frag_size = eff_mtu - buf_size;
+                        int pad, nb;
 
                         if (i < MLX4_EN_MAX_RX_FRAGS - 1)
-                                frag_size = min(frag_size, 2048);
+                                frag_size = min(frag_size, frag_size_max);
 
                         priv->frag_info[i].frag_size = frag_size;
+                        frag_stride = ALIGN(frag_size, SMP_CACHE_BYTES);
+                        /* We can only pack 2 1536-bytes frames in on 4K page
+                         * Therefore, each frame would consume more bytes (truesize)
+                         */
+                        nb = PAGE_SIZE / frag_stride;
+                        pad = (PAGE_SIZE - nb * frag_stride) / nb;
+                        pad &= ~(SMP_CACHE_BYTES - 1);
+                        priv->frag_info[i].frag_stride = frag_stride + pad;
 
-                        priv->frag_info[i].frag_stride = ALIGN(frag_size,
-                                                               SMP_CACHE_BYTES);
                         buf_size += frag_size;
                         i++;
                 }
-                priv->rx_page_order = MLX4_EN_ALLOC_PREFER_ORDER;
                 priv->dma_dir = PCI_DMA_FROMDEVICE;
                 priv->rx_headroom = 0;
         }
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 92c6cf6de9452d5d1b2092a17a8dad5348db..4f7d1503277d2e030854deabeb22b96fa931 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -102,8 +102,6 @@
 /* Use the maximum between 16384 and a single page */
 #define MLX4_EN_ALLOC_SIZE	PAGE_ALIGN(16384)
 
-#define MLX4_EN_ALLOC_PREFER_ORDER	PAGE_ALLOC_COSTLY_ORDER
-
 #define MLX4_EN_MAX_RX_FRAGS	4
 
 /* Maximum ring sizes */
@@ -255,7 +253,6 @@ struct mlx4_en_rx_alloc {
 	struct page *page;
 	dma_addr_t dma;
 	u32 page_offset;
-	u32 page_size;
 };
 
 #define MLX4_EN_CACHE_SIZE (2 * NAPI_POLL_WEIGHT)
@@ -579,7 +576,6 @@ struct mlx4_en_priv {
 	u8 num_frags;
 	u8 log_rx_info;
 	u8 dma_dir;
-	u8 rx_page_order;
 	u16 rx_headroom;
 
 	struct mlx4_en_tx_ring **tx_ring[MLX4_EN_NUM_TX_TYPES];
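
For reference, the stride/padding arithmetic added to mlx4_en_calc_rx_buf()
above can be checked with a small standalone user-space sketch. This is not
part of the patch; it only mirrors the nb/pad computation, and it assumes
PAGE_SIZE = 4096 and SMP_CACHE_BYTES = 64. The EX_* macros and the sample
frame sizes are made up purely for illustration.

/* Standalone illustration (not part of the patch): mimics the
 * frag_stride/padding math from mlx4_en_calc_rx_buf().
 * Assumes a 4K page and 64-byte cache lines.
 */
#include <stdio.h>

#define EX_PAGE_SIZE       4096
#define EX_SMP_CACHE_BYTES 64
#define EX_ALIGN(x, a)     (((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	int frag_sizes[] = { 1536, 2048 };
	unsigned int i;

	for (i = 0; i < sizeof(frag_sizes) / sizeof(frag_sizes[0]); i++) {
		int frag_size = frag_sizes[i];
		/* Same steps as the patch: align the fragment to a cache
		 * line, see how many strides fit in one order-0 page, then
		 * pad each stride so the page is split evenly (rounded down
		 * to a cache-line multiple).
		 */
		int frag_stride = EX_ALIGN(frag_size, EX_SMP_CACHE_BYTES);
		int nb = EX_PAGE_SIZE / frag_stride;
		int pad = (EX_PAGE_SIZE - nb * frag_stride) / nb;

		pad &= ~(EX_SMP_CACHE_BYTES - 1);

		printf("frag_size=%d -> stride=%d, frames/page=%d, truesize/frame=%d\n",
		       frag_size, frag_stride, nb, frag_stride + pad);
	}
	return 0;
}

With these assumptions, a 1536-byte frame gets a 1536-byte stride padded to
2048, so exactly two frames share one 4K page and each is accounted with a
truesize of 2048, matching point 2) of the changelog and the "pack 2
1536-bytes frames" comment in the diff.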