From patchwork Thu Jul 2 12:19:00 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Magnus Karlsson
X-Patchwork-Id: 1321398
X-Patchwork-Delegate: bpf@iogearbox.net
From: Magnus Karlsson
To: magnus.karlsson@intel.com,
 bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com
Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com
Subject: [PATCH bpf-next 01/14] xsk: i40e: ice: ixgbe: mlx5: pass buffer pool to driver instead of umem
Date: Thu, 2 Jul 2020 14:19:00 +0200
Message-Id: <1593692353-15102-2-git-send-email-magnus.karlsson@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com>
References: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com>
Sender: bpf-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org

Replace the explicit umem reference passed to the driver in AF_XDP
zero-copy mode with a reference to the buffer pool. This is in
preparation for extending the functionality of the zero-copy mode so
that umems can be shared between queues on the same netdev and also
between netdevs.

In this commit, only a umem reference is added to the buffer pool
struct; later commits will add other entities to it. These will be
entities that differ between queue ids and netdevs even though the
umem is shared between them.

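As a rough illustration of the indirection described above, the sketch
below models a ring that holds a buffer pool pointer and reaches the umem
through it. The struct layouts are simplified stand-ins, and struct
demo_ring, demo_umem_get_rx_frame_size() and demo_ring_rx_buf_len() are
hypothetical names used only here; they are not the kernel definitions.

```c
#include <stddef.h>

/* Simplified stand-in; the real struct xdp_umem has many more members. */
struct xdp_umem {
	unsigned int rx_frame_size;	/* illustrative field only */
};

/* Simplified stand-in; only the umem reference from this patch is shown. */
struct xsk_buff_pool {
	struct xdp_umem *umem;
};

/* Hypothetical driver ring: holds a pool where it used to hold a umem. */
struct demo_ring {
	struct xsk_buff_pool *xsk_pool;	/* was: struct xdp_umem *xsk_umem */
};

/* Hypothetical helper that still takes a umem, like the helpers the
 * drivers call at this point in the series. */
static unsigned int demo_umem_get_rx_frame_size(const struct xdp_umem *umem)
{
	return umem->rx_frame_size;
}

/* Returns 0 when no pool is bound, i.e. zero-copy is off for this ring;
 * otherwise dereferences the pool to hand the umem to the helper. */
static unsigned int demo_ring_rx_buf_len(const struct demo_ring *ring)
{
	if (!ring->xsk_pool)
		return 0;
	return demo_umem_get_rx_frame_size(ring->xsk_pool->umem);
}
```

The NULL check on the pool pointer plays the role that the NULL check on
the umem pointer played before: it is how a driver tells a zero-copy ring
from a regular one.
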
Signed-off-by: Magnus Karlsson
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c | 29 +++--
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 10 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h | 2 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 81 ++++++------
 drivers/net/ethernet/intel/i40e/i40e_xsk.h | 4 +-
 drivers/net/ethernet/intel/ice/ice.h | 18 +--
 drivers/net/ethernet/intel/ice/ice_base.c | 16 +--
 drivers/net/ethernet/intel/ice/ice_lib.c | 2 +-
 drivers/net/ethernet/intel/ice/ice_main.c | 10 +-
 drivers/net/ethernet/intel/ice/ice_txrx.c | 8 +-
 drivers/net/ethernet/intel/ice/ice_txrx.h | 2 +-
 drivers/net/ethernet/intel/ice/ice_xsk.c | 142 ++++++++++-----------
 drivers/net/ethernet/intel/ice/ice_xsk.h | 7 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe.h | 2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 34 ++---
 .../net/ethernet/intel/ixgbe/ixgbe_txrx_common.h | 7 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 61 ++++-----
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 19 +--
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 5 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.h | 10 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.c | 12 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.h | 2 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.c | 12 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.h | 6 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.c | 108 ++++++++--------
 .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.h | 14 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 46 +++----
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 16 +--
 include/linux/netdevice.h | 10 +-
 include/net/xdp_sock_drv.h | 7 +-
 include/net/xsk_buff_pool.h | 4 +-
 net/ethtool/channels.c | 2 +-
 net/ethtool/ioctl.c | 2 +-
 net/xdp/xdp_umem.c | 45 +++----
 net/xdp/xsk_buff_pool.c | 5 +-
 36 files changed, 389 insertions(+), 373 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c index aa8026b..422b54f 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c @@ -1967,7 +1967,7 @@ static int i40e_set_ringparam(struct net_device *netdev, (new_rx_count == vsi->rx_rings[0]->count)) return 0; - /* If there is a AF_XDP UMEM attached to any of Rx rings, + /* If there is a AF_XDP page pool attached to any of Rx rings, * disallow changing the number of descriptors -- regardless * if the netdev is running or not. */ diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 5d807c8..3df725e 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -3103,12 +3103,12 @@ static void i40e_config_xps_tx_ring(struct i40e_ring *ring) } /** - * i40e_xsk_umem - Retrieve the AF_XDP ZC if XDP and ZC is enabled + * i40e_xsk_pool - Retrieve the AF_XDP buffer pool if XDP and ZC is enabled * @ring: The Tx or Rx ring * - * Returns the UMEM or NULL. + * Returns the AF_XDP buffer pool or NULL. 
**/ -static struct xdp_umem *i40e_xsk_umem(struct i40e_ring *ring) +static struct xsk_buff_pool *i40e_xsk_pool(struct i40e_ring *ring) { bool xdp_on = i40e_enabled_xdp_vsi(ring->vsi); int qid = ring->queue_index; @@ -3119,7 +3119,7 @@ static struct xdp_umem *i40e_xsk_umem(struct i40e_ring *ring) if (!xdp_on || !test_bit(qid, ring->vsi->af_xdp_zc_qps)) return NULL; - return xdp_get_umem_from_qid(ring->vsi->netdev, qid); + return xdp_get_xsk_pool_from_qid(ring->vsi->netdev, qid); } /** @@ -3138,7 +3138,7 @@ static int i40e_configure_tx_ring(struct i40e_ring *ring) u32 qtx_ctl = 0; if (ring_is_xdp(ring)) - ring->xsk_umem = i40e_xsk_umem(ring); + ring->xsk_pool = i40e_xsk_pool(ring); /* some ATR related tx ring init */ if (vsi->back->flags & I40E_FLAG_FD_ATR_ENABLED) { @@ -3261,12 +3261,13 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring) xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq); kfree(ring->rx_bi); - ring->xsk_umem = i40e_xsk_umem(ring); - if (ring->xsk_umem) { + ring->xsk_pool = i40e_xsk_pool(ring); + if (ring->xsk_pool) { ret = i40e_alloc_rx_bi_zc(ring); if (ret) return ret; - ring->rx_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_umem); + ring->rx_buf_len = + xsk_umem_get_rx_frame_size(ring->xsk_pool->umem); /* For AF_XDP ZC, we disallow packets to span on * multiple buffers, thus letting us skip that * handling in the fast-path. 
@@ -3349,8 +3350,8 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring) ring->tail = hw->hw_addr + I40E_QRX_TAIL(pf_q); writel(0, ring->tail); - if (ring->xsk_umem) { - xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq); + if (ring->xsk_pool) { + xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq); ok = i40e_alloc_rx_buffers_zc(ring, I40E_DESC_UNUSED(ring)); } else { ok = !i40e_alloc_rx_buffers(ring, I40E_DESC_UNUSED(ring)); @@ -3361,7 +3362,7 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring) */ dev_info(&vsi->back->pdev->dev, "Failed to allocate some buffers on %sRx ring %d (pf_q %d)\n", - ring->xsk_umem ? "UMEM enabled " : "", + ring->xsk_pool ? "AF_XDP ZC enabled " : "", ring->queue_index, pf_q); } @@ -12553,7 +12554,7 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi, */ if (need_reset && prog) for (i = 0; i < vsi->num_queue_pairs; i++) - if (vsi->xdp_rings[i]->xsk_umem) + if (vsi->xdp_rings[i]->xsk_pool) (void)i40e_xsk_wakeup(vsi->netdev, i, XDP_WAKEUP_RX); @@ -12835,8 +12836,8 @@ static int i40e_xdp(struct net_device *dev, case XDP_QUERY_PROG: xdp->prog_id = vsi->xdp_prog ? 
vsi->xdp_prog->aux->id : 0; return 0; - case XDP_SETUP_XSK_UMEM: - return i40e_xsk_umem_setup(vsi, xdp->xsk.umem, + case XDP_SETUP_XSK_POOL: + return i40e_xsk_pool_setup(vsi, xdp->xsk.pool, xdp->xsk.queue_id); default: return -EINVAL; diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index f9555c8..a50592b 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -636,7 +636,7 @@ void i40e_clean_tx_ring(struct i40e_ring *tx_ring) unsigned long bi_size; u16 i; - if (ring_is_xdp(tx_ring) && tx_ring->xsk_umem) { + if (ring_is_xdp(tx_ring) && tx_ring->xsk_pool) { i40e_xsk_clean_tx_ring(tx_ring); } else { /* ring already cleared, nothing to do */ @@ -1335,7 +1335,7 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring) rx_ring->skb = NULL; } - if (rx_ring->xsk_umem) { + if (rx_ring->xsk_pool) { i40e_xsk_clean_rx_ring(rx_ring); goto skip_free; } @@ -1369,7 +1369,7 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring) } skip_free: - if (rx_ring->xsk_umem) + if (rx_ring->xsk_pool) i40e_clear_rx_bi_zc(rx_ring); else i40e_clear_rx_bi(rx_ring); @@ -2579,7 +2579,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget) * budget and be more aggressive about cleaning up the Tx descriptors. */ i40e_for_each_ring(ring, q_vector->tx) { - bool wd = ring->xsk_umem ? + bool wd = ring->xsk_pool ? i40e_clean_xdp_tx_irq(vsi, ring, budget) : i40e_clean_tx_irq(vsi, ring, budget); @@ -2601,7 +2601,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget) budget_per_ring = max(budget/q_vector->num_ringpairs, 1); i40e_for_each_ring(ring, q_vector->rx) { - int cleaned = ring->xsk_umem ? + int cleaned = ring->xsk_pool ? 
i40e_clean_rx_irq_zc(ring, budget_per_ring) : i40e_clean_rx_irq(ring, budget_per_ring); diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h index 5c25597..88d43ed 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h @@ -411,7 +411,7 @@ struct i40e_ring { struct i40e_channel *ch; struct xdp_rxq_info xdp_rxq; - struct xdp_umem *xsk_umem; + struct xsk_buff_pool *xsk_pool; } ____cacheline_internodealigned_in_smp; static inline bool ring_uses_build_skb(struct i40e_ring *ring) diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index 7276580..d7ebdf6 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -29,14 +29,16 @@ static struct xdp_buff **i40e_rx_bi(struct i40e_ring *rx_ring, u32 idx) } /** - * i40e_xsk_umem_enable - Enable/associate a UMEM to a certain ring/qid + * i40e_xsk_pool_enable - Enable/associate an AF_XDP buffer pool to a + * certain ring/qid * @vsi: Current VSI - * @umem: UMEM - * @qid: Rx ring to associate UMEM to + * @pool: buffer pool + * @qid: Rx ring to associate buffer pool with * * Returns 0 on success, <0 on failure **/ -static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem, +static int i40e_xsk_pool_enable(struct i40e_vsi *vsi, + struct xsk_buff_pool *pool, u16 qid) { struct net_device *netdev = vsi->netdev; @@ -53,7 +55,8 @@ static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem, qid >= netdev->real_num_tx_queues) return -EINVAL; - err = xsk_buff_dma_map(umem, &vsi->back->pdev->dev, I40E_RX_DMA_ATTR); + err = xsk_buff_dma_map(pool->umem, &vsi->back->pdev->dev, + I40E_RX_DMA_ATTR); if (err) return err; @@ -80,21 +83,22 @@ static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem, } /** - * i40e_xsk_umem_disable - Disassociate a UMEM from a certain ring/qid + * i40e_xsk_pool_disable 
- Disassociate an AF_XDP buffer pool from a + * certain ring/qid * @vsi: Current VSI - * @qid: Rx ring to associate UMEM to + * @qid: Rx ring to associate buffer pool with * * Returns 0 on success, <0 on failure **/ -static int i40e_xsk_umem_disable(struct i40e_vsi *vsi, u16 qid) +static int i40e_xsk_pool_disable(struct i40e_vsi *vsi, u16 qid) { struct net_device *netdev = vsi->netdev; - struct xdp_umem *umem; + struct xsk_buff_pool *pool; bool if_running; int err; - umem = xdp_get_umem_from_qid(netdev, qid); - if (!umem) + pool = xdp_get_xsk_pool_from_qid(netdev, qid); + if (!pool) return -EINVAL; if_running = netif_running(vsi->netdev) && i40e_enabled_xdp_vsi(vsi); @@ -106,7 +110,7 @@ static int i40e_xsk_umem_disable(struct i40e_vsi *vsi, u16 qid) } clear_bit(qid, vsi->af_xdp_zc_qps); - xsk_buff_dma_unmap(umem, I40E_RX_DMA_ATTR); + xsk_buff_dma_unmap(pool->umem, I40E_RX_DMA_ATTR); if (if_running) { err = i40e_queue_pair_enable(vsi, qid); @@ -118,20 +122,21 @@ static int i40e_xsk_umem_disable(struct i40e_vsi *vsi, u16 qid) } /** - * i40e_xsk_umem_setup - Enable/disassociate a UMEM to/from a ring/qid + * i40e_xsk_pool_setup - Enable/disassociate an AF_XDP buffer pool to/from + * a ring/qid * @vsi: Current VSI - * @umem: UMEM to enable/associate to a ring, or NULL to disable - * @qid: Rx ring to (dis)associate UMEM (from)to + * @pool: Buffer pool to enable/associate to a ring, or NULL to disable + * @qid: Rx ring to (dis)associate buffer pool (from)to * - * This function enables or disables a UMEM to a certain ring. + * This function enables or disables a buffer pool to a certain ring. * * Returns 0 on success, <0 on failure **/ -int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem, +int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool, u16 qid) { - return umem ? i40e_xsk_umem_enable(vsi, umem, qid) : - i40e_xsk_umem_disable(vsi, qid); + return pool ? 
i40e_xsk_pool_enable(vsi, pool, qid) : + i40e_xsk_pool_disable(vsi, qid); } /** @@ -191,7 +196,7 @@ bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count) rx_desc = I40E_RX_DESC(rx_ring, ntu); bi = i40e_rx_bi(rx_ring, ntu); do { - xdp = xsk_buff_alloc(rx_ring->xsk_umem); + xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem); if (!xdp) { ok = false; goto no_buffers; @@ -358,11 +363,11 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget) i40e_finalize_xdp_rx(rx_ring, xdp_xmit); i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets); - if (xsk_umem_uses_need_wakeup(rx_ring->xsk_umem)) { + if (xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) { if (failure || rx_ring->next_to_clean == rx_ring->next_to_use) - xsk_set_rx_need_wakeup(rx_ring->xsk_umem); + xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem); else - xsk_clear_rx_need_wakeup(rx_ring->xsk_umem); + xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem); return (int)total_rx_packets; } @@ -391,11 +396,12 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget) break; } - if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc)) + if (!xsk_umem_consume_tx(xdp_ring->xsk_pool->umem, &desc)) break; - dma = xsk_buff_raw_get_dma(xdp_ring->xsk_umem, desc.addr); - xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_umem, dma, + dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool->umem, + desc.addr); + xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool->umem, dma, desc.len); tx_bi = &xdp_ring->tx_bi[xdp_ring->next_to_use]; @@ -419,7 +425,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget) I40E_TXD_QW1_CMD_SHIFT); i40e_xdp_ring_update_tail(xdp_ring); - xsk_umem_consume_tx_done(xdp_ring->xsk_umem); + xsk_umem_consume_tx_done(xdp_ring->xsk_pool->umem); } return !!budget && work_done; @@ -452,7 +458,7 @@ bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, { unsigned int ntc, total_bytes = 0, budget = vsi->work_limit; u32 i, completed_frames, frames_ready, xsk_frames = 0; - 
struct xdp_umem *umem = tx_ring->xsk_umem; + struct xsk_buff_pool *bp = tx_ring->xsk_pool; u32 head_idx = i40e_get_head(tx_ring); bool work_done = true, xmit_done; struct i40e_tx_buffer *tx_bi; @@ -492,14 +498,14 @@ bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, tx_ring->next_to_clean -= tx_ring->count; if (xsk_frames) - xsk_umem_complete_tx(umem, xsk_frames); + xsk_umem_complete_tx(bp->umem, xsk_frames); i40e_arm_wb(tx_ring, vsi, budget); i40e_update_tx_stats(tx_ring, completed_frames, total_bytes); out_xmit: - if (xsk_umem_uses_need_wakeup(tx_ring->xsk_umem)) - xsk_set_tx_need_wakeup(tx_ring->xsk_umem); + if (xsk_umem_uses_need_wakeup(tx_ring->xsk_pool->umem)) + xsk_set_tx_need_wakeup(tx_ring->xsk_pool->umem); xmit_done = i40e_xmit_zc(tx_ring, budget); @@ -533,7 +539,7 @@ int i40e_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags) if (queue_id >= vsi->num_queue_pairs) return -ENXIO; - if (!vsi->xdp_rings[queue_id]->xsk_umem) + if (!vsi->xdp_rings[queue_id]->xsk_pool) return -ENXIO; ring = vsi->xdp_rings[queue_id]; @@ -572,7 +578,7 @@ void i40e_xsk_clean_rx_ring(struct i40e_ring *rx_ring) void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring) { u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use; - struct xdp_umem *umem = tx_ring->xsk_umem; + struct xsk_buff_pool *bp = tx_ring->xsk_pool; struct i40e_tx_buffer *tx_bi; u32 xsk_frames = 0; @@ -592,14 +598,15 @@ void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring) } if (xsk_frames) - xsk_umem_complete_tx(umem, xsk_frames); + xsk_umem_complete_tx(bp->umem, xsk_frames); } /** - * i40e_xsk_any_rx_ring_enabled - Checks if Rx rings have AF_XDP UMEM attached + * i40e_xsk_any_rx_ring_enabled - Checks if Rx rings have an AF_XDP + * buffer pool attached * @vsi: vsi * - * Returns true if any of the Rx rings has an AF_XDP UMEM attached + * Returns true if any of the Rx rings has an AF_XDP buffer pool attached **/ bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi) { @@ -607,7 +614,7 @@ bool 
i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi) int i; for (i = 0; i < vsi->num_queue_pairs; i++) { - if (xdp_get_umem_from_qid(netdev, i)) + if (xdp_get_xsk_pool_from_qid(netdev, i)) return true; } diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h index ea919a7d..a5ad927 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h @@ -5,12 +5,12 @@ #define _I40E_XSK_H_ struct i40e_vsi; -struct xdp_umem; +struct xsk_buff_pool; struct zero_copy_allocator; int i40e_queue_pair_disable(struct i40e_vsi *vsi, int queue_pair); int i40e_queue_pair_enable(struct i40e_vsi *vsi, int queue_pair); -int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem, +int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool, u16 qid); bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count); int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget); diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index 5792ee6..9eff7e8 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -318,9 +318,9 @@ struct ice_vsi { struct ice_ring **xdp_rings; /* XDP ring array */ u16 num_xdp_txq; /* Used XDP queues */ u8 xdp_mapping_mode; /* ICE_MAP_MODE_[CONTIG|SCATTER] */ - struct xdp_umem **xsk_umems; - u16 num_xsk_umems_used; - u16 num_xsk_umems; + struct xsk_buff_pool **xsk_pools; + u16 num_xsk_pools_used; + u16 num_xsk_pools; } ____cacheline_internodealigned_in_smp; /* struct that defines an interrupt vector */ @@ -489,25 +489,25 @@ static inline void ice_set_ring_xdp(struct ice_ring *ring) } /** - * ice_xsk_umem - get XDP UMEM bound to a ring + * ice_xsk_pool - get XSK buffer pool bound to a ring * @ring - ring to use * - * Returns a pointer to xdp_umem structure if there is an UMEM present, + * Returns a pointer to xdp_umem structure if there is a buffer pool present, * NULL 
otherwise. */ -static inline struct xdp_umem *ice_xsk_umem(struct ice_ring *ring) +static inline struct xsk_buff_pool *ice_xsk_pool(struct ice_ring *ring) { - struct xdp_umem **umems = ring->vsi->xsk_umems; + struct xsk_buff_pool **pools = ring->vsi->xsk_pools; u16 qid = ring->q_index; if (ice_ring_is_xdp(ring)) qid -= ring->vsi->num_xdp_txq; - if (qid >= ring->vsi->num_xsk_umems || !umems || !umems[qid] || + if (qid >= ring->vsi->num_xsk_pools || !pools || !pools[qid] || !ice_is_xdp_ena_vsi(ring->vsi)) return NULL; - return umems[qid]; + return pools[qid]; } /** diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c index d620d26..94dbf89 100644 --- a/drivers/net/ethernet/intel/ice/ice_base.c +++ b/drivers/net/ethernet/intel/ice/ice_base.c @@ -308,12 +308,12 @@ int ice_setup_rx_ctx(struct ice_ring *ring) xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev, ring->q_index); - ring->xsk_umem = ice_xsk_umem(ring); - if (ring->xsk_umem) { + ring->xsk_pool = ice_xsk_pool(ring); + if (ring->xsk_pool) { xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq); ring->rx_buf_len = - xsk_umem_get_rx_frame_size(ring->xsk_umem); + xsk_umem_get_rx_frame_size(ring->xsk_pool->umem); /* For AF_XDP ZC, we disallow packets to span on * multiple buffers, thus letting us skip that * handling in the fast-path. 
@@ -324,7 +324,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring) NULL); if (err) return err; - xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq); + xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq); dev_info(dev, "Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring %d\n", ring->q_index); @@ -417,9 +417,9 @@ int ice_setup_rx_ctx(struct ice_ring *ring) ring->tail = hw->hw_addr + QRX_TAIL(pf_q); writel(0, ring->tail); - if (ring->xsk_umem) { - if (!xsk_buff_can_alloc(ring->xsk_umem, num_bufs)) { - dev_warn(dev, "UMEM does not provide enough addresses to fill %d buffers on Rx ring %d\n", + if (ring->xsk_pool) { + if (!xsk_buff_can_alloc(ring->xsk_pool->umem, num_bufs)) { + dev_warn(dev, "XSK buffer pool does not provide enough addresses to fill %d buffers on Rx ring %d\n", num_bufs, ring->q_index); dev_warn(dev, "Change Rx ring/fill queue size to avoid performance issues\n"); @@ -428,7 +428,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring) err = ice_alloc_rx_bufs_zc(ring, num_bufs); if (err) - dev_info(dev, "Failed to allocate some buffers on UMEM enabled Rx ring %d (pf_q %d)\n", + dev_info(dev, "Failed to allocate some buffers on XSK buffer pool enabled Rx ring %d (pf_q %d)\n", ring->q_index, pf_q); return 0; } diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 28b46cc..e87e25a 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -1713,7 +1713,7 @@ int ice_vsi_cfg_xdp_txqs(struct ice_vsi *vsi) return ret; for (i = 0; i < vsi->num_xdp_txq; i++) - vsi->xdp_rings[i]->xsk_umem = ice_xsk_umem(vsi->xdp_rings[i]); + vsi->xdp_rings[i]->xsk_pool = ice_xsk_pool(vsi->xdp_rings[i]); return ret; } diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 082825e..b354abaf 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -1706,7 +1706,7 @@ static int 
ice_xdp_alloc_setup_rings(struct ice_vsi *vsi) if (ice_setup_tx_ring(xdp_ring)) goto free_xdp_rings; ice_set_ring_xdp(xdp_ring); - xdp_ring->xsk_umem = ice_xsk_umem(xdp_ring); + xdp_ring->xsk_pool = ice_xsk_pool(xdp_ring); } return 0; @@ -1950,13 +1950,13 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog, if (if_running) ret = ice_up(vsi); - if (!ret && prog && vsi->xsk_umems) { + if (!ret && prog && vsi->xsk_pools) { int i; ice_for_each_rxq(vsi, i) { struct ice_ring *rx_ring = vsi->rx_rings[i]; - if (rx_ring->xsk_umem) + if (rx_ring->xsk_pool) napi_schedule(&rx_ring->q_vector->napi); } } @@ -1985,8 +1985,8 @@ static int ice_xdp(struct net_device *dev, struct netdev_bpf *xdp) case XDP_QUERY_PROG: xdp->prog_id = vsi->xdp_prog ? vsi->xdp_prog->aux->id : 0; return 0; - case XDP_SETUP_XSK_UMEM: - return ice_xsk_umem_setup(vsi, xdp->xsk.umem, + case XDP_SETUP_XSK_POOL: + return ice_xsk_pool_setup(vsi, xdp->xsk.pool, xdp->xsk.queue_id); default: return -EINVAL; diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index abdb137c..241c1ea 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -145,7 +145,7 @@ void ice_clean_tx_ring(struct ice_ring *tx_ring) { u16 i; - if (ice_ring_is_xdp(tx_ring) && tx_ring->xsk_umem) { + if (ice_ring_is_xdp(tx_ring) && tx_ring->xsk_pool) { ice_xsk_clean_xdp_ring(tx_ring); goto tx_skip_free; } @@ -375,7 +375,7 @@ void ice_clean_rx_ring(struct ice_ring *rx_ring) if (!rx_ring->rx_buf) return; - if (rx_ring->xsk_umem) { + if (rx_ring->xsk_pool) { ice_xsk_clean_rx_ring(rx_ring); goto rx_skip_free; } @@ -1619,7 +1619,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget) * budget and be more aggressive about cleaning up the Tx descriptors. */ ice_for_each_ring(ring, q_vector->tx) { - bool wd = ring->xsk_umem ? + bool wd = ring->xsk_pool ? 
ice_clean_tx_irq_zc(ring, budget) : ice_clean_tx_irq(ring, budget); @@ -1649,7 +1649,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget) * comparison in the irq context instead of many inside the * ice_clean_rx_irq function and makes the codebase cleaner. */ - cleaned = ring->xsk_umem ? + cleaned = ring->xsk_pool ? ice_clean_rx_irq_zc(ring, budget_per_ring) : ice_clean_rx_irq(ring, budget_per_ring); work_done += cleaned; diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h index e70c461..3b37360 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h @@ -295,7 +295,7 @@ struct ice_ring { struct rcu_head rcu; /* to avoid race on free */ struct bpf_prog *xdp_prog; - struct xdp_umem *xsk_umem; + struct xsk_buff_pool *xsk_pool; /* CL3 - 3rd cacheline starts here */ struct xdp_rxq_info xdp_rxq; /* CLX - the below items are only accessed infrequently and should be diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index b6f928c..f0ce669 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -234,7 +234,7 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx) if (err) goto free_buf; ice_set_ring_xdp(xdp_ring); - xdp_ring->xsk_umem = ice_xsk_umem(xdp_ring); + xdp_ring->xsk_pool = ice_xsk_pool(xdp_ring); } err = ice_setup_rx_ctx(rx_ring); @@ -258,21 +258,21 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx) } /** - * ice_xsk_alloc_umems - allocate a UMEM region for an XDP socket - * @vsi: VSI to allocate the UMEM on + * ice_xsk_alloc_pools - allocate a buffer pool for an XDP socket + * @vsi: VSI to allocate the buffer pool on * * Returns 0 on success, negative on error */ -static int ice_xsk_alloc_umems(struct ice_vsi *vsi) +static int ice_xsk_alloc_pools(struct ice_vsi *vsi) { - if (vsi->xsk_umems) + if (vsi->xsk_pools) return 0; - vsi->xsk_umems = 
kcalloc(vsi->num_xsk_umems, sizeof(*vsi->xsk_umems), + vsi->xsk_pools = kcalloc(vsi->num_xsk_pools, sizeof(*vsi->xsk_pools), GFP_KERNEL); - if (!vsi->xsk_umems) { - vsi->num_xsk_umems = 0; + if (!vsi->xsk_pools) { + vsi->num_xsk_pools = 0; return -ENOMEM; } @@ -280,74 +280,74 @@ static int ice_xsk_alloc_umems(struct ice_vsi *vsi) } /** - * ice_xsk_remove_umem - Remove an UMEM for a certain ring/qid + * ice_xsk_remove_pool - Remove an buffer pool for a certain ring/qid * @vsi: VSI from which the VSI will be removed - * @qid: Ring/qid associated with the UMEM + * @qid: Ring/qid associated with the buffer pool */ -static void ice_xsk_remove_umem(struct ice_vsi *vsi, u16 qid) +static void ice_xsk_remove_pool(struct ice_vsi *vsi, u16 qid) { - vsi->xsk_umems[qid] = NULL; - vsi->num_xsk_umems_used--; + vsi->xsk_pools[qid] = NULL; + vsi->num_xsk_pools_used--; - if (vsi->num_xsk_umems_used == 0) { - kfree(vsi->xsk_umems); - vsi->xsk_umems = NULL; - vsi->num_xsk_umems = 0; + if (vsi->num_xsk_pools_used == 0) { + kfree(vsi->xsk_pools); + vsi->xsk_pools = NULL; + vsi->num_xsk_pools = 0; } } /** - * ice_xsk_umem_disable - disable a UMEM region + * ice_xsk_pool_disable - disable a buffer pool region * @vsi: Current VSI * @qid: queue ID * * Returns 0 on success, negative on failure */ -static int ice_xsk_umem_disable(struct ice_vsi *vsi, u16 qid) +static int ice_xsk_pool_disable(struct ice_vsi *vsi, u16 qid) { - if (!vsi->xsk_umems || qid >= vsi->num_xsk_umems || - !vsi->xsk_umems[qid]) + if (!vsi->xsk_pools || qid >= vsi->num_xsk_pools || + !vsi->xsk_pools[qid]) return -EINVAL; - xsk_buff_dma_unmap(vsi->xsk_umems[qid], ICE_RX_DMA_ATTR); - ice_xsk_remove_umem(vsi, qid); + xsk_buff_dma_unmap(vsi->xsk_pools[qid]->umem, ICE_RX_DMA_ATTR); + ice_xsk_remove_pool(vsi, qid); return 0; } /** - * ice_xsk_umem_enable - enable a UMEM region + * ice_xsk_pool_enable - enable a buffer pool region * @vsi: Current VSI - * @umem: pointer to a requested UMEM region + * @pool: pointer to a requested 
buffer pool region * @qid: queue ID * * Returns 0 on success, negative on failure */ static int -ice_xsk_umem_enable(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid) +ice_xsk_pool_enable(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid) { int err; if (vsi->type != ICE_VSI_PF) return -EINVAL; - if (!vsi->num_xsk_umems) - vsi->num_xsk_umems = min_t(u16, vsi->num_rxq, vsi->num_txq); - if (qid >= vsi->num_xsk_umems) + if (!vsi->num_xsk_pools) + vsi->num_xsk_pools = min_t(u16, vsi->num_rxq, vsi->num_txq); + if (qid >= vsi->num_xsk_pools) return -EINVAL; - err = ice_xsk_alloc_umems(vsi); + err = ice_xsk_alloc_pools(vsi); if (err) return err; - if (vsi->xsk_umems && vsi->xsk_umems[qid]) + if (vsi->xsk_pools && vsi->xsk_pools[qid]) return -EBUSY; - vsi->xsk_umems[qid] = umem; - vsi->num_xsk_umems_used++; + vsi->xsk_pools[qid] = pool; + vsi->num_xsk_pools_used++; - err = xsk_buff_dma_map(vsi->xsk_umems[qid], ice_pf_to_dev(vsi->back), + err = xsk_buff_dma_map(vsi->xsk_pools[qid]->umem, ice_pf_to_dev(vsi->back), ICE_RX_DMA_ATTR); if (err) return err; @@ -356,17 +356,17 @@ ice_xsk_umem_enable(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid) } /** - * ice_xsk_umem_setup - enable/disable a UMEM region depending on its state + * ice_xsk_pool_setup - enable/disable a buffer pool region depending on its state * @vsi: Current VSI - * @umem: UMEM to enable/associate to a ring, NULL to disable + * @pool: buffer pool to enable/associate to a ring, NULL to disable * @qid: queue ID * * Returns 0 on success, negative on failure */ -int ice_xsk_umem_setup(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid) +int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid) { - bool if_running, umem_present = !!umem; - int ret = 0, umem_failure = 0; + bool if_running, pool_present = !!pool; + int ret = 0, pool_failure = 0; if_running = netif_running(vsi->netdev) && ice_is_xdp_ena_vsi(vsi); @@ -374,26 +374,26 @@ int ice_xsk_umem_setup(struct ice_vsi *vsi, 
struct xdp_umem *umem, u16 qid) ret = ice_qp_dis(vsi, qid); if (ret) { netdev_err(vsi->netdev, "ice_qp_dis error = %d\n", ret); - goto xsk_umem_if_up; + goto xsk_pool_if_up; } } - umem_failure = umem_present ? ice_xsk_umem_enable(vsi, umem, qid) : - ice_xsk_umem_disable(vsi, qid); + pool_failure = pool_present ? ice_xsk_pool_enable(vsi, pool, qid) : + ice_xsk_pool_disable(vsi, qid); -xsk_umem_if_up: +xsk_pool_if_up: if (if_running) { ret = ice_qp_ena(vsi, qid); - if (!ret && umem_present) + if (!ret && pool_present) napi_schedule(&vsi->xdp_rings[qid]->q_vector->napi); else if (ret) netdev_err(vsi->netdev, "ice_qp_ena error = %d\n", ret); } - if (umem_failure) { - netdev_err(vsi->netdev, "Could not %sable UMEM, error = %d\n", - umem_present ? "en" : "dis", umem_failure); - return umem_failure; + if (pool_failure) { + netdev_err(vsi->netdev, "Could not %sable buffer pool, error = %d\n", + pool_present ? "en" : "dis", pool_failure); + return pool_failure; } return ret; @@ -424,7 +424,7 @@ bool ice_alloc_rx_bufs_zc(struct ice_ring *rx_ring, u16 count) rx_buf = &rx_ring->rx_buf[ntu]; do { - rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_umem); + rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem); if (!rx_buf->xdp) { ret = true; break; @@ -645,11 +645,11 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget) ice_finalize_xdp_rx(rx_ring, xdp_xmit); ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes); - if (xsk_umem_uses_need_wakeup(rx_ring->xsk_umem)) { + if (xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) { if (failure || rx_ring->next_to_clean == rx_ring->next_to_use) - xsk_set_rx_need_wakeup(rx_ring->xsk_umem); + xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem); else - xsk_clear_rx_need_wakeup(rx_ring->xsk_umem); + xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem); return (int)total_rx_packets; } @@ -682,11 +682,11 @@ static bool ice_xmit_zc(struct ice_ring *xdp_ring, int budget) tx_buf = &xdp_ring->tx_buf[xdp_ring->next_to_use]; - if 
(!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc)) + if (!xsk_umem_consume_tx(xdp_ring->xsk_pool->umem, &desc)) break; - dma = xsk_buff_raw_get_dma(xdp_ring->xsk_umem, desc.addr); - xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_umem, dma, + dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool->umem, desc.addr); + xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool->umem, dma, desc.len); tx_buf->bytecount = desc.len; @@ -703,9 +703,9 @@ static bool ice_xmit_zc(struct ice_ring *xdp_ring, int budget) if (tx_desc) { ice_xdp_ring_update_tail(xdp_ring); - xsk_umem_consume_tx_done(xdp_ring->xsk_umem); - if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_umem)) - xsk_clear_tx_need_wakeup(xdp_ring->xsk_umem); + xsk_umem_consume_tx_done(xdp_ring->xsk_pool->umem); + if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_pool->umem)) + xsk_clear_tx_need_wakeup(xdp_ring->xsk_pool->umem); } return budget > 0 && work_done; @@ -779,13 +779,13 @@ bool ice_clean_tx_irq_zc(struct ice_ring *xdp_ring, int budget) xdp_ring->next_to_clean = ntc; if (xsk_frames) - xsk_umem_complete_tx(xdp_ring->xsk_umem, xsk_frames); + xsk_umem_complete_tx(xdp_ring->xsk_pool->umem, xsk_frames); - if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_umem)) { + if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_pool->umem)) { if (xdp_ring->next_to_clean == xdp_ring->next_to_use) - xsk_set_tx_need_wakeup(xdp_ring->xsk_umem); + xsk_set_tx_need_wakeup(xdp_ring->xsk_pool->umem); else - xsk_clear_tx_need_wakeup(xdp_ring->xsk_umem); + xsk_clear_tx_need_wakeup(xdp_ring->xsk_pool->umem); } ice_update_tx_ring_stats(xdp_ring, total_packets, total_bytes); @@ -820,7 +820,7 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, if (queue_id >= vsi->num_txq) return -ENXIO; - if (!vsi->xdp_rings[queue_id]->xsk_umem) + if (!vsi->xdp_rings[queue_id]->xsk_pool) return -ENXIO; ring = vsi->xdp_rings[queue_id]; @@ -839,20 +839,20 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, } /** - * ice_xsk_any_rx_ring_ena - Checks if Rx rings have AF_XDP UMEM 
attached + * ice_xsk_any_rx_ring_ena - Checks if Rx rings have AF_XDP buff pool attached * @vsi: VSI to be checked * - * Returns true if any of the Rx rings has an AF_XDP UMEM attached + * Returns true if any of the Rx rings has an AF_XDP buff pool attached */ bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi) { int i; - if (!vsi->xsk_umems) + if (!vsi->xsk_pools) return false; - for (i = 0; i < vsi->num_xsk_umems; i++) { - if (vsi->xsk_umems[i]) + for (i = 0; i < vsi->num_xsk_pools; i++) { + if (vsi->xsk_pools[i]) return true; } @@ -860,7 +860,7 @@ bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi) } /** - * ice_xsk_clean_rx_ring - clean UMEM queues connected to a given Rx ring + * ice_xsk_clean_rx_ring - clean buffer pool queues connected to a given Rx ring * @rx_ring: ring to be cleaned */ void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring) @@ -878,7 +878,7 @@ void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring) } /** - * ice_xsk_clean_xdp_ring - Clean the XDP Tx ring and its UMEM queues + * ice_xsk_clean_xdp_ring - Clean the XDP Tx ring and its buffer pool queues * @xdp_ring: XDP_Tx ring */ void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring) @@ -902,5 +902,5 @@ void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring) } if (xsk_frames) - xsk_umem_complete_tx(xdp_ring->xsk_umem, xsk_frames); + xsk_umem_complete_tx(xdp_ring->xsk_pool->umem, xsk_frames); } diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.h b/drivers/net/ethernet/intel/ice/ice_xsk.h index fc1a06b..fad7836 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.h +++ b/drivers/net/ethernet/intel/ice/ice_xsk.h @@ -9,7 +9,8 @@ struct ice_vsi; #ifdef CONFIG_XDP_SOCKETS -int ice_xsk_umem_setup(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid); +int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, + u16 qid); int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget); bool ice_clean_tx_irq_zc(struct ice_ring *xdp_ring, int budget); int ice_xsk_wakeup(struct net_device *netdev, u32 
queue_id, u32 flags); @@ -19,8 +20,8 @@ void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring); void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring); #else static inline int -ice_xsk_umem_setup(struct ice_vsi __always_unused *vsi, - struct xdp_umem __always_unused *umem, +ice_xsk_pool_setup(struct ice_vsi __always_unused *vsi, + struct xsk_buff_pool __always_unused *pool, u16 __always_unused qid) { return -EOPNOTSUPP; diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h index 5ddfc83..bd0f65e 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h @@ -350,7 +350,7 @@ struct ixgbe_ring { struct ixgbe_rx_queue_stats rx_stats; }; struct xdp_rxq_info xdp_rxq; - struct xdp_umem *xsk_umem; + struct xsk_buff_pool *xsk_pool; u16 ring_idx; /* {rx,tx,xdp}_ring back reference idx */ u16 rx_buf_len; } ____cacheline_internodealigned_in_smp; diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index f162b8b..3217000 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -3158,7 +3158,7 @@ int ixgbe_poll(struct napi_struct *napi, int budget) #endif ixgbe_for_each_ring(ring, q_vector->tx) { - bool wd = ring->xsk_umem ? + bool wd = ring->xsk_pool ? ixgbe_clean_xdp_tx_irq(q_vector, ring, budget) : ixgbe_clean_tx_irq(q_vector, ring, budget); @@ -3178,7 +3178,7 @@ int ixgbe_poll(struct napi_struct *napi, int budget) per_ring_budget = budget; ixgbe_for_each_ring(ring, q_vector->rx) { - int cleaned = ring->xsk_umem ? + int cleaned = ring->xsk_pool ? 
ixgbe_clean_rx_irq_zc(q_vector, ring, per_ring_budget) : ixgbe_clean_rx_irq(q_vector, ring, @@ -3473,9 +3473,9 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter, u32 txdctl = IXGBE_TXDCTL_ENABLE; u8 reg_idx = ring->reg_idx; - ring->xsk_umem = NULL; + ring->xsk_pool = NULL; if (ring_is_xdp(ring)) - ring->xsk_umem = ixgbe_xsk_umem(adapter, ring); + ring->xsk_pool = ixgbe_xsk_pool(adapter, ring); /* disable queue to avoid issues while updating state */ IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), 0); @@ -3715,8 +3715,8 @@ static void ixgbe_configure_srrctl(struct ixgbe_adapter *adapter, srrctl = IXGBE_RX_HDR_SIZE << IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT; /* configure the packet buffer length */ - if (rx_ring->xsk_umem) { - u32 xsk_buf_len = xsk_umem_get_rx_frame_size(rx_ring->xsk_umem); + if (rx_ring->xsk_pool) { + u32 xsk_buf_len = xsk_umem_get_rx_frame_size(rx_ring->xsk_pool->umem); /* If the MAC support setting RXDCTL.RLPML, the * SRRCTL[n].BSIZEPKT is set to PAGE_SIZE and @@ -4061,12 +4061,12 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter, u8 reg_idx = ring->reg_idx; xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq); - ring->xsk_umem = ixgbe_xsk_umem(adapter, ring); - if (ring->xsk_umem) { + ring->xsk_pool = ixgbe_xsk_pool(adapter, ring); + if (ring->xsk_pool) { WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_XSK_BUFF_POOL, NULL)); - xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq); + xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq); } else { WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_PAGE_SHARED, NULL)); @@ -4121,8 +4121,8 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter, #endif } - if (ring->xsk_umem && hw->mac.type != ixgbe_mac_82599EB) { - u32 xsk_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_umem); + if (ring->xsk_pool && hw->mac.type != ixgbe_mac_82599EB) { + u32 xsk_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_pool->umem); rxdctl &= ~(IXGBE_RXDCTL_RLPMLMASK | IXGBE_RXDCTL_RLPML_EN); 
@@ -4144,7 +4144,7 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter, IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(reg_idx), rxdctl); ixgbe_rx_desc_queue_enable(adapter, ring); - if (ring->xsk_umem) + if (ring->xsk_pool) ixgbe_alloc_rx_buffers_zc(ring, ixgbe_desc_unused(ring)); else ixgbe_alloc_rx_buffers(ring, ixgbe_desc_unused(ring)); @@ -5277,7 +5277,7 @@ static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring) u16 i = rx_ring->next_to_clean; struct ixgbe_rx_buffer *rx_buffer = &rx_ring->rx_buffer_info[i]; - if (rx_ring->xsk_umem) { + if (rx_ring->xsk_pool) { ixgbe_xsk_clean_rx_ring(rx_ring); goto skip_free; } @@ -5965,7 +5965,7 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring) u16 i = tx_ring->next_to_clean; struct ixgbe_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i]; - if (tx_ring->xsk_umem) { + if (tx_ring->xsk_pool) { ixgbe_xsk_clean_tx_ring(tx_ring); goto out; } @@ -10290,7 +10290,7 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog) */ if (need_reset && prog) for (i = 0; i < adapter->num_rx_queues; i++) - if (adapter->xdp_ring[i]->xsk_umem) + if (adapter->xdp_ring[i]->xsk_pool) (void)ixgbe_xsk_wakeup(adapter->netdev, i, XDP_WAKEUP_RX); @@ -10308,8 +10308,8 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp) xdp->prog_id = adapter->xdp_prog ? 
adapter->xdp_prog->aux->id : 0; return 0; - case XDP_SETUP_XSK_UMEM: - return ixgbe_xsk_umem_setup(adapter, xdp->xsk.umem, + case XDP_SETUP_XSK_POOL: + return ixgbe_xsk_pool_setup(adapter, xdp->xsk.pool, xdp->xsk.queue_id); default: diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h index 7887ae4..2aeec78 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h @@ -28,9 +28,10 @@ void ixgbe_irq_rearm_queues(struct ixgbe_adapter *adapter, u64 qmask); void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring); void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring); -struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter, - struct ixgbe_ring *ring); -int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem, +struct xsk_buff_pool *ixgbe_xsk_pool(struct ixgbe_adapter *adapter, + struct ixgbe_ring *ring); +int ixgbe_xsk_pool_setup(struct ixgbe_adapter *adapter, + struct xsk_buff_pool *pool, u16 qid); void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle); diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c index be9d2a8..9f503d6 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c @@ -8,8 +8,8 @@ #include "ixgbe.h" #include "ixgbe_txrx_common.h" -struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter, - struct ixgbe_ring *ring) +struct xsk_buff_pool *ixgbe_xsk_pool(struct ixgbe_adapter *adapter, + struct ixgbe_ring *ring) { bool xdp_on = READ_ONCE(adapter->xdp_prog); int qid = ring->ring_idx; @@ -17,11 +17,11 @@ struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter, if (!xdp_on || !test_bit(qid, adapter->af_xdp_zc_qps)) return NULL; - return xdp_get_umem_from_qid(adapter->netdev, qid); + return xdp_get_xsk_pool_from_qid(adapter->netdev, 
qid); } -static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter, - struct xdp_umem *umem, +static int ixgbe_xsk_pool_enable(struct ixgbe_adapter *adapter, + struct xsk_buff_pool *pool, u16 qid) { struct net_device *netdev = adapter->netdev; @@ -35,7 +35,7 @@ static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter, qid >= netdev->real_num_tx_queues) return -EINVAL; - err = xsk_buff_dma_map(umem, &adapter->pdev->dev, IXGBE_RX_DMA_ATTR); + err = xsk_buff_dma_map(pool->umem, &adapter->pdev->dev, IXGBE_RX_DMA_ATTR); if (err) return err; @@ -59,13 +59,13 @@ static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter, return 0; } -static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid) +static int ixgbe_xsk_pool_disable(struct ixgbe_adapter *adapter, u16 qid) { - struct xdp_umem *umem; + struct xsk_buff_pool *pool; bool if_running; - umem = xdp_get_umem_from_qid(adapter->netdev, qid); - if (!umem) + pool = xdp_get_xsk_pool_from_qid(adapter->netdev, qid); + if (!pool) return -EINVAL; if_running = netif_running(adapter->netdev) && @@ -75,7 +75,7 @@ static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid) ixgbe_txrx_ring_disable(adapter, qid); clear_bit(qid, adapter->af_xdp_zc_qps); - xsk_buff_dma_unmap(umem, IXGBE_RX_DMA_ATTR); + xsk_buff_dma_unmap(pool->umem, IXGBE_RX_DMA_ATTR); if (if_running) ixgbe_txrx_ring_enable(adapter, qid); @@ -83,11 +83,12 @@ static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid) return 0; } -int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem, +int ixgbe_xsk_pool_setup(struct ixgbe_adapter *adapter, + struct xsk_buff_pool *pool, u16 qid) { - return umem ? ixgbe_xsk_umem_enable(adapter, umem, qid) : - ixgbe_xsk_umem_disable(adapter, qid); + return pool ? 
ixgbe_xsk_pool_enable(adapter, pool, qid) : + ixgbe_xsk_pool_disable(adapter, qid); } static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter, @@ -149,7 +150,7 @@ bool ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 count) i -= rx_ring->count; do { - bi->xdp = xsk_buff_alloc(rx_ring->xsk_umem); + bi->xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem); if (!bi->xdp) { ok = false; break; @@ -344,11 +345,11 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector, q_vector->rx.total_packets += total_rx_packets; q_vector->rx.total_bytes += total_rx_bytes; - if (xsk_umem_uses_need_wakeup(rx_ring->xsk_umem)) { + if (xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) { if (failure || rx_ring->next_to_clean == rx_ring->next_to_use) - xsk_set_rx_need_wakeup(rx_ring->xsk_umem); + xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem); else - xsk_clear_rx_need_wakeup(rx_ring->xsk_umem); + xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem); return (int)total_rx_packets; } @@ -373,6 +374,7 @@ void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring) static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget) { + struct xsk_buff_pool *pool = xdp_ring->xsk_pool; union ixgbe_adv_tx_desc *tx_desc = NULL; struct ixgbe_tx_buffer *tx_bi; bool work_done = true; @@ -387,12 +389,11 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget) break; } - if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc)) + if (!xsk_umem_consume_tx(pool->umem, &desc)) break; - dma = xsk_buff_raw_get_dma(xdp_ring->xsk_umem, desc.addr); - xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_umem, dma, - desc.len); + dma = xsk_buff_raw_get_dma(pool->umem, desc.addr); + xsk_buff_raw_dma_sync_for_device(pool->umem, dma, desc.len); tx_bi = &xdp_ring->tx_buffer_info[xdp_ring->next_to_use]; tx_bi->bytecount = desc.len; @@ -418,7 +419,7 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget) if (tx_desc) { ixgbe_xdp_ring_update_tail(xdp_ring); - 
xsk_umem_consume_tx_done(xdp_ring->xsk_umem); + xsk_umem_consume_tx_done(pool->umem); } return !!budget && work_done; @@ -439,7 +440,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector, { u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use; unsigned int total_packets = 0, total_bytes = 0; - struct xdp_umem *umem = tx_ring->xsk_umem; + struct xsk_buff_pool *pool = tx_ring->xsk_pool; union ixgbe_adv_tx_desc *tx_desc; struct ixgbe_tx_buffer *tx_bi; u32 xsk_frames = 0; @@ -484,10 +485,10 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector, q_vector->tx.total_packets += total_packets; if (xsk_frames) - xsk_umem_complete_tx(umem, xsk_frames); + xsk_umem_complete_tx(pool->umem, xsk_frames); - if (xsk_umem_uses_need_wakeup(tx_ring->xsk_umem)) - xsk_set_tx_need_wakeup(tx_ring->xsk_umem); + if (xsk_umem_uses_need_wakeup(pool->umem)) + xsk_set_tx_need_wakeup(pool->umem); return ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit); } @@ -511,7 +512,7 @@ int ixgbe_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags) if (test_bit(__IXGBE_TX_DISABLED, &ring->state)) return -ENETDOWN; - if (!ring->xsk_umem) + if (!ring->xsk_pool) return -ENXIO; if (!napi_if_scheduled_mark_missed(&ring->q_vector->napi)) { @@ -526,7 +527,7 @@ int ixgbe_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags) void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring) { u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use; - struct xdp_umem *umem = tx_ring->xsk_umem; + struct xsk_buff_pool *pool = tx_ring->xsk_pool; struct ixgbe_tx_buffer *tx_bi; u32 xsk_frames = 0; @@ -546,5 +547,5 @@ void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring) } if (xsk_frames) - xsk_umem_complete_tx(umem, xsk_frames); + xsk_umem_complete_tx(pool->umem, xsk_frames); } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 842db20..516dfd3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -448,7 +448,7 @@ struct mlx5e_xdpsq { struct mlx5e_cq cq; /* read only */ - struct xdp_umem *umem; + struct xsk_buff_pool *pool; struct mlx5_wq_cyc wq; struct mlx5e_xdpsq_stats *stats; mlx5e_fp_xmit_xdp_frame_check xmit_xdp_frame_check; @@ -610,7 +610,7 @@ struct mlx5e_rq { struct page_pool *page_pool; /* AF_XDP zero-copy */ - struct xdp_umem *umem; + struct xsk_buff_pool *xsk_pool; struct work_struct recover_work; @@ -731,12 +731,13 @@ struct mlx5e_hv_vhca_stats_agent { #endif struct mlx5e_xsk { - /* UMEMs are stored separately from channels, because we don't want to - * lose them when channels are recreated. The kernel also stores UMEMs, - * but it doesn't distinguish between zero-copy and non-zero-copy UMEMs, - * so rely on our mechanism. + /* XSK buffer pools are stored separately from channels, + * because we don't want to lose them when channels are + * recreated. The kernel also stores buffer pool, but it doesn't + * distinguish between zero-copy and non-zero-copy UMEMs, so + * rely on our mechanism. 
*/ - struct xdp_umem **umems; + struct xsk_buff_pool **pools; u16 refcnt; bool ever_used; }; @@ -948,7 +949,7 @@ struct mlx5e_xsk_param; struct mlx5e_rq_param; int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params, struct mlx5e_rq_param *param, struct mlx5e_xsk_param *xsk, - struct xdp_umem *umem, struct mlx5e_rq *rq); + struct xsk_buff_pool *pool, struct mlx5e_rq *rq); int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq, int wait_time); void mlx5e_deactivate_rq(struct mlx5e_rq *rq); void mlx5e_close_rq(struct mlx5e_rq *rq); @@ -958,7 +959,7 @@ int mlx5e_open_icosq(struct mlx5e_channel *c, struct mlx5e_params *params, struct mlx5e_sq_param *param, struct mlx5e_icosq *sq); void mlx5e_close_icosq(struct mlx5e_icosq *sq); int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params, - struct mlx5e_sq_param *param, struct xdp_umem *umem, + struct mlx5e_sq_param *param, struct xsk_buff_pool *pool, struct mlx5e_xdpsq *sq, bool is_redirect); void mlx5e_close_xdpsq(struct mlx5e_xdpsq *sq); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c index c9d308e..0a5a873 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c @@ -446,7 +446,7 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq) } while ((++i < MLX5E_TX_CQ_POLL_BUDGET) && (cqe = mlx5_cqwq_get_cqe(&cq->wq))); if (xsk_frames) - xsk_umem_complete_tx(sq->umem, xsk_frames); + xsk_umem_complete_tx(sq->pool->umem, xsk_frames); sq->stats->cqes += i; @@ -476,7 +476,7 @@ void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq) } if (xsk_frames) - xsk_umem_complete_tx(sq->umem, xsk_frames); + xsk_umem_complete_tx(sq->pool->umem, xsk_frames); } int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames, @@ -561,4 +561,3 @@ void mlx5e_set_xmit_fp(struct mlx5e_xdpsq *sq, bool is_mpw) sq->xmit_xdp_frame = is_mpw ? 
mlx5e_xmit_xdp_frame_mpwqe : mlx5e_xmit_xdp_frame; } - diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h index d147b2f..3dd056a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h @@ -19,10 +19,10 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi, u32 cqe_bcnt); -static inline int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq, +static inline int mlx5e_xsk_page_alloc_pool(struct mlx5e_rq *rq, struct mlx5e_dma_info *dma_info) { - dma_info->xsk = xsk_buff_alloc(rq->umem); + dma_info->xsk = xsk_buff_alloc(rq->xsk_pool->umem); if (!dma_info->xsk) return -ENOMEM; @@ -38,13 +38,13 @@ static inline int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq, static inline bool mlx5e_xsk_update_rx_wakeup(struct mlx5e_rq *rq, bool alloc_err) { - if (!xsk_umem_uses_need_wakeup(rq->umem)) + if (!xsk_umem_uses_need_wakeup(rq->xsk_pool->umem)) return alloc_err; if (unlikely(alloc_err)) - xsk_set_rx_need_wakeup(rq->umem); + xsk_set_rx_need_wakeup(rq->xsk_pool->umem); else - xsk_clear_rx_need_wakeup(rq->umem); + xsk_clear_rx_need_wakeup(rq->xsk_pool->umem); return false; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c index 2c80205..f32a381 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c @@ -62,7 +62,7 @@ static void mlx5e_build_xsk_cparam(struct mlx5e_priv *priv, } int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params, - struct mlx5e_xsk_param *xsk, struct xdp_umem *umem, + struct mlx5e_xsk_param *xsk, struct xsk_buff_pool *pool, struct mlx5e_channel *c) { struct mlx5e_channel_param *cparam; @@ -82,7 +82,7 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params, if (unlikely(err)) goto err_free_cparam; - err 
= mlx5e_open_rq(c, params, &cparam->rq, xsk, umem, &c->xskrq); + err = mlx5e_open_rq(c, params, &cparam->rq, xsk, pool, &c->xskrq); if (unlikely(err)) goto err_close_rx_cq; @@ -90,13 +90,13 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params, if (unlikely(err)) goto err_close_rq; - /* Create a separate SQ, so that when the UMEM is disabled, we could + /* Create a separate SQ, so that when the buff pool is disabled, we could * close this SQ safely and stop receiving CQEs. In other case, e.g., if - * the XDPSQ was used instead, we might run into trouble when the UMEM + * the XDPSQ was used instead, we might run into trouble when the buff pool * is disabled and then reenabled, but the SQ continues receiving CQEs - * from the old UMEM. + * from the old buff pool. */ - err = mlx5e_open_xdpsq(c, params, &cparam->xdp_sq, umem, &c->xsksq, true); + err = mlx5e_open_xdpsq(c, params, &cparam->xdp_sq, pool, &c->xsksq, true); if (unlikely(err)) goto err_close_tx_cq; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.h index 0dd11b8..ca20f1f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.h @@ -12,7 +12,7 @@ bool mlx5e_validate_xsk_param(struct mlx5e_params *params, struct mlx5e_xsk_param *xsk, struct mlx5_core_dev *mdev); int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params, - struct mlx5e_xsk_param *xsk, struct xdp_umem *umem, + struct mlx5e_xsk_param *xsk, struct xsk_buff_pool *pool, struct mlx5e_channel *c); void mlx5e_close_xsk(struct mlx5e_channel *c); void mlx5e_activate_xsk(struct mlx5e_channel *c); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c index 83dce9c..abe4639 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c @@ -66,7 +66,7 @@ 
static void mlx5e_xsk_tx_post_err(struct mlx5e_xdpsq *sq, bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget) { - struct xdp_umem *umem = sq->umem; + struct xsk_buff_pool *pool = sq->pool; struct mlx5e_xdp_info xdpi; struct mlx5e_xdp_xmit_data xdptxd; bool work_done = true; @@ -83,7 +83,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget) break; } - if (!xsk_umem_consume_tx(umem, &desc)) { + if (!xsk_umem_consume_tx(pool->umem, &desc)) { /* TX will get stuck until something wakes it up by * triggering NAPI. Currently it's expected that the * application calls sendto() if there are consumed, but @@ -92,11 +92,11 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget) break; } - xdptxd.dma_addr = xsk_buff_raw_get_dma(umem, desc.addr); - xdptxd.data = xsk_buff_raw_get_data(umem, desc.addr); + xdptxd.dma_addr = xsk_buff_raw_get_dma(pool->umem, desc.addr); + xdptxd.data = xsk_buff_raw_get_data(pool->umem, desc.addr); xdptxd.len = desc.len; - xsk_buff_raw_dma_sync_for_device(umem, xdptxd.dma_addr, xdptxd.len); + xsk_buff_raw_dma_sync_for_device(pool->umem, xdptxd.dma_addr, xdptxd.len); if (unlikely(!sq->xmit_xdp_frame(sq, &xdptxd, &xdpi, check_result))) { if (sq->mpwqe.wqe) @@ -113,7 +113,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget) mlx5e_xdp_mpwqe_complete(sq); mlx5e_xmit_xdp_doorbell(sq); - xsk_umem_consume_tx_done(umem); + xsk_umem_consume_tx_done(pool->umem); } return !(budget && work_done); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h index 39fa0a7..610a084 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h @@ -15,13 +15,13 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget); static inline void mlx5e_xsk_update_tx_wakeup(struct mlx5e_xdpsq *sq) { - if (!xsk_umem_uses_need_wakeup(sq->umem)) + if (!xsk_umem_uses_need_wakeup(sq->pool->umem)) return; if 
(sq->pc != sq->cc) - xsk_clear_tx_need_wakeup(sq->umem); + xsk_clear_tx_need_wakeup(sq->pool->umem); else - xsk_set_tx_need_wakeup(sq->umem); + xsk_set_tx_need_wakeup(sq->pool->umem); } #endif /* __MLX5_EN_XSK_TX_H__ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c index 7b17fcd..947abf1 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c @@ -6,26 +6,26 @@ #include "setup.h" #include "en/params.h" -static int mlx5e_xsk_map_umem(struct mlx5e_priv *priv, - struct xdp_umem *umem) +static int mlx5e_xsk_map_pool(struct mlx5e_priv *priv, + struct xsk_buff_pool *pool) { struct device *dev = priv->mdev->device; - return xsk_buff_dma_map(umem, dev, 0); + return xsk_buff_dma_map(pool->umem, dev, 0); } -static void mlx5e_xsk_unmap_umem(struct mlx5e_priv *priv, - struct xdp_umem *umem) +static void mlx5e_xsk_unmap_pool(struct mlx5e_priv *priv, + struct xsk_buff_pool *pool) { - return xsk_buff_dma_unmap(umem, 0); + return xsk_buff_dma_unmap(pool->umem, 0); } -static int mlx5e_xsk_get_umems(struct mlx5e_xsk *xsk) +static int mlx5e_xsk_get_pools(struct mlx5e_xsk *xsk) { - if (!xsk->umems) { - xsk->umems = kcalloc(MLX5E_MAX_NUM_CHANNELS, - sizeof(*xsk->umems), GFP_KERNEL); - if (unlikely(!xsk->umems)) + if (!xsk->pools) { + xsk->pools = kcalloc(MLX5E_MAX_NUM_CHANNELS, + sizeof(*xsk->pools), GFP_KERNEL); + if (unlikely(!xsk->pools)) return -ENOMEM; } @@ -35,68 +35,68 @@ static int mlx5e_xsk_get_umems(struct mlx5e_xsk *xsk) return 0; } -static void mlx5e_xsk_put_umems(struct mlx5e_xsk *xsk) +static void mlx5e_xsk_put_pools(struct mlx5e_xsk *xsk) { if (!--xsk->refcnt) { - kfree(xsk->umems); - xsk->umems = NULL; + kfree(xsk->pools); + xsk->pools = NULL; } } -static int mlx5e_xsk_add_umem(struct mlx5e_xsk *xsk, struct xdp_umem *umem, u16 ix) +static int mlx5e_xsk_add_pool(struct mlx5e_xsk *xsk, struct xsk_buff_pool *pool, u16 ix) { int 
err; - err = mlx5e_xsk_get_umems(xsk); + err = mlx5e_xsk_get_pools(xsk); if (unlikely(err)) return err; - xsk->umems[ix] = umem; + xsk->pools[ix] = pool; return 0; } -static void mlx5e_xsk_remove_umem(struct mlx5e_xsk *xsk, u16 ix) +static void mlx5e_xsk_remove_pool(struct mlx5e_xsk *xsk, u16 ix) { - xsk->umems[ix] = NULL; + xsk->pools[ix] = NULL; - mlx5e_xsk_put_umems(xsk); + mlx5e_xsk_put_pools(xsk); } -static bool mlx5e_xsk_is_umem_sane(struct xdp_umem *umem) +static bool mlx5e_xsk_is_pool_sane(struct xsk_buff_pool *pool) { - return xsk_umem_get_headroom(umem) <= 0xffff && - xsk_umem_get_chunk_size(umem) <= 0xffff; + return xsk_umem_get_headroom(pool->umem) <= 0xffff && + xsk_umem_get_chunk_size(pool->umem) <= 0xffff; } -void mlx5e_build_xsk_param(struct xdp_umem *umem, struct mlx5e_xsk_param *xsk) +void mlx5e_build_xsk_param(struct xsk_buff_pool *pool, struct mlx5e_xsk_param *xsk) { - xsk->headroom = xsk_umem_get_headroom(umem); - xsk->chunk_size = xsk_umem_get_chunk_size(umem); + xsk->headroom = xsk_umem_get_headroom(pool->umem); + xsk->chunk_size = xsk_umem_get_chunk_size(pool->umem); } static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv, - struct xdp_umem *umem, u16 ix) + struct xsk_buff_pool *pool, u16 ix) { struct mlx5e_params *params = &priv->channels.params; struct mlx5e_xsk_param xsk; struct mlx5e_channel *c; int err; - if (unlikely(mlx5e_xsk_get_umem(&priv->channels.params, &priv->xsk, ix))) + if (unlikely(mlx5e_xsk_get_pool(&priv->channels.params, &priv->xsk, ix))) return -EBUSY; - if (unlikely(!mlx5e_xsk_is_umem_sane(umem))) + if (unlikely(!mlx5e_xsk_is_pool_sane(pool))) return -EINVAL; - err = mlx5e_xsk_map_umem(priv, umem); + err = mlx5e_xsk_map_pool(priv, pool); if (unlikely(err)) return err; - err = mlx5e_xsk_add_umem(&priv->xsk, umem, ix); + err = mlx5e_xsk_add_pool(&priv->xsk, pool, ix); if (unlikely(err)) - goto err_unmap_umem; + goto err_unmap_pool; - mlx5e_build_xsk_param(umem, &xsk); + mlx5e_build_xsk_param(pool, &xsk); if 
(!test_bit(MLX5E_STATE_OPENED, &priv->state)) { /* XSK objects will be created on open. */ @@ -112,9 +112,9 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv, c = priv->channels.c[ix]; - err = mlx5e_open_xsk(priv, params, &xsk, umem, c); + err = mlx5e_open_xsk(priv, params, &xsk, pool, c); if (unlikely(err)) - goto err_remove_umem; + goto err_remove_pool; mlx5e_activate_xsk(c); @@ -132,11 +132,11 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv, mlx5e_deactivate_xsk(c); mlx5e_close_xsk(c); -err_remove_umem: - mlx5e_xsk_remove_umem(&priv->xsk, ix); +err_remove_pool: + mlx5e_xsk_remove_pool(&priv->xsk, ix); -err_unmap_umem: - mlx5e_xsk_unmap_umem(priv, umem); +err_unmap_pool: + mlx5e_xsk_unmap_pool(priv, pool); return err; @@ -146,7 +146,7 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv, */ if (!mlx5e_validate_xsk_param(params, &xsk, priv->mdev)) { err = -EINVAL; - goto err_remove_umem; + goto err_remove_pool; } return 0; @@ -154,45 +154,45 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv, static int mlx5e_xsk_disable_locked(struct mlx5e_priv *priv, u16 ix) { - struct xdp_umem *umem = mlx5e_xsk_get_umem(&priv->channels.params, + struct xsk_buff_pool *pool = mlx5e_xsk_get_pool(&priv->channels.params, &priv->xsk, ix); struct mlx5e_channel *c; - if (unlikely(!umem)) + if (unlikely(!pool)) return -EINVAL; if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) - goto remove_umem; + goto remove_pool; /* XSK RQ and SQ are only created if XDP program is set. 
*/ if (!priv->channels.params.xdp_prog) - goto remove_umem; + goto remove_pool; c = priv->channels.c[ix]; mlx5e_xsk_redirect_rqt_to_drop(priv, ix); mlx5e_deactivate_xsk(c); mlx5e_close_xsk(c); -remove_umem: - mlx5e_xsk_remove_umem(&priv->xsk, ix); - mlx5e_xsk_unmap_umem(priv, umem); +remove_pool: + mlx5e_xsk_remove_pool(&priv->xsk, ix); + mlx5e_xsk_unmap_pool(priv, pool); return 0; } -static int mlx5e_xsk_enable_umem(struct mlx5e_priv *priv, struct xdp_umem *umem, +static int mlx5e_xsk_enable_pool(struct mlx5e_priv *priv, struct xsk_buff_pool *pool, u16 ix) { int err; mutex_lock(&priv->state_lock); - err = mlx5e_xsk_enable_locked(priv, umem, ix); + err = mlx5e_xsk_enable_locked(priv, pool, ix); mutex_unlock(&priv->state_lock); return err; } -static int mlx5e_xsk_disable_umem(struct mlx5e_priv *priv, u16 ix) +static int mlx5e_xsk_disable_pool(struct mlx5e_priv *priv, u16 ix) { int err; @@ -203,7 +203,7 @@ static int mlx5e_xsk_disable_umem(struct mlx5e_priv *priv, u16 ix) return err; } -int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid) +int mlx5e_xsk_setup_pool(struct net_device *dev, struct xsk_buff_pool *pool, u16 qid) { struct mlx5e_priv *priv = netdev_priv(dev); struct mlx5e_params *params = &priv->channels.params; @@ -212,8 +212,8 @@ int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid) if (unlikely(!mlx5e_qid_get_ch_if_in_group(params, qid, MLX5E_RQ_GROUP_XSK, &ix))) return -EINVAL; - return umem ? mlx5e_xsk_enable_umem(priv, umem, ix) : - mlx5e_xsk_disable_umem(priv, ix); + return pool ? mlx5e_xsk_enable_pool(priv, pool, ix) : + mlx5e_xsk_disable_pool(priv, ix); } u16 mlx5e_xsk_first_unused_channel(struct mlx5e_params *params, struct mlx5e_xsk *xsk) @@ -221,7 +221,7 @@ u16 mlx5e_xsk_first_unused_channel(struct mlx5e_params *params, struct mlx5e_xsk u16 res = xsk->refcnt ? 
params->num_channels : 0; while (res) { - if (mlx5e_xsk_get_umem(params, xsk, res - 1)) + if (mlx5e_xsk_get_pool(params, xsk, res - 1)) break; --res; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h index 25b4cbe..629db33 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h @@ -6,25 +6,25 @@ #include "en.h" -static inline struct xdp_umem *mlx5e_xsk_get_umem(struct mlx5e_params *params, - struct mlx5e_xsk *xsk, u16 ix) +static inline struct xsk_buff_pool *mlx5e_xsk_get_pool(struct mlx5e_params *params, + struct mlx5e_xsk *xsk, u16 ix) { - if (!xsk || !xsk->umems) + if (!xsk || !xsk->pools) return NULL; if (unlikely(ix >= params->num_channels)) return NULL; - return xsk->umems[ix]; + return xsk->pools[ix]; } struct mlx5e_xsk_param; -void mlx5e_build_xsk_param(struct xdp_umem *umem, struct mlx5e_xsk_param *xsk); +void mlx5e_build_xsk_param(struct xsk_buff_pool *pool, struct mlx5e_xsk_param *xsk); /* .ndo_bpf callback. 
*/ -int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid); +int mlx5e_xsk_setup_pool(struct net_device *dev, struct xsk_buff_pool *pool, u16 qid); -int mlx5e_xsk_resize_reuseq(struct xdp_umem *umem, u32 nentries); +int mlx5e_xsk_resize_reuseq(struct xsk_buff_pool *pool, u32 nentries); u16 mlx5e_xsk_first_unused_channel(struct mlx5e_params *params, struct mlx5e_xsk *xsk); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index a836a02..2b4a3e3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -365,7 +365,7 @@ static void mlx5e_rq_err_cqe_work(struct work_struct *recover_work) static int mlx5e_alloc_rq(struct mlx5e_channel *c, struct mlx5e_params *params, struct mlx5e_xsk_param *xsk, - struct xdp_umem *umem, + struct xsk_buff_pool *pool, struct mlx5e_rq_param *rqp, struct mlx5e_rq *rq) { @@ -391,9 +391,9 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c, rq->mdev = mdev; rq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu); rq->xdpsq = &c->rq_xdpsq; - rq->umem = umem; + rq->xsk_pool = pool; - if (rq->umem) + if (rq->xsk_pool) rq->stats = &c->priv->channel_stats[c->ix].xskrq; else rq->stats = &c->priv->channel_stats[c->ix].rq; @@ -518,7 +518,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c, if (xsk) { err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, MEM_TYPE_XSK_BUFF_POOL, NULL); - xsk_buff_set_rxq_info(rq->umem, &rq->xdp_rxq); + xsk_buff_set_rxq_info(rq->xsk_pool->umem, &rq->xdp_rxq); } else { /* Create a page_pool and register it with rxq */ pp_params.order = 0; @@ -857,11 +857,11 @@ void mlx5e_free_rx_descs(struct mlx5e_rq *rq) int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params, struct mlx5e_rq_param *param, struct mlx5e_xsk_param *xsk, - struct xdp_umem *umem, struct mlx5e_rq *rq) + struct xsk_buff_pool *pool, struct mlx5e_rq *rq) { int err; - err = mlx5e_alloc_rq(c, params, 
xsk, umem, param, rq); + err = mlx5e_alloc_rq(c, params, xsk, pool, param, rq); if (err) return err; @@ -963,7 +963,7 @@ static int mlx5e_alloc_xdpsq_db(struct mlx5e_xdpsq *sq, int numa) static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params, - struct xdp_umem *umem, + struct xsk_buff_pool *pool, struct mlx5e_sq_param *param, struct mlx5e_xdpsq *sq, bool is_redirect) @@ -979,9 +979,9 @@ static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c, sq->uar_map = mdev->mlx5e_res.bfreg.map; sq->min_inline_mode = params->tx_min_inline_mode; sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu); - sq->umem = umem; + sq->pool = pool; - sq->stats = sq->umem ? + sq->stats = sq->pool ? &c->priv->channel_stats[c->ix].xsksq : is_redirect ? &c->priv->channel_stats[c->ix].xdpsq : @@ -1445,13 +1445,13 @@ void mlx5e_close_icosq(struct mlx5e_icosq *sq) } int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params, - struct mlx5e_sq_param *param, struct xdp_umem *umem, + struct mlx5e_sq_param *param, struct xsk_buff_pool *pool, struct mlx5e_xdpsq *sq, bool is_redirect) { struct mlx5e_create_sq_param csp = {}; int err; - err = mlx5e_alloc_xdpsq(c, params, umem, param, sq, is_redirect); + err = mlx5e_alloc_xdpsq(c, params, pool, param, sq, is_redirect); if (err) return err; @@ -1927,7 +1927,7 @@ static u8 mlx5e_enumerate_lag_port(struct mlx5_core_dev *mdev, int ix) static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, struct mlx5e_params *params, struct mlx5e_channel_param *cparam, - struct xdp_umem *umem, + struct xsk_buff_pool *pool, struct mlx5e_channel **cp) { int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix)); @@ -1966,9 +1966,9 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, if (unlikely(err)) goto err_napi_del; - if (umem) { - mlx5e_build_xsk_param(umem, &xsk); - err = mlx5e_open_xsk(priv, params, &xsk, umem, c); + if (pool) { + mlx5e_build_xsk_param(pool, &xsk); + err = mlx5e_open_xsk(priv, 
params, &xsk, pool, c); if (unlikely(err)) goto err_close_queues; } @@ -2316,12 +2316,12 @@ int mlx5e_open_channels(struct mlx5e_priv *priv, mlx5e_build_channel_param(priv, &chs->params, cparam); for (i = 0; i < chs->num; i++) { - struct xdp_umem *umem = NULL; + struct xsk_buff_pool *pool = NULL; if (chs->params.xdp_prog) - umem = mlx5e_xsk_get_umem(&chs->params, chs->params.xsk, i); + pool = mlx5e_xsk_get_pool(&chs->params, chs->params.xsk, i); - err = mlx5e_open_channel(priv, i, &chs->params, cparam, umem, &chs->c[i]); + err = mlx5e_open_channel(priv, i, &chs->params, cparam, pool, &chs->c[i]); if (err) goto err_close_channels; } @@ -3882,13 +3882,13 @@ static bool mlx5e_xsk_validate_mtu(struct net_device *netdev, u16 ix; for (ix = 0; ix < chs->params.num_channels; ix++) { - struct xdp_umem *umem = mlx5e_xsk_get_umem(&chs->params, chs->params.xsk, ix); + struct xsk_buff_pool *pool = mlx5e_xsk_get_pool(&chs->params, chs->params.xsk, ix); struct mlx5e_xsk_param xsk; - if (!umem) + if (!pool) continue; - mlx5e_build_xsk_param(umem, &xsk); + mlx5e_build_xsk_param(pool, &xsk); if (!mlx5e_validate_xsk_param(new_params, &xsk, mdev)) { u32 hr = mlx5e_get_linear_rq_headroom(new_params, &xsk); @@ -4518,8 +4518,8 @@ static int mlx5e_xdp(struct net_device *dev, struct netdev_bpf *xdp) case XDP_QUERY_PROG: xdp->prog_id = mlx5e_xdp_query(dev); return 0; - case XDP_SETUP_XSK_UMEM: - return mlx5e_xsk_setup_umem(dev, xdp->xsk.umem, + case XDP_SETUP_XSK_POOL: + return mlx5e_xsk_setup_pool(dev, xdp->xsk.pool, xdp->xsk.queue_id); default: return -EINVAL; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index dbb1c63..1dcf77d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -264,8 +264,8 @@ static inline int mlx5e_page_alloc_pool(struct mlx5e_rq *rq, static inline int mlx5e_page_alloc(struct mlx5e_rq *rq, struct mlx5e_dma_info *dma_info) { - if (rq->umem) 
- return mlx5e_xsk_page_alloc_umem(rq, dma_info); + if (rq->xsk_pool) + return mlx5e_xsk_page_alloc_pool(rq, dma_info); else return mlx5e_page_alloc_pool(rq, dma_info); } @@ -296,7 +296,7 @@ static inline void mlx5e_page_release(struct mlx5e_rq *rq, struct mlx5e_dma_info *dma_info, bool recycle) { - if (rq->umem) + if (rq->xsk_pool) /* The `recycle` parameter is ignored, and the page is always * put into the Reuse Ring, because there is no way to return * the page to the userspace when the interface goes down. @@ -383,14 +383,14 @@ static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, u8 wqe_bulk) int err; int i; - if (rq->umem) { + if (rq->xsk_pool) { int pages_desired = wqe_bulk << rq->wqe.info.log_num_frags; /* Check in advance that we have enough frames, instead of * allocating one-by-one, failing and moving frames to the * Reuse Ring. */ - if (unlikely(!xsk_buff_can_alloc(rq->umem, pages_desired))) + if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool->umem, pages_desired))) return -ENOMEM; } @@ -488,8 +488,8 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix) /* Check in advance that we have enough frames, instead of allocating * one-by-one, failing and moving frames to the Reuse Ring. */ - if (rq->umem && - unlikely(!xsk_buff_can_alloc(rq->umem, MLX5_MPWRQ_PAGES_PER_WQE))) { + if (rq->xsk_pool && + unlikely(!xsk_buff_can_alloc(rq->xsk_pool->umem, MLX5_MPWRQ_PAGES_PER_WQE))) { err = -ENOMEM; goto err; } @@ -700,7 +700,7 @@ bool mlx5e_post_rx_mpwqes(struct mlx5e_rq *rq) * the driver when it refills the Fill Ring. * 2. Otherwise, busy poll by rescheduling the NAPI poll. 
*/ - if (unlikely(alloc_err == -ENOMEM && rq->umem)) + if (unlikely(alloc_err == -ENOMEM && rq->xsk_pool)) return true; return false; diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 6fc613e..e5acc3b 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -616,7 +616,7 @@ struct netdev_queue { /* Subordinate device that the queue has been assigned to */ struct net_device *sb_dev; #ifdef CONFIG_XDP_SOCKETS - struct xdp_umem *umem; + struct xsk_buff_pool *pool; #endif /* * write-mostly part @@ -749,7 +749,7 @@ struct netdev_rx_queue { struct net_device *dev; struct xdp_rxq_info xdp_rxq; #ifdef CONFIG_XDP_SOCKETS - struct xdp_umem *umem; + struct xsk_buff_pool *pool; #endif } ____cacheline_aligned_in_smp; @@ -879,7 +879,7 @@ enum bpf_netdev_command { /* BPF program for offload callbacks, invoked at program load time. */ BPF_OFFLOAD_MAP_ALLOC, BPF_OFFLOAD_MAP_FREE, - XDP_SETUP_XSK_UMEM, + XDP_SETUP_XSK_POOL, }; struct bpf_prog_offload_ops; @@ -906,9 +906,9 @@ struct netdev_bpf { struct { struct bpf_offloaded_map *offmap; }; - /* XDP_SETUP_XSK_UMEM */ + /* XDP_SETUP_XSK_POOL */ struct { - struct xdp_umem *umem; + struct xsk_buff_pool *pool; u16 queue_id; } xsk; }; diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index ccf848f..5dc8d3c 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -14,7 +14,8 @@ void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries); bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc); void xsk_umem_consume_tx_done(struct xdp_umem *umem); -struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, u16 queue_id); +struct xsk_buff_pool *xdp_get_xsk_pool_from_qid(struct net_device *dev, + u16 queue_id); void xsk_set_rx_need_wakeup(struct xdp_umem *umem); void xsk_set_tx_need_wakeup(struct xdp_umem *umem); void xsk_clear_rx_need_wakeup(struct xdp_umem *umem); @@ -125,8 +126,8 @@ static inline void xsk_umem_consume_tx_done(struct 
xdp_umem *umem) { } -static inline struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, - u16 queue_id) +static inline struct xsk_buff_pool * +xdp_get_xsk_pool_from_qid(struct net_device *dev, u16 queue_id) { return NULL; } diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index a4ff226..a6dec9c 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -13,6 +13,7 @@ struct xsk_buff_pool; struct xdp_rxq_info; struct xsk_queue; struct xdp_desc; +struct xdp_umem; struct device; struct page; @@ -42,13 +43,14 @@ struct xsk_buff_pool { u32 frame_len; bool cheap_dma; bool unaligned; + struct xdp_umem *umem; void *addrs; struct device *dev; struct xdp_buff_xsk *free_heads[]; }; /* AF_XDP core. */ -struct xsk_buff_pool *xp_create(struct page **pages, u32 nr_pages, u32 chunks, +struct xsk_buff_pool *xp_create(struct xdp_umem *umem, u32 chunks, u32 chunk_size, u32 headroom, u64 size, bool unaligned); void xp_set_fq(struct xsk_buff_pool *pool, struct xsk_queue *fq); diff --git a/net/ethtool/channels.c b/net/ethtool/channels.c index 9ef54cd..78d990b 100644 --- a/net/ethtool/channels.c +++ b/net/ethtool/channels.c @@ -223,7 +223,7 @@ int ethnl_set_channels(struct sk_buff *skb, struct genl_info *info) from_channel = channels.combined_count + min(channels.rx_count, channels.tx_count); for (i = from_channel; i < old_total; i++) - if (xdp_get_umem_from_qid(dev, i)) { + if (xdp_get_xsk_pool_from_qid(dev, i)) { GENL_SET_ERR_MSG(info, "requested channel counts are too low for existing zerocopy AF_XDP sockets"); return -EINVAL; } diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c index b5df90c..91de16d 100644 --- a/net/ethtool/ioctl.c +++ b/net/ethtool/ioctl.c @@ -1702,7 +1702,7 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev, min(channels.rx_count, channels.tx_count); to_channel = curr.combined_count + max(curr.rx_count, curr.tx_count); for (i = from_channel; i < to_channel; i++) - if 
(xdp_get_umem_from_qid(dev, i)) + if (xdp_get_xsk_pool_from_qid(dev, i)) return -EINVAL; ret = dev->ethtool_ops->set_channels(dev, &channels); diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index e97db37..0b5f3b0 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -51,8 +51,9 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs) * not know if the device has more tx queues than rx, or the opposite. * This might also change during run time. */ -static int xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem, - u16 queue_id) +static int xdp_reg_xsk_pool_at_qid(struct net_device *dev, + struct xsk_buff_pool *pool, + u16 queue_id) { if (queue_id >= max_t(unsigned int, dev->real_num_rx_queues, @@ -60,31 +61,31 @@ static int xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem, return -EINVAL; if (queue_id < dev->real_num_rx_queues) - dev->_rx[queue_id].umem = umem; + dev->_rx[queue_id].pool = pool; if (queue_id < dev->real_num_tx_queues) - dev->_tx[queue_id].umem = umem; + dev->_tx[queue_id].pool = pool; return 0; } -struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, - u16 queue_id) +struct xsk_buff_pool *xdp_get_xsk_pool_from_qid(struct net_device *dev, + u16 queue_id) { if (queue_id < dev->real_num_rx_queues) - return dev->_rx[queue_id].umem; + return dev->_rx[queue_id].pool; if (queue_id < dev->real_num_tx_queues) - return dev->_tx[queue_id].umem; + return dev->_tx[queue_id].pool; return NULL; } -EXPORT_SYMBOL(xdp_get_umem_from_qid); +EXPORT_SYMBOL(xdp_get_xsk_pool_from_qid); -static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id) +static void xdp_clear_xsk_pool_at_qid(struct net_device *dev, u16 queue_id) { if (queue_id < dev->real_num_rx_queues) - dev->_rx[queue_id].umem = NULL; + dev->_rx[queue_id].pool = NULL; if (queue_id < dev->real_num_tx_queues) - dev->_tx[queue_id].umem = NULL; + dev->_tx[queue_id].pool = NULL; } int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device 
*dev, @@ -102,10 +103,10 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, if (force_zc && force_copy) return -EINVAL; - if (xdp_get_umem_from_qid(dev, queue_id)) + if (xdp_get_xsk_pool_from_qid(dev, queue_id)) return -EBUSY; - err = xdp_reg_umem_at_qid(dev, umem, queue_id); + err = xdp_reg_xsk_pool_at_qid(dev, umem->pool, queue_id); if (err) return err; @@ -132,8 +133,8 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, goto err_unreg_umem; } - bpf.command = XDP_SETUP_XSK_UMEM; - bpf.xsk.umem = umem; + bpf.command = XDP_SETUP_XSK_POOL; + bpf.xsk.pool = umem->pool; bpf.xsk.queue_id = queue_id; err = dev->netdev_ops->ndo_bpf(dev, &bpf); @@ -147,7 +148,7 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, if (!force_zc) err = 0; /* fallback to copy mode */ if (err) - xdp_clear_umem_at_qid(dev, queue_id); + xdp_clear_xsk_pool_at_qid(dev, queue_id); return err; } @@ -162,8 +163,8 @@ void xdp_umem_clear_dev(struct xdp_umem *umem) return; if (umem->zc) { - bpf.command = XDP_SETUP_XSK_UMEM; - bpf.xsk.umem = NULL; + bpf.command = XDP_SETUP_XSK_POOL; + bpf.xsk.pool = NULL; bpf.xsk.queue_id = umem->queue_id; err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf); @@ -172,7 +173,7 @@ void xdp_umem_clear_dev(struct xdp_umem *umem) WARN(1, "failed to disable umem!\n"); } - xdp_clear_umem_at_qid(umem->dev, umem->queue_id); + xdp_clear_xsk_pool_at_qid(umem->dev, umem->queue_id); dev_put(umem->dev); umem->dev = NULL; @@ -373,8 +374,8 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) if (err) goto out_account; - umem->pool = xp_create(umem->pgs, umem->npgs, chunks, chunk_size, - headroom, size, unaligned_chunks); + umem->pool = xp_create(umem, chunks, chunk_size, headroom, size, + unaligned_chunks); if (!umem->pool) { err = -ENOMEM; goto out_pin; diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index 540ed75..c57f0bb 100644 --- a/net/xdp/xsk_buff_pool.c +++ 
b/net/xdp/xsk_buff_pool.c @@ -32,7 +32,7 @@ void xp_destroy(struct xsk_buff_pool *pool) kvfree(pool); } -struct xsk_buff_pool *xp_create(struct page **pages, u32 nr_pages, u32 chunks, +struct xsk_buff_pool *xp_create(struct xdp_umem *umem, u32 chunks, u32 chunk_size, u32 headroom, u64 size, bool unaligned) { @@ -58,6 +58,7 @@ struct xsk_buff_pool *xp_create(struct page **pages, u32 nr_pages, u32 chunks, pool->cheap_dma = true; pool->unaligned = unaligned; pool->frame_len = chunk_size - headroom - XDP_PACKET_HEADROOM; + pool->umem = umem; INIT_LIST_HEAD(&pool->free_list); for (i = 0; i < pool->free_heads_cnt; i++) { @@ -67,7 +68,7 @@ struct xsk_buff_pool *xp_create(struct page **pages, u32 nr_pages, u32 chunks, pool->free_heads[i] = xskb; } - err = xp_addr_map(pool, pages, nr_pages); + err = xp_addr_map(pool, umem->pgs, umem->npgs); if (!err) return pool; From patchwork Thu Jul 2 12:19:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Magnus Karlsson X-Patchwork-Id: 1321358 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49yHGG2BBKz9sR4 for ; Thu, 2 Jul 2020 22:19:34 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729004AbgGBMTd (ORCPT ); Thu, 2 Jul 2020 08:19:33 -0400 Received: from mga12.intel.com ([192.55.52.136]:6897 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729001AbgGBMTd (ORCPT ); Thu, 2 Jul 2020 08:19:33 -0400 
IronPort-SDR: xc0jTu/HYBKdWgp6S3J3cN5T+7Zwr7wK7rAPYTpwUN8YI164Q7MSFmRUF/Chg9ECUyJRjll7pG b+mYH8t+bycw== X-IronPort-AV: E=McAfee;i="6000,8403,9669"; a="126486070" X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="126486070" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jul 2020 05:19:32 -0700 IronPort-SDR: mvpHQD814TPHli5Z6wQ7JZKqaKPmcaGDPxfaGJdTDtzua1W7h2MTWyzMQtRfYgmBpb5FHt/XzC HpQ6sfoOYkTA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="425933279" Received: from mkarlsso-mobl.ger.corp.intel.com (HELO localhost.localdomain) ([10.252.39.242]) by orsmga004.jf.intel.com with ESMTP; 02 Jul 2020 05:19:27 -0700 From: Magnus Karlsson To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com Subject: [PATCH bpf-next 02/14] xsk: i40e: ice: ixgbe: mlx5: rename xsk zero-copy driver interfaces Date: Thu, 2 Jul 2020 14:19:01 +0200 Message-Id: <1593692353-15102-3-git-send-email-magnus.karlsson@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> References: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org

Rename the AF_XDP zero-copy driver interface functions to better reflect what they do after the replacement of umems with buffer pools in the previous commit. This mostly means replacing umem in the function names with xsk_buff and having them take a buffer pool pointer instead of a umem.
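As a rough illustration of the rename described above, the following userspace sketch mimics the pattern: patch 01 gives the pool a back-pointer to its umem, so drivers call umem-based helpers through pool->umem, and this patch switches the helpers to take the pool directly. Everything prefixed demo_ or DEMO_ is a hypothetical stand-in invented for this example, not a kernel definition, and the headroom constant is an assumed placeholder for XDP_PACKET_HEADROOM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for XDP_PACKET_HEADROOM; illustration only. */
#define DEMO_PACKET_HEADROOM 256u

/* Hypothetical, heavily reduced stand-in for struct xdp_umem. */
struct demo_umem {
	uint32_t chunk_size;
	uint32_t headroom;
};

/* Hypothetical stand-in for struct xsk_buff_pool with the umem back-pointer. */
struct demo_pool {
	struct demo_umem *umem;
	uint32_t frame_len;
};

/* Mirrors the idea of xp_create() storing the umem in the pool. */
static void demo_pool_init(struct demo_pool *pool, struct demo_umem *umem)
{
	pool->umem = umem;
	pool->frame_len = umem->chunk_size - umem->headroom -
			  DEMO_PACKET_HEADROOM;
}

/* Old-style helper: takes a umem, so a driver holding a pool has to
 * pass pool->umem at every call site. */
static uint32_t demo_umem_get_rx_frame_size(const struct demo_umem *umem)
{
	return umem->chunk_size - umem->headroom - DEMO_PACKET_HEADROOM;
}

/* New-style helper: takes the pool, hiding the umem from the driver. */
static uint32_t demo_pool_get_rx_frame_size(const struct demo_pool *pool)
{
	return demo_umem_get_rx_frame_size(pool->umem);
}
```

With this shape, a call site changes from demo_umem_get_rx_frame_size(pool->umem) to demo_pool_get_rx_frame_size(pool), which is the same kind of mechanical substitution the driver hunks in this patch perform.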
The various ring functions have also been renamed in the process so that they follow the same naming convention as the internal functions in xsk_queue.h. This makes it clearer what they do and keeps the naming consistent.

Signed-off-by: Magnus Karlsson
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 6 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 34 +++---
 drivers/net/ethernet/intel/ice/ice_base.c | 6 +-
 drivers/net/ethernet/intel/ice/ice_xsk.c | 34 +++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 32 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 4 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.h | 8 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.c | 10 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.h | 6 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.c | 12 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 4 +-
 include/net/xdp_sock.h | 1 +
 include/net/xdp_sock_drv.h | 114 +++++++++++----------
 net/ethtool/channels.c | 2 +-
 net/ethtool/ioctl.c | 2 +-
 net/xdp/xdp_umem.c | 24 ++---
 net/xdp/xsk.c | 45 ++++----
 19 files changed, 182 insertions(+), 170 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 3df725e..73dded7 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -3119,7 +3119,7 @@ static struct xsk_buff_pool *i40e_xsk_pool(struct i40e_ring *ring) if (!xdp_on || !test_bit(qid, ring->vsi->af_xdp_zc_qps)) return NULL; - return xdp_get_xsk_pool_from_qid(ring->vsi->netdev, qid); + return xsk_get_pool_from_qid(ring->vsi->netdev, qid); } /** @@ -3267,7 +3267,7 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring) if (ret) return ret; ring->rx_buf_len = - xsk_umem_get_rx_frame_size(ring->xsk_pool->umem); + xsk_pool_get_rx_frame_size(ring->xsk_pool); /* For AF_XDP ZC, we disallow packets to span
on * multiple buffers, thus letting us skip that * handling in the fast-path. @@ -3351,7 +3351,7 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring) writel(0, ring->tail); if (ring->xsk_pool) { - xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq); + xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq); ok = i40e_alloc_rx_buffers_zc(ring, I40E_DESC_UNUSED(ring)); } else { ok = !i40e_alloc_rx_buffers(ring, I40E_DESC_UNUSED(ring)); diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index d7ebdf6..ebaf0bd 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -55,8 +55,7 @@ static int i40e_xsk_pool_enable(struct i40e_vsi *vsi, qid >= netdev->real_num_tx_queues) return -EINVAL; - err = xsk_buff_dma_map(pool->umem, &vsi->back->pdev->dev, - I40E_RX_DMA_ATTR); + err = xsk_pool_dma_map(pool, &vsi->back->pdev->dev, I40E_RX_DMA_ATTR); if (err) return err; @@ -97,7 +96,7 @@ static int i40e_xsk_pool_disable(struct i40e_vsi *vsi, u16 qid) bool if_running; int err; - pool = xdp_get_xsk_pool_from_qid(netdev, qid); + pool = xsk_get_pool_from_qid(netdev, qid); if (!pool) return -EINVAL; @@ -110,7 +109,7 @@ static int i40e_xsk_pool_disable(struct i40e_vsi *vsi, u16 qid) } clear_bit(qid, vsi->af_xdp_zc_qps); - xsk_buff_dma_unmap(pool->umem, I40E_RX_DMA_ATTR); + xsk_pool_dma_unmap(pool, I40E_RX_DMA_ATTR); if (if_running) { err = i40e_queue_pair_enable(vsi, qid); @@ -196,7 +195,7 @@ bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count) rx_desc = I40E_RX_DESC(rx_ring, ntu); bi = i40e_rx_bi(rx_ring, ntu); do { - xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem); + xdp = xsk_buff_alloc(rx_ring->xsk_pool); if (!xdp) { ok = false; goto no_buffers; @@ -363,11 +362,11 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget) i40e_finalize_xdp_rx(rx_ring, xdp_xmit); i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets); - if 
(xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) { + if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) { if (failure || rx_ring->next_to_clean == rx_ring->next_to_use) - xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem); + xsk_set_rx_need_wakeup(rx_ring->xsk_pool); else - xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem); + xsk_clear_rx_need_wakeup(rx_ring->xsk_pool); return (int)total_rx_packets; } @@ -396,12 +395,11 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget) break; } - if (!xsk_umem_consume_tx(xdp_ring->xsk_pool->umem, &desc)) + if (!xsk_tx_peek_desc(xdp_ring->xsk_pool, &desc)) break; - dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool->umem, - desc.addr); - xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool->umem, dma, + dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc.addr); + xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, desc.len); tx_bi = &xdp_ring->tx_bi[xdp_ring->next_to_use]; @@ -425,7 +423,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget) I40E_TXD_QW1_CMD_SHIFT); i40e_xdp_ring_update_tail(xdp_ring); - xsk_umem_consume_tx_done(xdp_ring->xsk_pool->umem); + xsk_tx_release(xdp_ring->xsk_pool); } return !!budget && work_done; @@ -498,14 +496,14 @@ bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, tx_ring->next_to_clean -= tx_ring->count; if (xsk_frames) - xsk_umem_complete_tx(bp->umem, xsk_frames); + xsk_tx_completed(bp, xsk_frames); i40e_arm_wb(tx_ring, vsi, budget); i40e_update_tx_stats(tx_ring, completed_frames, total_bytes); out_xmit: - if (xsk_umem_uses_need_wakeup(tx_ring->xsk_pool->umem)) - xsk_set_tx_need_wakeup(tx_ring->xsk_pool->umem); + if (xsk_uses_need_wakeup(tx_ring->xsk_pool)) + xsk_set_tx_need_wakeup(tx_ring->xsk_pool); xmit_done = i40e_xmit_zc(tx_ring, budget); @@ -598,7 +596,7 @@ void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring) } if (xsk_frames) - xsk_umem_complete_tx(bp->umem, xsk_frames); + xsk_tx_completed(bp, xsk_frames); } /** @@ -614,7 +612,7 @@ bool 
i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi) int i; for (i = 0; i < vsi->num_queue_pairs; i++) { - if (xdp_get_xsk_pool_from_qid(netdev, i)) + if (xsk_get_pool_from_qid(netdev, i)) return true; } diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c index 94dbf89..16fbc79 100644 --- a/drivers/net/ethernet/intel/ice/ice_base.c +++ b/drivers/net/ethernet/intel/ice/ice_base.c @@ -313,7 +313,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring) xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq); ring->rx_buf_len = - xsk_umem_get_rx_frame_size(ring->xsk_pool->umem); + xsk_pool_get_rx_frame_size(ring->xsk_pool); /* For AF_XDP ZC, we disallow packets to span on * multiple buffers, thus letting us skip that * handling in the fast-path. @@ -324,7 +324,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring) NULL); if (err) return err; - xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq); + xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq); dev_info(dev, "Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring %d\n", ring->q_index); @@ -418,7 +418,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring) writel(0, ring->tail); if (ring->xsk_pool) { - if (!xsk_buff_can_alloc(ring->xsk_pool->umem, num_bufs)) { + if (!xsk_buff_can_alloc(ring->xsk_pool, num_bufs)) { dev_warn(dev, "XSK buffer pool does not provide enough addresses to fill %d buffers on Rx ring %d\n", num_bufs, ring->q_index); dev_warn(dev, "Change Rx ring/fill queue size to avoid performance issues\n"); diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index f0ce669..6430df2 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -310,7 +310,7 @@ static int ice_xsk_pool_disable(struct ice_vsi *vsi, u16 qid) !vsi->xsk_pools[qid]) return -EINVAL; - xsk_buff_dma_unmap(vsi->xsk_pools[qid]->umem, ICE_RX_DMA_ATTR); + xsk_pool_dma_unmap(vsi->xsk_pools[qid], ICE_RX_DMA_ATTR); 
ice_xsk_remove_pool(vsi, qid); return 0; @@ -347,7 +347,7 @@ ice_xsk_pool_enable(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid) vsi->xsk_pools[qid] = pool; vsi->num_xsk_pools_used++; - err = xsk_buff_dma_map(vsi->xsk_pools[qid]->umem, ice_pf_to_dev(vsi->back), + err = xsk_pool_dma_map(vsi->xsk_pools[qid], ice_pf_to_dev(vsi->back), ICE_RX_DMA_ATTR); if (err) return err; @@ -424,7 +424,7 @@ bool ice_alloc_rx_bufs_zc(struct ice_ring *rx_ring, u16 count) rx_buf = &rx_ring->rx_buf[ntu]; do { - rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem); + rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_pool); if (!rx_buf->xdp) { ret = true; break; @@ -645,11 +645,11 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget) ice_finalize_xdp_rx(rx_ring, xdp_xmit); ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes); - if (xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) { + if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) { if (failure || rx_ring->next_to_clean == rx_ring->next_to_use) - xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem); + xsk_set_rx_need_wakeup(rx_ring->xsk_pool); else - xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem); + xsk_clear_rx_need_wakeup(rx_ring->xsk_pool); return (int)total_rx_packets; } @@ -682,11 +682,11 @@ static bool ice_xmit_zc(struct ice_ring *xdp_ring, int budget) tx_buf = &xdp_ring->tx_buf[xdp_ring->next_to_use]; - if (!xsk_umem_consume_tx(xdp_ring->xsk_pool->umem, &desc)) + if (!xsk_tx_peek_desc(xdp_ring->xsk_pool, &desc)) break; - dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool->umem, desc.addr); - xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool->umem, dma, + dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc.addr); + xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, desc.len); tx_buf->bytecount = desc.len; @@ -703,9 +703,9 @@ static bool ice_xmit_zc(struct ice_ring *xdp_ring, int budget) if (tx_desc) { ice_xdp_ring_update_tail(xdp_ring); - xsk_umem_consume_tx_done(xdp_ring->xsk_pool->umem); - if 
(xsk_umem_uses_need_wakeup(xdp_ring->xsk_pool->umem)) - xsk_clear_tx_need_wakeup(xdp_ring->xsk_pool->umem); + xsk_tx_release(xdp_ring->xsk_pool); + if (xsk_uses_need_wakeup(xdp_ring->xsk_pool)) + xsk_clear_tx_need_wakeup(xdp_ring->xsk_pool); } return budget > 0 && work_done; @@ -779,13 +779,13 @@ bool ice_clean_tx_irq_zc(struct ice_ring *xdp_ring, int budget) xdp_ring->next_to_clean = ntc; if (xsk_frames) - xsk_umem_complete_tx(xdp_ring->xsk_pool->umem, xsk_frames); + xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames); - if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_pool->umem)) { + if (xsk_uses_need_wakeup(xdp_ring->xsk_pool)) { if (xdp_ring->next_to_clean == xdp_ring->next_to_use) - xsk_set_tx_need_wakeup(xdp_ring->xsk_pool->umem); + xsk_set_tx_need_wakeup(xdp_ring->xsk_pool); else - xsk_clear_tx_need_wakeup(xdp_ring->xsk_pool->umem); + xsk_clear_tx_need_wakeup(xdp_ring->xsk_pool); } ice_update_tx_ring_stats(xdp_ring, total_packets, total_bytes); @@ -902,5 +902,5 @@ void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring) } if (xsk_frames) - xsk_umem_complete_tx(xdp_ring->xsk_pool->umem, xsk_frames); + xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames); } diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 3217000..5d1c786 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -3716,7 +3716,7 @@ static void ixgbe_configure_srrctl(struct ixgbe_adapter *adapter, /* configure the packet buffer length */ if (rx_ring->xsk_pool) { - u32 xsk_buf_len = xsk_umem_get_rx_frame_size(rx_ring->xsk_pool->umem); + u32 xsk_buf_len = xsk_pool_get_rx_frame_size(rx_ring->xsk_pool); /* If the MAC support setting RXDCTL.RLPML, the * SRRCTL[n].BSIZEPKT is set to PAGE_SIZE and @@ -4066,7 +4066,7 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter, WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_XSK_BUFF_POOL, NULL)); - 
xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq); + xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq); } else { WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_PAGE_SHARED, NULL)); @@ -4122,7 +4122,7 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter, } if (ring->xsk_pool && hw->mac.type != ixgbe_mac_82599EB) { - u32 xsk_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_pool->umem); + u32 xsk_buf_len = xsk_pool_get_rx_frame_size(ring->xsk_pool); rxdctl &= ~(IXGBE_RXDCTL_RLPMLMASK | IXGBE_RXDCTL_RLPML_EN); diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c index 9f503d6..f07cd41 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c @@ -17,7 +17,7 @@ struct xsk_buff_pool *ixgbe_xsk_pool(struct ixgbe_adapter *adapter, if (!xdp_on || !test_bit(qid, adapter->af_xdp_zc_qps)) return NULL; - return xdp_get_xsk_pool_from_qid(adapter->netdev, qid); + return xsk_get_pool_from_qid(adapter->netdev, qid); } static int ixgbe_xsk_pool_enable(struct ixgbe_adapter *adapter, @@ -35,7 +35,7 @@ static int ixgbe_xsk_pool_enable(struct ixgbe_adapter *adapter, qid >= netdev->real_num_tx_queues) return -EINVAL; - err = xsk_buff_dma_map(pool->umem, &adapter->pdev->dev, IXGBE_RX_DMA_ATTR); + err = xsk_pool_dma_map(pool, &adapter->pdev->dev, IXGBE_RX_DMA_ATTR); if (err) return err; @@ -64,7 +64,7 @@ static int ixgbe_xsk_pool_disable(struct ixgbe_adapter *adapter, u16 qid) struct xsk_buff_pool *pool; bool if_running; - pool = xdp_get_xsk_pool_from_qid(adapter->netdev, qid); + pool = xsk_get_pool_from_qid(adapter->netdev, qid); if (!pool) return -EINVAL; @@ -75,7 +75,7 @@ static int ixgbe_xsk_pool_disable(struct ixgbe_adapter *adapter, u16 qid) ixgbe_txrx_ring_disable(adapter, qid); clear_bit(qid, adapter->af_xdp_zc_qps); - xsk_buff_dma_unmap(pool->umem, IXGBE_RX_DMA_ATTR); + xsk_pool_dma_unmap(pool, IXGBE_RX_DMA_ATTR); if (if_running) 
ixgbe_txrx_ring_enable(adapter, qid); @@ -150,7 +150,7 @@ bool ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 count) i -= rx_ring->count; do { - bi->xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem); + bi->xdp = xsk_buff_alloc(rx_ring->xsk_pool); if (!bi->xdp) { ok = false; break; @@ -345,11 +345,11 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector, q_vector->rx.total_packets += total_rx_packets; q_vector->rx.total_bytes += total_rx_bytes; - if (xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) { + if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) { if (failure || rx_ring->next_to_clean == rx_ring->next_to_use) - xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem); + xsk_set_rx_need_wakeup(rx_ring->xsk_pool); else - xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem); + xsk_clear_rx_need_wakeup(rx_ring->xsk_pool); return (int)total_rx_packets; } @@ -389,11 +389,11 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget) break; } - if (!xsk_umem_consume_tx(pool->umem, &desc)) + if (!xsk_tx_peek_desc(pool, &desc)) break; - dma = xsk_buff_raw_get_dma(pool->umem, desc.addr); - xsk_buff_raw_dma_sync_for_device(pool->umem, dma, desc.len); + dma = xsk_buff_raw_get_dma(pool, desc.addr); + xsk_buff_raw_dma_sync_for_device(pool, dma, desc.len); tx_bi = &xdp_ring->tx_buffer_info[xdp_ring->next_to_use]; tx_bi->bytecount = desc.len; @@ -419,7 +419,7 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget) if (tx_desc) { ixgbe_xdp_ring_update_tail(xdp_ring); - xsk_umem_consume_tx_done(pool->umem); + xsk_tx_release(pool); } return !!budget && work_done; @@ -485,10 +485,10 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector, q_vector->tx.total_packets += total_packets; if (xsk_frames) - xsk_umem_complete_tx(pool->umem, xsk_frames); + xsk_tx_completed(pool, xsk_frames); - if (xsk_umem_uses_need_wakeup(pool->umem)) - xsk_set_tx_need_wakeup(pool->umem); + if (xsk_uses_need_wakeup(pool)) + xsk_set_tx_need_wakeup(pool); 
return ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit); } @@ -547,5 +547,5 @@ void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring) } if (xsk_frames) - xsk_umem_complete_tx(pool->umem, xsk_frames); + xsk_tx_completed(pool, xsk_frames); } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c index 0a5a873..d6c7596 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c @@ -446,7 +446,7 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq) } while ((++i < MLX5E_TX_CQ_POLL_BUDGET) && (cqe = mlx5_cqwq_get_cqe(&cq->wq))); if (xsk_frames) - xsk_umem_complete_tx(sq->pool->umem, xsk_frames); + xsk_tx_completed(sq->pool, xsk_frames); sq->stats->cqes += i; @@ -476,7 +476,7 @@ void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq) } if (xsk_frames) - xsk_umem_complete_tx(sq->pool->umem, xsk_frames); + xsk_tx_completed(sq->pool, xsk_frames); } int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h index 3dd056a..7f88ccf 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h @@ -22,7 +22,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq, static inline int mlx5e_xsk_page_alloc_pool(struct mlx5e_rq *rq, struct mlx5e_dma_info *dma_info) { - dma_info->xsk = xsk_buff_alloc(rq->xsk_pool->umem); + dma_info->xsk = xsk_buff_alloc(rq->xsk_pool); if (!dma_info->xsk) return -ENOMEM; @@ -38,13 +38,13 @@ static inline int mlx5e_xsk_page_alloc_pool(struct mlx5e_rq *rq, static inline bool mlx5e_xsk_update_rx_wakeup(struct mlx5e_rq *rq, bool alloc_err) { - if (!xsk_umem_uses_need_wakeup(rq->xsk_pool->umem)) + if (!xsk_uses_need_wakeup(rq->xsk_pool)) return alloc_err; if (unlikely(alloc_err)) - xsk_set_rx_need_wakeup(rq->xsk_pool->umem); + 
xsk_set_rx_need_wakeup(rq->xsk_pool); else - xsk_clear_rx_need_wakeup(rq->xsk_pool->umem); + xsk_clear_rx_need_wakeup(rq->xsk_pool); return false; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c index abe4639..debcc70 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c @@ -83,7 +83,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget) break; } - if (!xsk_umem_consume_tx(pool->umem, &desc)) { + if (!xsk_tx_peek_desc(pool, &desc)) { /* TX will get stuck until something wakes it up by * triggering NAPI. Currently it's expected that the * application calls sendto() if there are consumed, but @@ -92,11 +92,11 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget) break; } - xdptxd.dma_addr = xsk_buff_raw_get_dma(pool->umem, desc.addr); - xdptxd.data = xsk_buff_raw_get_data(pool->umem, desc.addr); + xdptxd.dma_addr = xsk_buff_raw_get_dma(pool, desc.addr); + xdptxd.data = xsk_buff_raw_get_data(pool, desc.addr); xdptxd.len = desc.len; - xsk_buff_raw_dma_sync_for_device(pool->umem, xdptxd.dma_addr, xdptxd.len); + xsk_buff_raw_dma_sync_for_device(pool, xdptxd.dma_addr, xdptxd.len); if (unlikely(!sq->xmit_xdp_frame(sq, &xdptxd, &xdpi, check_result))) { if (sq->mpwqe.wqe) @@ -113,7 +113,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget) mlx5e_xdp_mpwqe_complete(sq); mlx5e_xmit_xdp_doorbell(sq); - xsk_umem_consume_tx_done(pool->umem); + xsk_tx_release(pool); } return !(budget && work_done); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h index 610a084..5821e88 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h @@ -15,13 +15,13 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget); static inline void mlx5e_xsk_update_tx_wakeup(struct 
mlx5e_xdpsq *sq) { - if (!xsk_umem_uses_need_wakeup(sq->pool->umem)) + if (!xsk_uses_need_wakeup(sq->pool)) return; if (sq->pc != sq->cc) - xsk_clear_tx_need_wakeup(sq->pool->umem); + xsk_clear_tx_need_wakeup(sq->pool); else - xsk_set_tx_need_wakeup(sq->pool->umem); + xsk_set_tx_need_wakeup(sq->pool); } #endif /* __MLX5_EN_XSK_TX_H__ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c index 947abf1..cb70870 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c @@ -11,13 +11,13 @@ static int mlx5e_xsk_map_pool(struct mlx5e_priv *priv, { struct device *dev = priv->mdev->device; - return xsk_buff_dma_map(pool->umem, dev, 0); + return xsk_pool_dma_map(pool, dev, 0); } static void mlx5e_xsk_unmap_pool(struct mlx5e_priv *priv, struct xsk_buff_pool *pool) { - return xsk_buff_dma_unmap(pool->umem, 0); + return xsk_pool_dma_unmap(pool, 0); } static int mlx5e_xsk_get_pools(struct mlx5e_xsk *xsk) @@ -64,14 +64,14 @@ static void mlx5e_xsk_remove_pool(struct mlx5e_xsk *xsk, u16 ix) static bool mlx5e_xsk_is_pool_sane(struct xsk_buff_pool *pool) { - return xsk_umem_get_headroom(pool->umem) <= 0xffff && - xsk_umem_get_chunk_size(pool->umem) <= 0xffff; + return xsk_pool_get_headroom(pool) <= 0xffff && + xsk_pool_get_chunk_size(pool) <= 0xffff; } void mlx5e_build_xsk_param(struct xsk_buff_pool *pool, struct mlx5e_xsk_param *xsk) { - xsk->headroom = xsk_umem_get_headroom(pool->umem); - xsk->chunk_size = xsk_umem_get_chunk_size(pool->umem); + xsk->headroom = xsk_pool_get_headroom(pool); + xsk->chunk_size = xsk_pool_get_chunk_size(pool); } static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 2b4a3e3..695b993 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -518,7 +518,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c, if (xsk) { err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, MEM_TYPE_XSK_BUFF_POOL, NULL); - xsk_buff_set_rxq_info(rq->xsk_pool->umem, &rq->xdp_rxq); + xsk_pool_set_rxq_info(rq->xsk_pool, &rq->xdp_rxq); } else { /* Create a page_pool and register it with rxq */ pp_params.order = 0; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index 1dcf77d..030f6d7 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -390,7 +390,7 @@ static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, u8 wqe_bulk) * allocating one-by-one, failing and moving frames to the * Reuse Ring. */ - if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool->umem, pages_desired))) + if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool, pages_desired))) return -ENOMEM; } @@ -489,7 +489,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix) * one-by-one, failing and moving frames to the Reuse Ring. 
*/ if (rq->xsk_pool && - unlikely(!xsk_buff_can_alloc(rq->xsk_pool->umem, MLX5_MPWRQ_PAGES_PER_WQE))) { + unlikely(!xsk_buff_can_alloc(rq->xsk_pool, MLX5_MPWRQ_PAGES_PER_WQE))) { err = -ENOMEM; goto err; } diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 96bfc5f..6eb9628 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -52,6 +52,7 @@ struct xdp_sock { struct net_device *dev; struct xdp_umem *umem; struct list_head flush_node; + struct xsk_buff_pool *pool; u16 queue_id; bool zc; enum { diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index 5dc8d3c..a7c7d2e 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -11,48 +11,50 @@ #ifdef CONFIG_XDP_SOCKETS -void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries); -bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc); -void xsk_umem_consume_tx_done(struct xdp_umem *umem); -struct xsk_buff_pool *xdp_get_xsk_pool_from_qid(struct net_device *dev, - u16 queue_id); -void xsk_set_rx_need_wakeup(struct xdp_umem *umem); -void xsk_set_tx_need_wakeup(struct xdp_umem *umem); -void xsk_clear_rx_need_wakeup(struct xdp_umem *umem); -void xsk_clear_tx_need_wakeup(struct xdp_umem *umem); -bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem); +void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries); +bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc); +void xsk_tx_release(struct xsk_buff_pool *pool); +struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev, + u16 queue_id); +void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool); +void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool); +void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool); +void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool); +bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool); -static inline u32 xsk_umem_get_headroom(struct xdp_umem *umem) +static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool) 
{ - return XDP_PACKET_HEADROOM + umem->headroom; + return XDP_PACKET_HEADROOM + pool->headroom; } -static inline u32 xsk_umem_get_chunk_size(struct xdp_umem *umem) +static inline u32 xsk_pool_get_chunk_size(struct xsk_buff_pool *pool) { - return umem->chunk_size; + return pool->chunk_size; } -static inline u32 xsk_umem_get_rx_frame_size(struct xdp_umem *umem) +static inline u32 xsk_pool_get_rx_frame_size(struct xsk_buff_pool *pool) { - return xsk_umem_get_chunk_size(umem) - xsk_umem_get_headroom(umem); + return xsk_pool_get_chunk_size(pool) - xsk_pool_get_headroom(pool); } -static inline void xsk_buff_set_rxq_info(struct xdp_umem *umem, +static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq) { - xp_set_rxq_info(umem->pool, rxq); + xp_set_rxq_info(pool, rxq); } -static inline void xsk_buff_dma_unmap(struct xdp_umem *umem, +static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs) { - xp_dma_unmap(umem->pool, attrs); + xp_dma_unmap(pool, attrs); } -static inline int xsk_buff_dma_map(struct xdp_umem *umem, struct device *dev, - unsigned long attrs) +static inline int xsk_pool_dma_map(struct xsk_buff_pool *pool, + struct device *dev, unsigned long attrs) { - return xp_dma_map(umem->pool, dev, attrs, umem->pgs, umem->npgs); + struct xdp_umem *umem = pool->umem; + + return xp_dma_map(pool, dev, attrs, umem->pgs, umem->npgs); } static inline dma_addr_t xsk_buff_xdp_get_dma(struct xdp_buff *xdp) @@ -69,14 +71,14 @@ static inline dma_addr_t xsk_buff_xdp_get_frame_dma(struct xdp_buff *xdp) return xp_get_frame_dma(xskb); } -static inline struct xdp_buff *xsk_buff_alloc(struct xdp_umem *umem) +static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool) { - return xp_alloc(umem->pool); + return xp_alloc(pool); } -static inline bool xsk_buff_can_alloc(struct xdp_umem *umem, u32 count) +static inline bool xsk_buff_can_alloc(struct xsk_buff_pool *pool, u32 count) { - return xp_can_alloc(umem->pool, 
count); + return xp_can_alloc(pool, count); } static inline void xsk_buff_free(struct xdp_buff *xdp) @@ -86,14 +88,15 @@ static inline void xsk_buff_free(struct xdp_buff *xdp) xp_free(xskb); } -static inline dma_addr_t xsk_buff_raw_get_dma(struct xdp_umem *umem, u64 addr) +static inline dma_addr_t xsk_buff_raw_get_dma(struct xsk_buff_pool *pool, + u64 addr) { - return xp_raw_get_dma(umem->pool, addr); + return xp_raw_get_dma(pool, addr); } -static inline void *xsk_buff_raw_get_data(struct xdp_umem *umem, u64 addr) +static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr) { - return xp_raw_get_data(umem->pool, addr); + return xp_raw_get_data(pool, addr); } static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp) @@ -103,83 +106,83 @@ static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp) xp_dma_sync_for_cpu(xskb); } -static inline void xsk_buff_raw_dma_sync_for_device(struct xdp_umem *umem, +static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool, dma_addr_t dma, size_t size) { - xp_dma_sync_for_device(umem->pool, dma, size); + xp_dma_sync_for_device(pool, dma, size); } #else -static inline void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries) +static inline void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries) { } -static inline bool xsk_umem_consume_tx(struct xdp_umem *umem, - struct xdp_desc *desc) +static inline bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, + struct xdp_desc *desc) { return false; } -static inline void xsk_umem_consume_tx_done(struct xdp_umem *umem) +static inline void xsk_tx_release(struct xsk_buff_pool *pool) { } static inline struct xsk_buff_pool * -xdp_get_xsk_pool_from_qid(struct net_device *dev, u16 queue_id) +xsk_get_pool_from_qid(struct net_device *dev, u16 queue_id) { return NULL; } -static inline void xsk_set_rx_need_wakeup(struct xdp_umem *umem) +static inline void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool) { } -static inline void 
xsk_set_tx_need_wakeup(struct xdp_umem *umem) +static inline void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool) { } -static inline void xsk_clear_rx_need_wakeup(struct xdp_umem *umem) +static inline void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool) { } -static inline void xsk_clear_tx_need_wakeup(struct xdp_umem *umem) +static inline void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool) { } -static inline bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem) +static inline bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool) { return false; } -static inline u32 xsk_umem_get_headroom(struct xdp_umem *umem) +static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool) { return 0; } -static inline u32 xsk_umem_get_chunk_size(struct xdp_umem *umem) +static inline u32 xsk_pool_get_chunk_size(struct xsk_buff_pool *pool) { return 0; } -static inline u32 xsk_umem_get_rx_frame_size(struct xdp_umem *umem) +static inline u32 xsk_pool_get_rx_frame_size(struct xsk_buff_pool *pool) { return 0; } -static inline void xsk_buff_set_rxq_info(struct xdp_umem *umem, +static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq) { } -static inline void xsk_buff_dma_unmap(struct xdp_umem *umem, +static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs) { } -static inline int xsk_buff_dma_map(struct xdp_umem *umem, struct device *dev, - unsigned long attrs) +static inline int xsk_pool_dma_map(struct xsk_buff_pool *pool, + struct device *dev, unsigned long attrs) { return 0; } @@ -194,12 +197,12 @@ static inline dma_addr_t xsk_buff_xdp_get_frame_dma(struct xdp_buff *xdp) return 0; } -static inline struct xdp_buff *xsk_buff_alloc(struct xdp_umem *umem) +static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool) { return NULL; } -static inline bool xsk_buff_can_alloc(struct xdp_umem *umem, u32 count) +static inline bool xsk_buff_can_alloc(struct xsk_buff_pool *pool, u32 count) { return 
false; } @@ -208,12 +211,13 @@ static inline void xsk_buff_free(struct xdp_buff *xdp) { } -static inline dma_addr_t xsk_buff_raw_get_dma(struct xdp_umem *umem, u64 addr) +static inline dma_addr_t xsk_buff_raw_get_dma(struct xsk_buff_pool *pool, + u64 addr) { return 0; } -static inline void *xsk_buff_raw_get_data(struct xdp_umem *umem, u64 addr) +static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr) { return NULL; } @@ -222,7 +226,7 @@ static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp) { } -static inline void xsk_buff_raw_dma_sync_for_device(struct xdp_umem *umem, +static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool, dma_addr_t dma, size_t size) { diff --git a/net/ethtool/channels.c b/net/ethtool/channels.c index 78d990b..9ecda09 100644 --- a/net/ethtool/channels.c +++ b/net/ethtool/channels.c @@ -223,7 +223,7 @@ int ethnl_set_channels(struct sk_buff *skb, struct genl_info *info) from_channel = channels.combined_count + min(channels.rx_count, channels.tx_count); for (i = from_channel; i < old_total; i++) - if (xdp_get_xsk_pool_from_qid(dev, i)) { + if (xsk_get_pool_from_qid(dev, i)) { GENL_SET_ERR_MSG(info, "requested channel counts are too low for existing zerocopy AF_XDP sockets"); return -EINVAL; } diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c index 91de16d..2d94306 100644 --- a/net/ethtool/ioctl.c +++ b/net/ethtool/ioctl.c @@ -1702,7 +1702,7 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev, min(channels.rx_count, channels.tx_count); to_channel = curr.combined_count + max(curr.rx_count, curr.tx_count); for (i = from_channel; i < to_channel; i++) - if (xdp_get_xsk_pool_from_qid(dev, i)) + if (xsk_get_pool_from_qid(dev, i)) return -EINVAL; ret = dev->ethtool_ops->set_channels(dev, &channels); diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index 0b5f3b0..adde4d5 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -51,9 +51,9 @@ void 
xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs) * not know if the device has more tx queues than rx, or the opposite. * This might also change during run time. */ -static int xdp_reg_xsk_pool_at_qid(struct net_device *dev, - struct xsk_buff_pool *pool, - u16 queue_id) +static int xsk_reg_pool_at_qid(struct net_device *dev, + struct xsk_buff_pool *pool, + u16 queue_id) { if (queue_id >= max_t(unsigned int, dev->real_num_rx_queues, @@ -68,8 +68,8 @@ static int xdp_reg_xsk_pool_at_qid(struct net_device *dev, return 0; } -struct xsk_buff_pool *xdp_get_xsk_pool_from_qid(struct net_device *dev, - u16 queue_id) +struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev, + u16 queue_id) { if (queue_id < dev->real_num_rx_queues) return dev->_rx[queue_id].pool; @@ -78,9 +78,9 @@ struct xsk_buff_pool *xdp_get_xsk_pool_from_qid(struct net_device *dev, return NULL; } -EXPORT_SYMBOL(xdp_get_xsk_pool_from_qid); +EXPORT_SYMBOL(xsk_get_pool_from_qid); -static void xdp_clear_xsk_pool_at_qid(struct net_device *dev, u16 queue_id) +static void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id) { if (queue_id < dev->real_num_rx_queues) dev->_rx[queue_id].pool = NULL; @@ -103,10 +103,10 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, if (force_zc && force_copy) return -EINVAL; - if (xdp_get_xsk_pool_from_qid(dev, queue_id)) + if (xsk_get_pool_from_qid(dev, queue_id)) return -EBUSY; - err = xdp_reg_xsk_pool_at_qid(dev, umem->pool, queue_id); + err = xsk_reg_pool_at_qid(dev, umem->pool, queue_id); if (err) return err; @@ -119,7 +119,7 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, * Also for supporting drivers that do not implement this * feature. They will always have to call sendto(). 
*/ - xsk_set_tx_need_wakeup(umem); + xsk_set_tx_need_wakeup(umem->pool); } dev_hold(dev); @@ -148,7 +148,7 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, if (!force_zc) err = 0; /* fallback to copy mode */ if (err) - xdp_clear_xsk_pool_at_qid(dev, queue_id); + xsk_clear_pool_at_qid(dev, queue_id); return err; } @@ -173,7 +173,7 @@ void xdp_umem_clear_dev(struct xdp_umem *umem) WARN(1, "failed to disable umem!\n"); } - xdp_clear_xsk_pool_at_qid(umem->dev, umem->queue_id); + xsk_clear_pool_at_qid(umem->dev, umem->queue_id); dev_put(umem->dev); umem->dev = NULL; diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 3700266..7551f5b 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -39,8 +39,10 @@ bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs) READ_ONCE(xs->umem->fq); } -void xsk_set_rx_need_wakeup(struct xdp_umem *umem) +void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool) { + struct xdp_umem *umem = pool->umem; + if (umem->need_wakeup & XDP_WAKEUP_RX) return; @@ -49,8 +51,9 @@ void xsk_set_rx_need_wakeup(struct xdp_umem *umem) } EXPORT_SYMBOL(xsk_set_rx_need_wakeup); -void xsk_set_tx_need_wakeup(struct xdp_umem *umem) +void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool) { + struct xdp_umem *umem = pool->umem; struct xdp_sock *xs; if (umem->need_wakeup & XDP_WAKEUP_TX) @@ -66,8 +69,10 @@ void xsk_set_tx_need_wakeup(struct xdp_umem *umem) } EXPORT_SYMBOL(xsk_set_tx_need_wakeup); -void xsk_clear_rx_need_wakeup(struct xdp_umem *umem) +void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool) { + struct xdp_umem *umem = pool->umem; + if (!(umem->need_wakeup & XDP_WAKEUP_RX)) return; @@ -76,8 +81,9 @@ void xsk_clear_rx_need_wakeup(struct xdp_umem *umem) } EXPORT_SYMBOL(xsk_clear_rx_need_wakeup); -void xsk_clear_tx_need_wakeup(struct xdp_umem *umem) +void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool) { + struct xdp_umem *umem = pool->umem; struct xdp_sock *xs; if (!(umem->need_wakeup & XDP_WAKEUP_TX)) @@ -93,11 +99,11 @@ void 
xsk_clear_tx_need_wakeup(struct xdp_umem *umem) } EXPORT_SYMBOL(xsk_clear_tx_need_wakeup); -bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem) +bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool) { - return umem->flags & XDP_UMEM_USES_NEED_WAKEUP; + return pool->umem->flags & XDP_UMEM_USES_NEED_WAKEUP; } -EXPORT_SYMBOL(xsk_umem_uses_need_wakeup); +EXPORT_SYMBOL(xsk_uses_need_wakeup); void xp_release(struct xdp_buff_xsk *xskb) { @@ -155,12 +161,12 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len, struct xdp_buff *xsk_xdp; int err; - if (len > xsk_umem_get_rx_frame_size(xs->umem)) { + if (len > xsk_pool_get_rx_frame_size(xs->pool)) { xs->rx_dropped++; return -ENOSPC; } - xsk_xdp = xsk_buff_alloc(xs->umem); + xsk_xdp = xsk_buff_alloc(xs->pool); if (!xsk_xdp) { xs->rx_dropped++; return -ENOSPC; @@ -249,27 +255,28 @@ void __xsk_map_flush(void) } } -void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries) +void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries) { - xskq_prod_submit_n(umem->cq, nb_entries); + xskq_prod_submit_n(pool->umem->cq, nb_entries); } -EXPORT_SYMBOL(xsk_umem_complete_tx); +EXPORT_SYMBOL(xsk_tx_completed); -void xsk_umem_consume_tx_done(struct xdp_umem *umem) +void xsk_tx_release(struct xsk_buff_pool *pool) { struct xdp_sock *xs; rcu_read_lock(); - list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) { + list_for_each_entry_rcu(xs, &pool->umem->xsk_tx_list, list) { __xskq_cons_release(xs->tx); xs->sk.sk_write_space(&xs->sk); } rcu_read_unlock(); } -EXPORT_SYMBOL(xsk_umem_consume_tx_done); +EXPORT_SYMBOL(xsk_tx_release); -bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc) +bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc) { + struct xdp_umem *umem = pool->umem; struct xdp_sock *xs; rcu_read_lock(); @@ -294,7 +301,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc) rcu_read_unlock(); return false; } -EXPORT_SYMBOL(xsk_umem_consume_tx); 
+EXPORT_SYMBOL(xsk_tx_peek_desc); static int xsk_wakeup(struct xdp_sock *xs, u8 flags) { @@ -357,7 +364,7 @@ static int xsk_generic_xmit(struct sock *sk) skb_put(skb, len); addr = desc.addr; - buffer = xsk_buff_raw_get_data(xs->umem, addr); + buffer = xsk_buff_raw_get_data(xs->pool, addr); err = skb_store_bits(skb, 0, buffer, len); /* This is the backpressure mechanism for the Tx path. * Reserve space in the completion queue and only proceed @@ -758,6 +765,8 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname, return PTR_ERR(umem); } + xs->pool = umem->pool; + /* Make sure umem is ready before it can be seen by others */ smp_wmb(); WRITE_ONCE(xs->umem, umem); From patchwork Thu Jul 2 12:19:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Magnus Karlsson X-Patchwork-Id: 1321360 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49yHGL1Hbhz9sR4 for ; Thu, 2 Jul 2020 22:19:38 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729014AbgGBMTh (ORCPT ); Thu, 2 Jul 2020 08:19:37 -0400 Received: from mga12.intel.com ([192.55.52.136]:6897 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729001AbgGBMTh (ORCPT ); Thu, 2 Jul 2020 08:19:37 -0400 IronPort-SDR: C8hf/MR3Ui9RejNGKqA2UlK1UbN11GdIH7Vo6TQM7XCeDYr938q2+fyCk63gkzQVk3beuINV0R Cm0Wc/DGoqbw== X-IronPort-AV: E=McAfee;i="6000,8403,9669"; a="126486076" X-IronPort-AV: 
E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="126486076"
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jul 2020 05:19:36 -0700
IronPort-SDR: +NUslLDGRlJzQqQCXy2zqP+pz++fbn7dTcj7RxFZna8qvk5e8CM7AqnjG4hid7g29zaEp2jrpQ
 5TkGAmrRVkXQ==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="425933284"
Received: from mkarlsso-mobl.ger.corp.intel.com (HELO localhost.localdomain) ([10.252.39.242]) by orsmga004.jf.intel.com with ESMTP; 02 Jul 2020 05:19:32 -0700
From: Magnus Karlsson
To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com
Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com
Subject: [PATCH bpf-next 03/14] xsk: create and free context independently from umem
Date: Thu, 2 Jul 2020 14:19:02 +0200
Message-Id: <1593692353-15102-4-git-send-email-magnus.karlsson@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com>
References: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com>
Sender: bpf-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org

Create and free the buffer pool independently from the umem. Move the operations that are performed on the buffer pool from the umem create and destroy functions to new create and destroy functions just for the buffer pool. This is so that in later commits we can instantiate multiple buffer pools per umem when sharing a umem between HW queues and/or devices. We also eradicate the back pointer from the umem to the buffer pool, as this will not work once it becomes possible to have multiple buffer pools per umem.
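The ownership model described above can be sketched in user-space C. This is an illustration only and not part of the patch: the `demo_` types and functions are hypothetical stand-ins (the patch itself declares `xp_create()`, `xp_get_pool()` and `xp_put_pool()`), plain counters replace `refcount_t`, and the sketch assumes that each pool holds one reference on the umem it points to. The point it shows is that the only link is pool -> umem, so several pools can share one umem.

```c
#include <stdlib.h>

/* Hypothetical user-space model; none of the demo_ names are kernel APIs. */
struct demo_umem {
	unsigned int users;      /* references held on this umem */
	unsigned int chunk_size;
};

struct demo_pool {
	struct demo_umem *umem;  /* pool -> umem only; the umem has no back pointer */
	unsigned int users;      /* references held on this pool */
	unsigned int queue_id;
};

/* Create a pool for one queue and take a reference on the shared umem
 * (an assumption of this sketch, not something the text above states).
 */
static struct demo_pool *demo_pool_create(struct demo_umem *umem,
					  unsigned int queue_id)
{
	struct demo_pool *pool = calloc(1, sizeof(*pool));

	if (!pool)
		return NULL;

	pool->umem = umem;
	pool->queue_id = queue_id;
	pool->users = 1;
	umem->users++;
	return pool;
}

static void demo_pool_get(struct demo_pool *pool)
{
	pool->users++;
}

/* Drop one reference; returns 1 if this call freed the pool, 0 otherwise. */
static int demo_pool_put(struct demo_pool *pool)
{
	if (--pool->users)
		return 0;

	pool->umem->users--;
	free(pool);
	return 1;
}
```

Because the umem never points back at a pool, destroying one pool leaves every other pool on the same umem untouched, which is what a single `umem->pool` pointer could not express.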
It might seem a bit odd that we create an empty buffer pool first and then recreate it with its right size when we bind to a device and umem. But the buffer pool will in later commits be used to carry information before it has been assigned to a umem and its size decided.

Signed-off-by: Magnus Karlsson
---
 include/net/xdp_sock.h      |   3 +-
 include/net/xsk_buff_pool.h |  14 +++-
 net/xdp/xdp_umem.c          | 164 ++++----------------------------------------
 net/xdp/xdp_umem.h          |   4 +-
 net/xdp/xsk.c               |  83 +++++++++++++++++++---
 net/xdp/xsk.h               |   3 +
 net/xdp/xsk_buff_pool.c     | 154 +++++++++++++++++++++++++++++++++++++----
 net/xdp/xsk_queue.h         |  12 ++--
 8 files changed, 250 insertions(+), 187 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 6eb9628..b9bb118 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -20,13 +20,12 @@ struct xdp_buff;
 struct xdp_umem {
 	struct xsk_queue *fq;
 	struct xsk_queue *cq;
-	struct xsk_buff_pool *pool;
 	u64 size;
 	u32 headroom;
 	u32 chunk_size;
+	u32 chunks;
 	struct user_struct *user;
 	refcount_t users;
-	struct work_struct work;
 	struct page **pgs;
 	u32 npgs;
 	u16 queue_id;
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index a6dec9c..cda8ced 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -14,6 +14,7 @@ struct xdp_rxq_info;
 struct xsk_queue;
 struct xdp_desc;
 struct xdp_umem;
+struct xdp_sock;
 struct device;
 struct page;
 
@@ -46,16 +47,23 @@ struct xsk_buff_pool {
 	struct xdp_umem *umem;
 	void *addrs;
 	struct device *dev;
+	refcount_t users;
+	struct work_struct work;
 	struct xdp_buff_xsk *free_heads[];
 };
 
 /* AF_XDP core.
*/ -struct xsk_buff_pool *xp_create(struct xdp_umem *umem, u32 chunks, - u32 chunk_size, u32 headroom, u64 size, - bool unaligned); +struct xsk_buff_pool *xp_create(void); +struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool, + struct xdp_umem *umem); +int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs, + struct net_device *dev, u16 queue_id, u16 flags); void xp_set_fq(struct xsk_buff_pool *pool, struct xsk_queue *fq); void xp_destroy(struct xsk_buff_pool *pool); void xp_release(struct xdp_buff_xsk *xskb); +void xp_get_pool(struct xsk_buff_pool *pool); +void xp_put_pool(struct xsk_buff_pool *pool); +void xp_clear_dev(struct xsk_buff_pool *pool); /* AF_XDP, and XDP core. */ void xp_free(struct xdp_buff_xsk *xskb); diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index adde4d5..f290345 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -47,160 +47,41 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs) spin_unlock_irqrestore(&umem->xsk_tx_list_lock, flags); } -/* The umem is stored both in the _rx struct and the _tx struct as we do - * not know if the device has more tx queues than rx, or the opposite. - * This might also change during run time. 
- */ -static int xsk_reg_pool_at_qid(struct net_device *dev, - struct xsk_buff_pool *pool, - u16 queue_id) -{ - if (queue_id >= max_t(unsigned int, - dev->real_num_rx_queues, - dev->real_num_tx_queues)) - return -EINVAL; - - if (queue_id < dev->real_num_rx_queues) - dev->_rx[queue_id].pool = pool; - if (queue_id < dev->real_num_tx_queues) - dev->_tx[queue_id].pool = pool; - - return 0; -} - -struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev, - u16 queue_id) +static void xdp_umem_unpin_pages(struct xdp_umem *umem) { - if (queue_id < dev->real_num_rx_queues) - return dev->_rx[queue_id].pool; - if (queue_id < dev->real_num_tx_queues) - return dev->_tx[queue_id].pool; + unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true); - return NULL; + kfree(umem->pgs); + umem->pgs = NULL; } -EXPORT_SYMBOL(xsk_get_pool_from_qid); -static void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id) +static void xdp_umem_unaccount_pages(struct xdp_umem *umem) { - if (queue_id < dev->real_num_rx_queues) - dev->_rx[queue_id].pool = NULL; - if (queue_id < dev->real_num_tx_queues) - dev->_tx[queue_id].pool = NULL; + if (umem->user) { + atomic_long_sub(umem->npgs, &umem->user->locked_vm); + free_uid(umem->user); + } } -int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, - u16 queue_id, u16 flags) +void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, + u16 queue_id) { - bool force_zc, force_copy; - struct netdev_bpf bpf; - int err = 0; - - ASSERT_RTNL(); - - force_zc = flags & XDP_ZEROCOPY; - force_copy = flags & XDP_COPY; - - if (force_zc && force_copy) - return -EINVAL; - - if (xsk_get_pool_from_qid(dev, queue_id)) - return -EBUSY; - - err = xsk_reg_pool_at_qid(dev, umem->pool, queue_id); - if (err) - return err; - umem->dev = dev; umem->queue_id = queue_id; - if (flags & XDP_USE_NEED_WAKEUP) { - umem->flags |= XDP_UMEM_USES_NEED_WAKEUP; - /* Tx needs to be explicitly woken up the first time. 
- * Also for supporting drivers that do not implement this - * feature. They will always have to call sendto(). - */ - xsk_set_tx_need_wakeup(umem->pool); - } - dev_hold(dev); - - if (force_copy) - /* For copy-mode, we are done. */ - return 0; - - if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_wakeup) { - err = -EOPNOTSUPP; - goto err_unreg_umem; - } - - bpf.command = XDP_SETUP_XSK_POOL; - bpf.xsk.pool = umem->pool; - bpf.xsk.queue_id = queue_id; - - err = dev->netdev_ops->ndo_bpf(dev, &bpf); - if (err) - goto err_unreg_umem; - - umem->zc = true; - return 0; - -err_unreg_umem: - if (!force_zc) - err = 0; /* fallback to copy mode */ - if (err) - xsk_clear_pool_at_qid(dev, queue_id); - return err; } void xdp_umem_clear_dev(struct xdp_umem *umem) { - struct netdev_bpf bpf; - int err; - - ASSERT_RTNL(); - - if (!umem->dev) - return; - - if (umem->zc) { - bpf.command = XDP_SETUP_XSK_POOL; - bpf.xsk.pool = NULL; - bpf.xsk.queue_id = umem->queue_id; - - err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf); - - if (err) - WARN(1, "failed to disable umem!\n"); - } - - xsk_clear_pool_at_qid(umem->dev, umem->queue_id); - dev_put(umem->dev); umem->dev = NULL; umem->zc = false; } -static void xdp_umem_unpin_pages(struct xdp_umem *umem) -{ - unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true); - - kfree(umem->pgs); - umem->pgs = NULL; -} - -static void xdp_umem_unaccount_pages(struct xdp_umem *umem) -{ - if (umem->user) { - atomic_long_sub(umem->npgs, &umem->user->locked_vm); - free_uid(umem->user); - } -} - static void xdp_umem_release(struct xdp_umem *umem) { - rtnl_lock(); xdp_umem_clear_dev(umem); - rtnl_unlock(); ida_simple_remove(&umem_ida, umem->id); @@ -214,20 +95,12 @@ static void xdp_umem_release(struct xdp_umem *umem) umem->cq = NULL; } - xp_destroy(umem->pool); xdp_umem_unpin_pages(umem); xdp_umem_unaccount_pages(umem); kfree(umem); } -static void xdp_umem_release_deferred(struct work_struct *work) -{ - struct xdp_umem *umem = container_of(work, 
struct xdp_umem, work); - - xdp_umem_release(umem); -} - void xdp_get_umem(struct xdp_umem *umem) { refcount_inc(&umem->users); @@ -238,10 +111,8 @@ void xdp_put_umem(struct xdp_umem *umem) if (!umem) return; - if (refcount_dec_and_test(&umem->users)) { - INIT_WORK(&umem->work, xdp_umem_release_deferred); - schedule_work(&umem->work); - } + if (refcount_dec_and_test(&umem->users)) + xdp_umem_release(umem); } static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address) @@ -357,6 +228,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) umem->size = size; umem->headroom = headroom; umem->chunk_size = chunk_size; + umem->chunks = chunks; umem->npgs = (u32)npgs; umem->pgs = NULL; umem->user = NULL; @@ -374,16 +246,8 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) if (err) goto out_account; - umem->pool = xp_create(umem, chunks, chunk_size, headroom, size, - unaligned_chunks); - if (!umem->pool) { - err = -ENOMEM; - goto out_pin; - } return 0; -out_pin: - xdp_umem_unpin_pages(umem); out_account: xdp_umem_unaccount_pages(umem); return err; diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h index 32067fe..93e96be 100644 --- a/net/xdp/xdp_umem.h +++ b/net/xdp/xdp_umem.h @@ -8,8 +8,8 @@ #include -int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, - u16 queue_id, u16 flags); +void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, + u16 queue_id); void xdp_umem_clear_dev(struct xdp_umem *umem); bool xdp_umem_validate_queues(struct xdp_umem *umem); void xdp_get_umem(struct xdp_umem *umem); diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 7551f5b..b12a832 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -105,6 +105,46 @@ bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool) } EXPORT_SYMBOL(xsk_uses_need_wakeup); +struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev, + u16 queue_id) +{ + if (queue_id < dev->real_num_rx_queues) + return dev->_rx[queue_id].pool; 
+ if (queue_id < dev->real_num_tx_queues) + return dev->_tx[queue_id].pool; + + return NULL; +} +EXPORT_SYMBOL(xsk_get_pool_from_qid); + +void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id) +{ + if (queue_id < dev->real_num_rx_queues) + dev->_rx[queue_id].pool = NULL; + if (queue_id < dev->real_num_tx_queues) + dev->_tx[queue_id].pool = NULL; +} + +/* The buffer pool is stored both in the _rx struct and the _tx struct as we do + * not know if the device has more tx queues than rx, or the opposite. + * This might also change during run time. + */ +int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool, + u16 queue_id) +{ + if (queue_id >= max_t(unsigned int, + dev->real_num_rx_queues, + dev->real_num_tx_queues)) + return -EINVAL; + + if (queue_id < dev->real_num_rx_queues) + dev->_rx[queue_id].pool = pool; + if (queue_id < dev->real_num_tx_queues) + dev->_tx[queue_id].pool = pool; + + return 0; +} + void xp_release(struct xdp_buff_xsk *xskb) { xskb->pool->free_heads[xskb->pool->free_heads_cnt++] = xskb; @@ -281,7 +321,7 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc) rcu_read_lock(); list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) { - if (!xskq_cons_peek_desc(xs->tx, desc, umem)) + if (!xskq_cons_peek_desc(xs->tx, desc, pool)) continue; /* This is the backpressure mechanism for the Tx path. 
@@ -347,7 +387,7 @@ static int xsk_generic_xmit(struct sock *sk) if (xs->queue_id >= xs->dev->real_num_tx_queues) goto out; - while (xskq_cons_peek_desc(xs->tx, &desc, xs->umem)) { + while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) { char *buffer; u64 addr; u32 len; @@ -629,6 +669,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) qid = sxdp->sxdp_queue_id; if (flags & XDP_SHARED_UMEM) { + struct xsk_buff_pool *curr_pool; struct xdp_sock *umem_xs; struct socket *sock; @@ -663,6 +704,11 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) goto out_unlock; } + /* Share the buffer pool with the other socket. */ + xp_get_pool(umem_xs->pool); + curr_pool = xs->pool; + xs->pool = umem_xs->pool; + xp_destroy(curr_pool); xdp_get_umem(umem_xs->umem); WRITE_ONCE(xs->umem, umem_xs->umem); sockfd_put(sock); @@ -670,10 +716,24 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) err = -EINVAL; goto out_unlock; } else { + struct xsk_buff_pool *new_pool; + /* This xsk has its own umem. 
*/ - err = xdp_umem_assign_dev(xs->umem, dev, qid, flags); - if (err) + xdp_umem_assign_dev(xs->umem, dev, qid); + new_pool = xp_assign_umem(xs->pool, xs->umem); + if (!new_pool) { + err = -ENOMEM; + xdp_umem_clear_dev(xs->umem); + goto out_unlock; + } + + err = xp_assign_dev(new_pool, xs, dev, qid, flags); + if (err) { + xp_destroy(new_pool); + xdp_umem_clear_dev(xs->umem); goto out_unlock; + } + xs->pool = new_pool; } xs->dev = dev; @@ -765,8 +825,6 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname, return PTR_ERR(umem); } - xs->pool = umem->pool; - /* Make sure umem is ready before it can be seen by others */ smp_wmb(); WRITE_ONCE(xs->umem, umem); @@ -796,7 +854,7 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname, &xs->umem->cq; err = xsk_init_queue(entries, q, true); if (optname == XDP_UMEM_FILL_RING) - xp_set_fq(xs->umem->pool, *q); + xp_set_fq(xs->pool, *q); mutex_unlock(&xs->mutex); return err; } @@ -1002,7 +1060,8 @@ static int xsk_notifier(struct notifier_block *this, xsk_unbind_dev(xs); - /* Clear device references in umem. */ + /* Clear device references. 
*/ + xp_clear_dev(xs->pool); xdp_umem_clear_dev(xs->umem); } mutex_unlock(&xs->mutex); @@ -1047,7 +1106,7 @@ static void xsk_destruct(struct sock *sk) if (!sock_flag(sk, SOCK_DEAD)) return; - xdp_put_umem(xs->umem); + xp_put_pool(xs->pool); sk_refcnt_debug_dec(sk); } @@ -1055,8 +1114,8 @@ static void xsk_destruct(struct sock *sk) static int xsk_create(struct net *net, struct socket *sock, int protocol, int kern) { - struct sock *sk; struct xdp_sock *xs; + struct sock *sk; if (!ns_capable(net->user_ns, CAP_NET_RAW)) return -EPERM; @@ -1092,6 +1151,10 @@ static int xsk_create(struct net *net, struct socket *sock, int protocol, INIT_LIST_HEAD(&xs->map_list); spin_lock_init(&xs->map_list_lock); + xs->pool = xp_create(); + if (!xs->pool) + return -ENOMEM; + mutex_lock(&net->xdp.lock); sk_add_node_rcu(sk, &net->xdp.list); mutex_unlock(&net->xdp.lock); diff --git a/net/xdp/xsk.h b/net/xdp/xsk.h index 455ddd4..a00e3e2 100644 --- a/net/xdp/xsk.h +++ b/net/xdp/xsk.h @@ -51,5 +51,8 @@ void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs, struct xdp_sock **map_entry); int xsk_map_inc(struct xsk_map *map); void xsk_map_put(struct xsk_map *map); +void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id); +int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool, + u16 queue_id); #endif /* XSK_H_ */ diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index c57f0bb..da93b36 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -2,11 +2,14 @@ #include #include +#include #include #include #include #include "xsk_queue.h" +#include "xdp_umem.h" +#include "xsk.h" static void xp_addr_unmap(struct xsk_buff_pool *pool) { @@ -32,39 +35,48 @@ void xp_destroy(struct xsk_buff_pool *pool) kvfree(pool); } -struct xsk_buff_pool *xp_create(struct xdp_umem *umem, u32 chunks, - u32 chunk_size, u32 headroom, u64 size, - bool unaligned) +struct xsk_buff_pool *xp_create(void) +{ + return kvzalloc(sizeof(struct xsk_buff_pool), 
GFP_KERNEL); +} + +struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool_old, + struct xdp_umem *umem) { struct xsk_buff_pool *pool; struct xdp_buff_xsk *xskb; int err; u32 i; - pool = kvzalloc(struct_size(pool, free_heads, chunks), GFP_KERNEL); + pool = kvzalloc(struct_size(pool, free_heads, umem->chunks), + GFP_KERNEL); if (!pool) goto out; - pool->heads = kvcalloc(chunks, sizeof(*pool->heads), GFP_KERNEL); + memcpy(pool, pool_old, sizeof(*pool_old)); + + pool->heads = kvcalloc(umem->chunks, sizeof(*pool->heads), GFP_KERNEL); if (!pool->heads) goto out; - pool->chunk_mask = ~((u64)chunk_size - 1); - pool->addrs_cnt = size; - pool->heads_cnt = chunks; - pool->free_heads_cnt = chunks; - pool->headroom = headroom; - pool->chunk_size = chunk_size; + pool->chunk_mask = ~((u64)umem->chunk_size - 1); + pool->addrs_cnt = umem->size; + pool->heads_cnt = umem->chunks; + pool->free_heads_cnt = umem->chunks; + pool->headroom = umem->headroom; + pool->chunk_size = umem->chunk_size; pool->cheap_dma = true; - pool->unaligned = unaligned; - pool->frame_len = chunk_size - headroom - XDP_PACKET_HEADROOM; + pool->unaligned = umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG; + pool->frame_len = umem->chunk_size - umem->headroom - + XDP_PACKET_HEADROOM; pool->umem = umem; INIT_LIST_HEAD(&pool->free_list); + refcount_set(&pool->users, 1); for (i = 0; i < pool->free_heads_cnt; i++) { xskb = &pool->heads[i]; xskb->pool = pool; - xskb->xdp.frame_sz = chunk_size - headroom; + xskb->xdp.frame_sz = umem->chunk_size - umem->headroom; pool->free_heads[i] = xskb; } @@ -91,6 +103,120 @@ void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq) } EXPORT_SYMBOL(xp_set_rxq_info); +int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs, + struct net_device *dev, u16 queue_id, u16 flags) +{ + struct xdp_umem *umem = pool->umem; + bool force_zc, force_copy; + struct netdev_bpf bpf; + int err = 0; + + ASSERT_RTNL(); + + force_zc = flags & XDP_ZEROCOPY; + force_copy = 
flags & XDP_COPY; + + if (force_zc && force_copy) + return -EINVAL; + + if (xsk_get_pool_from_qid(dev, queue_id)) + return -EBUSY; + + err = xsk_reg_pool_at_qid(dev, pool, queue_id); + if (err) + return err; + + if ((flags & XDP_USE_NEED_WAKEUP) && xs->tx) { + umem->flags |= XDP_UMEM_USES_NEED_WAKEUP; + /* Tx needs to be explicitly woken up the first time. + * Also for supporting drivers that do not implement this + * feature. They will always have to call sendto(). + */ + xs->tx->ring->flags |= XDP_RING_NEED_WAKEUP; + } + + if (force_copy) + /* For copy-mode, we are done. */ + return 0; + + if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_wakeup) { + err = -EOPNOTSUPP; + goto err_unreg_pool; + } + + bpf.command = XDP_SETUP_XSK_POOL; + bpf.xsk.pool = pool; + bpf.xsk.queue_id = queue_id; + + err = dev->netdev_ops->ndo_bpf(dev, &bpf); + if (err) + goto err_unreg_pool; + + umem->zc = true; + return 0; + +err_unreg_pool: + if (!force_zc) + err = 0; /* fallback to copy mode */ + if (err) + xsk_clear_pool_at_qid(dev, queue_id); + return err; +} + +void xp_clear_dev(struct xsk_buff_pool *pool) +{ + struct xdp_umem *umem = pool->umem; + struct netdev_bpf bpf; + int err; + + ASSERT_RTNL(); + + if (!umem->dev) + return; + + if (umem->zc) { + bpf.command = XDP_SETUP_XSK_POOL; + bpf.xsk.pool = NULL; + bpf.xsk.queue_id = umem->queue_id; + + err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf); + + if (err) + WARN(1, "failed to disable umem!\n"); + } + + xsk_clear_pool_at_qid(umem->dev, umem->queue_id); +} + +static void xp_release_deferred(struct work_struct *work) +{ + struct xsk_buff_pool *pool = container_of(work, struct xsk_buff_pool, + work); + + rtnl_lock(); + xp_clear_dev(pool); + rtnl_unlock(); + + xdp_put_umem(pool->umem); + xp_destroy(pool); +} + +void xp_get_pool(struct xsk_buff_pool *pool) +{ + refcount_inc(&pool->users); +} + +void xp_put_pool(struct xsk_buff_pool *pool) +{ + if (!pool) + return; + + if (refcount_dec_and_test(&pool->users)) { + 
INIT_WORK(&pool->work, xp_release_deferred); + schedule_work(&pool->work); + } +} + void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs) { dma_addr_t *dma; diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index 5b5d24d..75f1853 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -165,9 +165,9 @@ static inline bool xp_validate_desc(struct xsk_buff_pool *pool, static inline bool xskq_cons_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d, - struct xdp_umem *umem) + struct xsk_buff_pool *pool) { - if (!xp_validate_desc(umem->pool, d)) { + if (!xp_validate_desc(pool, d)) { q->invalid_descs++; return false; } @@ -176,14 +176,14 @@ static inline bool xskq_cons_is_valid_desc(struct xsk_queue *q, static inline bool xskq_cons_read_desc(struct xsk_queue *q, struct xdp_desc *desc, - struct xdp_umem *umem) + struct xsk_buff_pool *pool) { while (q->cached_cons != q->cached_prod) { struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring; u32 idx = q->cached_cons & q->ring_mask; *desc = ring->desc[idx]; - if (xskq_cons_is_valid_desc(q, desc, umem)) + if (xskq_cons_is_valid_desc(q, desc, pool)) return true; q->cached_cons++; @@ -235,11 +235,11 @@ static inline bool xskq_cons_peek_addr_unchecked(struct xsk_queue *q, u64 *addr) static inline bool xskq_cons_peek_desc(struct xsk_queue *q, struct xdp_desc *desc, - struct xdp_umem *umem) + struct xsk_buff_pool *pool) { if (q->cached_prod == q->cached_cons) xskq_cons_get_entries(q); - return xskq_cons_read_desc(q, desc, umem); + return xskq_cons_read_desc(q, desc, pool); } static inline void xskq_cons_release(struct xsk_queue *q) From patchwork Thu Jul 2 12:19:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Magnus Karlsson X-Patchwork-Id: 1321362 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: 
ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49yHGP5j25z9sR4 for ; Thu, 2 Jul 2020 22:19:41 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729022AbgGBMTl (ORCPT ); Thu, 2 Jul 2020 08:19:41 -0400 Received: from mga12.intel.com ([192.55.52.136]:6897 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729001AbgGBMTk (ORCPT ); Thu, 2 Jul 2020 08:19:40 -0400 IronPort-SDR: 58mc8AnfTJiw2rlCJBRUnxYgeKIWNvOSsudeRxv3oUjLjSnqemLotL/6izJoQRKS0AuLgDTFsA mT2UoZ92Xh5Q== X-IronPort-AV: E=McAfee;i="6000,8403,9669"; a="126486083" X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="126486083" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jul 2020 05:19:40 -0700 IronPort-SDR: 9BuRF/NbD/YtGJMORtfLqOLbXkQFb8G3Pn/IfzvELgxqekmGxbzMKpIFT8lmrDtiJd76HmzR5d pQKAo3tU31og== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="425933300" Received: from mkarlsso-mobl.ger.corp.intel.com (HELO localhost.localdomain) ([10.252.39.242]) by orsmga004.jf.intel.com with ESMTP; 02 Jul 2020 05:19:36 -0700 From: Magnus Karlsson To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com Subject: [PATCH bpf-next 04/14] xsk: move fill and completion rings to buffer pool Date: Thu, 2 Jul 
2020 14:19:03 +0200 Message-Id: <1593692353-15102-5-git-send-email-magnus.karlsson@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> References: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org

Move the fill and completion rings from the umem to the buffer pool. This is done so that a later commit can share the umem between multiple HW queue ids. In this case, we need one fill and completion ring per queue id. As the buffer pool is per queue id and napi id, this is a natural place for the rings, and one umem structure can be shared between these buffer pools.

Signed-off-by: Magnus Karlsson
---
 include/net/xdp_sock.h | 2 -- include/net/xsk_buff_pool.h | 3 ++- net/xdp/xdp_umem.c | 15 --------------- net/xdp/xsk.c | 40 ++++++++++++++++++++-------------------- net/xdp/xsk_buff_pool.c | 20 +++++++++++++++----- net/xdp/xsk_diag.c | 10 ++++++---- 6 files changed, 43 insertions(+), 47 deletions(-) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index b9bb118..2dd3fd9 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -18,8 +18,6 @@ struct xsk_queue; struct xdp_buff; struct xdp_umem { - struct xsk_queue *fq; - struct xsk_queue *cq; u64 size; u32 headroom; u32 chunk_size; diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index cda8ced..f811e25 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -30,6 +30,7 @@ struct xdp_buff_xsk { struct xsk_buff_pool { struct xsk_queue *fq; + struct xsk_queue *cq; struct list_head free_list; dma_addr_t *dma_pages; struct xdp_buff_xsk *heads; @@ -58,12 +59,12 @@ struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool, struct xdp_umem *umem); int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs, struct net_device *dev, u16 queue_id, u16 flags); -void xp_set_fq(struct xsk_buff_pool 
*pool, struct xsk_queue *fq); void xp_destroy(struct xsk_buff_pool *pool); void xp_release(struct xdp_buff_xsk *xskb); void xp_get_pool(struct xsk_buff_pool *pool); void xp_put_pool(struct xsk_buff_pool *pool); void xp_clear_dev(struct xsk_buff_pool *pool); +bool xp_validate_queues(struct xsk_buff_pool *pool); /* AF_XDP, and XDP core. */ void xp_free(struct xdp_buff_xsk *xskb); diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index f290345..7d86a63 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -85,16 +85,6 @@ static void xdp_umem_release(struct xdp_umem *umem) ida_simple_remove(&umem_ida, umem->id); - if (umem->fq) { - xskq_destroy(umem->fq); - umem->fq = NULL; - } - - if (umem->cq) { - xskq_destroy(umem->cq); - umem->cq = NULL; - } - xdp_umem_unpin_pages(umem); xdp_umem_unaccount_pages(umem); @@ -278,8 +268,3 @@ struct xdp_umem *xdp_umem_create(struct xdp_umem_reg *mr) return umem; } - -bool xdp_umem_validate_queues(struct xdp_umem *umem) -{ - return umem->fq && umem->cq; -} diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index b12a832..92f05b0 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -36,7 +36,7 @@ static DEFINE_PER_CPU(struct list_head, xskmap_flush_list); bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs) { return READ_ONCE(xs->rx) && READ_ONCE(xs->umem) && - READ_ONCE(xs->umem->fq); + READ_ONCE(xs->pool->fq); } void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool) @@ -46,7 +46,7 @@ void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool) if (umem->need_wakeup & XDP_WAKEUP_RX) return; - umem->fq->ring->flags |= XDP_RING_NEED_WAKEUP; + pool->fq->ring->flags |= XDP_RING_NEED_WAKEUP; umem->need_wakeup |= XDP_WAKEUP_RX; } EXPORT_SYMBOL(xsk_set_rx_need_wakeup); @@ -76,7 +76,7 @@ void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool) if (!(umem->need_wakeup & XDP_WAKEUP_RX)) return; - umem->fq->ring->flags &= ~XDP_RING_NEED_WAKEUP; + pool->fq->ring->flags &= ~XDP_RING_NEED_WAKEUP; umem->need_wakeup &= ~XDP_WAKEUP_RX; } 
EXPORT_SYMBOL(xsk_clear_rx_need_wakeup); @@ -254,7 +254,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, static void xsk_flush(struct xdp_sock *xs) { xskq_prod_submit(xs->rx); - __xskq_cons_release(xs->umem->fq); + __xskq_cons_release(xs->pool->fq); sock_def_readable(&xs->sk); } @@ -297,7 +297,7 @@ void __xsk_map_flush(void) void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries) { - xskq_prod_submit_n(pool->umem->cq, nb_entries); + xskq_prod_submit_n(pool->cq, nb_entries); } EXPORT_SYMBOL(xsk_tx_completed); @@ -329,7 +329,7 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc) * if there is space in it. This avoids having to implement * any buffering in the Tx path. */ - if (xskq_prod_reserve_addr(umem->cq, desc->addr)) + if (xskq_prod_reserve_addr(pool->cq, desc->addr)) goto out; xskq_cons_release(xs->tx); @@ -367,7 +367,7 @@ static void xsk_destruct_skb(struct sk_buff *skb) unsigned long flags; spin_lock_irqsave(&xs->tx_completion_lock, flags); - xskq_prod_submit_addr(xs->umem->cq, addr); + xskq_prod_submit_addr(xs->pool->cq, addr); spin_unlock_irqrestore(&xs->tx_completion_lock, flags); sock_wfree(skb); @@ -411,7 +411,7 @@ static int xsk_generic_xmit(struct sock *sk) * if there is space in it. This avoids having to implement * any buffering in the Tx path. */ - if (unlikely(err) || xskq_prod_reserve(xs->umem->cq)) { + if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) { kfree_skb(skb); goto out; } @@ -686,6 +686,12 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) goto out_unlock; } + if (xs->pool->fq || xs->pool->cq) { + /* Do not allow setting your own fq or cq. 
*/ + err = -EINVAL; + goto out_unlock; + } + sock = xsk_lookup_xsk_from_fd(sxdp->sxdp_shared_umem_fd); if (IS_ERR(sock)) { err = PTR_ERR(sock); @@ -712,7 +718,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) xdp_get_umem(umem_xs->umem); WRITE_ONCE(xs->umem, umem_xs->umem); sockfd_put(sock); - } else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) { + } else if (!xs->umem || !xp_validate_queues(xs->pool)) { err = -EINVAL; goto out_unlock; } else { @@ -850,11 +856,9 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname, return -EINVAL; } - q = (optname == XDP_UMEM_FILL_RING) ? &xs->umem->fq : - &xs->umem->cq; + q = (optname == XDP_UMEM_FILL_RING) ? &xs->pool->fq : + &xs->pool->cq; err = xsk_init_queue(entries, q, true); - if (optname == XDP_UMEM_FILL_RING) - xp_set_fq(xs->pool, *q); mutex_unlock(&xs->mutex); return err; } @@ -1000,8 +1004,8 @@ static int xsk_mmap(struct file *file, struct socket *sock, loff_t offset = (loff_t)vma->vm_pgoff << PAGE_SHIFT; unsigned long size = vma->vm_end - vma->vm_start; struct xdp_sock *xs = xdp_sk(sock->sk); + struct xsk_buff_pool *pool = xs->pool; struct xsk_queue *q = NULL; - struct xdp_umem *umem; unsigned long pfn; struct page *qpg; @@ -1013,16 +1017,12 @@ static int xsk_mmap(struct file *file, struct socket *sock, } else if (offset == XDP_PGOFF_TX_RING) { q = READ_ONCE(xs->tx); } else { - umem = READ_ONCE(xs->umem); - if (!umem) - return -EINVAL; - /* Matches the smp_wmb() in XDP_UMEM_REG */ smp_rmb(); if (offset == XDP_UMEM_PGOFF_FILL_RING) - q = READ_ONCE(umem->fq); + q = READ_ONCE(pool->fq); else if (offset == XDP_UMEM_PGOFF_COMPLETION_RING) - q = READ_ONCE(umem->cq); + q = READ_ONCE(pool->cq); } if (!q) diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index da93b36..6a6e0d5 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -89,11 +89,6 @@ struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool_old, return NULL; } -void 
xp_set_fq(struct xsk_buff_pool *pool, struct xsk_queue *fq) -{ - pool->fq = fq; -} - void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq) { u32 i; @@ -197,6 +192,16 @@ static void xp_release_deferred(struct work_struct *work) xp_clear_dev(pool); rtnl_unlock(); + if (pool->fq) { + xskq_destroy(pool->fq); + pool->fq = NULL; + } + + if (pool->cq) { + xskq_destroy(pool->cq); + pool->cq = NULL; + } + xdp_put_umem(pool->umem); xp_destroy(pool); } @@ -217,6 +222,11 @@ void xp_put_pool(struct xsk_buff_pool *pool) } } +bool xp_validate_queues(struct xsk_buff_pool *pool) +{ + return pool->fq && pool->cq; +} + void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs) { dma_addr_t *dma; diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c index 0163b26..1936423 100644 --- a/net/xdp/xsk_diag.c +++ b/net/xdp/xsk_diag.c @@ -46,6 +46,7 @@ static int xsk_diag_put_rings_cfg(const struct xdp_sock *xs, static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb) { + struct xsk_buff_pool *pool = xs->pool; struct xdp_umem *umem = xs->umem; struct xdp_diag_umem du = {}; int err; @@ -67,10 +68,11 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb) err = nla_put(nlskb, XDP_DIAG_UMEM, sizeof(du), &du); - if (!err && umem->fq) - err = xsk_diag_put_ring(umem->fq, XDP_DIAG_UMEM_FILL_RING, nlskb); - if (!err && umem->cq) { - err = xsk_diag_put_ring(umem->cq, XDP_DIAG_UMEM_COMPLETION_RING, + if (!err && pool->fq) + err = xsk_diag_put_ring(pool->fq, + XDP_DIAG_UMEM_FILL_RING, nlskb); + if (!err && pool->cq) { + err = xsk_diag_put_ring(pool->cq, XDP_DIAG_UMEM_COMPLETION_RING, nlskb); } return err; From patchwork Thu Jul 2 12:19:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Magnus Karlsson X-Patchwork-Id: 1321364 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: 
patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49yHGT3zsNz9sSd for ; Thu, 2 Jul 2020 22:19:45 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728908AbgGBMTp (ORCPT ); Thu, 2 Jul 2020 08:19:45 -0400 Received: from mga12.intel.com ([192.55.52.136]:6897 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728832AbgGBMTo (ORCPT ); Thu, 2 Jul 2020 08:19:44 -0400 IronPort-SDR: RZI79Q56G2ITzxoY3vnWk7jrtPXuiudBqakE2wFGLqwlJwRjcqM0AdHzDod26gYyHPRBoZAg3u fCQ8Uidn9Fhg== X-IronPort-AV: E=McAfee;i="6000,8403,9669"; a="126486087" X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="126486087" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jul 2020 05:19:44 -0700 IronPort-SDR: jYmiWH16iDCEo2oyKNLKQl3HilmRQzQRJ82uahogbNtZMiXcRs1NatXkFlsyVVTda2k41jQQHr fCEWCE2JXyYQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="425933311" Received: from mkarlsso-mobl.ger.corp.intel.com (HELO localhost.localdomain) ([10.252.39.242]) by orsmga004.jf.intel.com with ESMTP; 02 Jul 2020 05:19:40 -0700 From: Magnus Karlsson To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com Subject: [PATCH bpf-next 05/14] xsk: 
move queue_id, dev and need_wakeup to context Date: Thu, 2 Jul 2020 14:19:04 +0200 Message-Id: <1593692353-15102-6-git-send-email-magnus.karlsson@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> References: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Move queue_id, dev, and need_wakeup from the umem to the buffer pool. This is so that a later commit can share the umem between multiple HW queues. There is one buffer pool per dev and queue id, so these variables should belong to the buffer pool, not the umem. need_wakeup is also set at a per-NAPI level, so there is usually one per device and queue id; move it to the buffer pool too. Signed-off-by: Magnus Karlsson --- include/net/xdp_sock.h | 3 --- include/net/xsk_buff_pool.h | 4 ++++ net/xdp/xdp_umem.c | 19 +------------------ net/xdp/xdp_umem.h | 4 ---- net/xdp/xsk.c | 40 +++++++++++++++------------------------- net/xdp/xsk_buff_pool.c | 37 +++++++++++++++++++++---------------- net/xdp/xsk_diag.c | 4 ++-- 7 files changed, 43 insertions(+), 68 deletions(-) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 2dd3fd9..e12d814 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -26,11 +26,8 @@ struct xdp_umem { refcount_t users; struct page **pgs; u32 npgs; - u16 queue_id; - u8 need_wakeup; u8 flags; int id; - struct net_device *dev; bool zc; spinlock_t xsk_tx_list_lock; struct list_head xsk_tx_list; diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index f811e25..cd929a8 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -43,11 +43,15 @@ struct xsk_buff_pool { u32 headroom; u32 chunk_size; u32 frame_len; + u16 queue_id; + u8 cached_need_wakeup; + bool uses_need_wakeup; bool cheap_dma; bool unaligned; struct xdp_umem *umem; void *addrs; 
struct device *dev; + struct net_device *netdev; refcount_t users; struct work_struct work; struct xdp_buff_xsk *free_heads[]; diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index 7d86a63..b1699d0 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -63,26 +63,9 @@ static void xdp_umem_unaccount_pages(struct xdp_umem *umem) } } -void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, - u16 queue_id) -{ - umem->dev = dev; - umem->queue_id = queue_id; - - dev_hold(dev); -} - -void xdp_umem_clear_dev(struct xdp_umem *umem) -{ - dev_put(umem->dev); - umem->dev = NULL; - umem->zc = false; -} - static void xdp_umem_release(struct xdp_umem *umem) { - xdp_umem_clear_dev(umem); - + umem->zc = false; ida_simple_remove(&umem_ida, umem->id); xdp_umem_unpin_pages(umem); diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h index 93e96be..67bf3f3 100644 --- a/net/xdp/xdp_umem.h +++ b/net/xdp/xdp_umem.h @@ -8,10 +8,6 @@ #include -void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, - u16 queue_id); -void xdp_umem_clear_dev(struct xdp_umem *umem); -bool xdp_umem_validate_queues(struct xdp_umem *umem); void xdp_get_umem(struct xdp_umem *umem); void xdp_put_umem(struct xdp_umem *umem); void xdp_add_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs); diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 92f05b0..b02ed96 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -41,67 +41,61 @@ bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs) void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool) { - struct xdp_umem *umem = pool->umem; - - if (umem->need_wakeup & XDP_WAKEUP_RX) + if (pool->cached_need_wakeup & XDP_WAKEUP_RX) return; pool->fq->ring->flags |= XDP_RING_NEED_WAKEUP; - umem->need_wakeup |= XDP_WAKEUP_RX; + pool->cached_need_wakeup |= XDP_WAKEUP_RX; } EXPORT_SYMBOL(xsk_set_rx_need_wakeup); void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool) { - struct xdp_umem *umem = pool->umem; struct xdp_sock *xs; - if (umem->need_wakeup & 
XDP_WAKEUP_TX) + if (pool->cached_need_wakeup & XDP_WAKEUP_TX) return; rcu_read_lock(); - list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) { + list_for_each_entry_rcu(xs, &xs->umem->xsk_tx_list, list) { xs->tx->ring->flags |= XDP_RING_NEED_WAKEUP; } rcu_read_unlock(); - umem->need_wakeup |= XDP_WAKEUP_TX; + pool->cached_need_wakeup |= XDP_WAKEUP_TX; } EXPORT_SYMBOL(xsk_set_tx_need_wakeup); void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool) { - struct xdp_umem *umem = pool->umem; - - if (!(umem->need_wakeup & XDP_WAKEUP_RX)) + if (!(pool->cached_need_wakeup & XDP_WAKEUP_RX)) return; pool->fq->ring->flags &= ~XDP_RING_NEED_WAKEUP; - umem->need_wakeup &= ~XDP_WAKEUP_RX; + pool->cached_need_wakeup &= ~XDP_WAKEUP_RX; } EXPORT_SYMBOL(xsk_clear_rx_need_wakeup); void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool) { - struct xdp_umem *umem = pool->umem; struct xdp_sock *xs; - if (!(umem->need_wakeup & XDP_WAKEUP_TX)) + if (!(pool->cached_need_wakeup & XDP_WAKEUP_TX)) return; rcu_read_lock(); - list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) { + list_for_each_entry_rcu(xs, &xs->umem->xsk_tx_list, list) { xs->tx->ring->flags &= ~XDP_RING_NEED_WAKEUP; } rcu_read_unlock(); - umem->need_wakeup &= ~XDP_WAKEUP_TX; + pool->cached_need_wakeup &= ~XDP_WAKEUP_TX; } EXPORT_SYMBOL(xsk_clear_tx_need_wakeup); bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool) { - return pool->umem->flags & XDP_UMEM_USES_NEED_WAKEUP; + return pool->uses_need_wakeup; } EXPORT_SYMBOL(xsk_uses_need_wakeup); @@ -474,16 +468,16 @@ static __poll_t xsk_poll(struct file *file, struct socket *sock, __poll_t mask = datagram_poll(file, sock, wait); struct sock *sk = sock->sk; struct xdp_sock *xs = xdp_sk(sk); - struct xdp_umem *umem; + struct xsk_buff_pool *pool; if (unlikely(!xsk_is_bound(xs))) return mask; - umem = xs->umem; + pool = xs->pool; - if (umem->need_wakeup) { + if (pool->cached_need_wakeup) { if (xs->zc) - xsk_wakeup(xs, umem->need_wakeup); + xsk_wakeup(xs, 
pool->cached_need_wakeup); else /* Poll needs to drive Tx also in copy mode */ __xsk_sendmsg(sk); @@ -725,18 +719,15 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) struct xsk_buff_pool *new_pool; /* This xsk has its own umem. */ - xdp_umem_assign_dev(xs->umem, dev, qid); new_pool = xp_assign_umem(xs->pool, xs->umem); if (!new_pool) { err = -ENOMEM; - xdp_umem_clear_dev(xs->umem); goto out_unlock; } err = xp_assign_dev(new_pool, xs, dev, qid, flags); if (err) { xp_destroy(new_pool); - xdp_umem_clear_dev(xs->umem); goto out_unlock; } xs->pool = new_pool; @@ -1062,7 +1053,6 @@ static int xsk_notifier(struct notifier_block *this, /* Clear device references. */ xp_clear_dev(xs->pool); - xdp_umem_clear_dev(xs->umem); } mutex_unlock(&xs->mutex); } diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index 6a6e0d5..e0a49fc 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -99,9 +99,8 @@ void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq) EXPORT_SYMBOL(xp_set_rxq_info); int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs, - struct net_device *dev, u16 queue_id, u16 flags) + struct net_device *netdev, u16 queue_id, u16 flags) { - struct xdp_umem *umem = pool->umem; bool force_zc, force_copy; struct netdev_bpf bpf; int err = 0; @@ -114,15 +113,15 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs, if (force_zc && force_copy) return -EINVAL; - if (xsk_get_pool_from_qid(dev, queue_id)) + if (xsk_get_pool_from_qid(netdev, queue_id)) return -EBUSY; - err = xsk_reg_pool_at_qid(dev, pool, queue_id); + err = xsk_reg_pool_at_qid(netdev, pool, queue_id); if (err) return err; if ((flags & XDP_USE_NEED_WAKEUP) && xs->tx) { - umem->flags |= XDP_UMEM_USES_NEED_WAKEUP; + pool->uses_need_wakeup = true; /* Tx needs to be explicitly woken up the first time. * Also for supporting drivers that do not implement this * feature. They will always have to call sendto(). 
@@ -130,11 +129,14 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs, xs->tx->ring->flags |= XDP_RING_NEED_WAKEUP; } + dev_hold(netdev); + if (force_copy) /* For copy-mode, we are done. */ return 0; - if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_wakeup) { + if (!netdev->netdev_ops->ndo_bpf || + !netdev->netdev_ops->ndo_xsk_wakeup) { err = -EOPNOTSUPP; goto err_unreg_pool; } @@ -143,44 +145,47 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs, bpf.xsk.pool = pool; bpf.xsk.queue_id = queue_id; - err = dev->netdev_ops->ndo_bpf(dev, &bpf); + err = netdev->netdev_ops->ndo_bpf(netdev, &bpf); if (err) goto err_unreg_pool; - umem->zc = true; + pool->netdev = netdev; + pool->queue_id = queue_id; + pool->umem->zc = true; return 0; err_unreg_pool: if (!force_zc) err = 0; /* fallback to copy mode */ if (err) - xsk_clear_pool_at_qid(dev, queue_id); + xsk_clear_pool_at_qid(netdev, queue_id); return err; } void xp_clear_dev(struct xsk_buff_pool *pool) { - struct xdp_umem *umem = pool->umem; struct netdev_bpf bpf; int err; ASSERT_RTNL(); - if (!umem->dev) + if (!pool->netdev) return; - if (umem->zc) { + if (pool->umem->zc) { bpf.command = XDP_SETUP_XSK_POOL; bpf.xsk.pool = NULL; - bpf.xsk.queue_id = umem->queue_id; + bpf.xsk.queue_id = pool->queue_id; - err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf); + err = pool->netdev->netdev_ops->ndo_bpf(pool->netdev, &bpf); if (err) - WARN(1, "failed to disable umem!\n"); + WARN(1, "Failed to disable zero-copy!\n"); } - xsk_clear_pool_at_qid(umem->dev, umem->queue_id); + xsk_clear_pool_at_qid(pool->netdev, pool->queue_id); + dev_put(pool->netdev); + pool->netdev = NULL; } static void xp_release_deferred(struct work_struct *work) diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c index 1936423..c974295 100644 --- a/net/xdp/xsk_diag.c +++ b/net/xdp/xsk_diag.c @@ -59,8 +59,8 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb) du.num_pages = umem->npgs; 
du.chunk_size = umem->chunk_size; du.headroom = umem->headroom; - du.ifindex = umem->dev ? umem->dev->ifindex : 0; - du.queue_id = umem->queue_id; + du.ifindex = pool->netdev ? pool->netdev->ifindex : 0; + du.queue_id = pool->queue_id; du.flags = 0; if (umem->zc) du.flags |= XDP_DU_F_ZEROCOPY; From patchwork Thu Jul 2 12:19:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Magnus Karlsson X-Patchwork-Id: 1321366 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49yHGY4bKqz9sR4 for ; Thu, 2 Jul 2020 22:19:49 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729036AbgGBMTt (ORCPT ); Thu, 2 Jul 2020 08:19:49 -0400 Received: from mga12.intel.com ([192.55.52.136]:6897 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728832AbgGBMTs (ORCPT ); Thu, 2 Jul 2020 08:19:48 -0400 IronPort-SDR: jiiKPVRQzidjZfd0uv/4rMPQECybH1KTfYJE4bGWIn39Xl1tZAJy3/i6Dl7fvOlc7cH/AwsDko BbhpRm1FdB3w== X-IronPort-AV: E=McAfee;i="6000,8403,9669"; a="126486093" X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="126486093" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jul 2020 05:19:47 -0700 IronPort-SDR: ZLqsksi0iC6SyDBLF5hmJQkfetLBIJNstF2OSXYkFbfWqSlQrYF1XGKit9jYfQgpK07hEd7BHW oivj7E+r2HNw== X-ExtLoop1: 1 
X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="425933322" Received: from mkarlsso-mobl.ger.corp.intel.com (HELO localhost.localdomain) ([10.252.39.242]) by orsmga004.jf.intel.com with ESMTP; 02 Jul 2020 05:19:44 -0700 From: Magnus Karlsson To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com Subject: [PATCH bpf-next 06/14] xsk: move xsk_tx_list and its lock to buffer pool Date: Thu, 2 Jul 2020 14:19:05 +0200 Message-Id: <1593692353-15102-7-git-send-email-magnus.karlsson@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> References: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Move the xsk_tx_list and the xsk_tx_list_lock from the umem to the buffer pool. This is so that a later commit can share the umem between multiple HW queues. There is one xsk_tx_list per device and queue id, so it should be located in the buffer pool. 
Signed-off-by: Magnus Karlsson --- include/net/xdp_sock.h | 4 +--- include/net/xsk_buff_pool.h | 5 +++++ net/xdp/xdp_umem.c | 26 -------------------------- net/xdp/xdp_umem.h | 2 -- net/xdp/xsk.c | 13 ++++++------- net/xdp/xsk_buff_pool.c | 26 ++++++++++++++++++++++++++ 6 files changed, 38 insertions(+), 38 deletions(-) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index e12d814..471719d 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -29,8 +29,6 @@ struct xdp_umem { u8 flags; int id; bool zc; - spinlock_t xsk_tx_list_lock; - struct list_head xsk_tx_list; }; struct xsk_map { @@ -57,7 +55,7 @@ struct xdp_sock { /* Protects multiple processes in the control path */ struct mutex mutex; struct xsk_queue *tx ____cacheline_aligned_in_smp; - struct list_head list; + struct list_head tx_list; /* Mutual exclusion of NAPI TX thread and sendmsg error paths * in the SKB destructor callback. */ diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index cd929a8..6158a47 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -52,6 +52,9 @@ struct xsk_buff_pool { void *addrs; struct device *dev; struct net_device *netdev; + struct list_head xsk_tx_list; + /* Protects modifications to the xsk_tx_list */ + spinlock_t xsk_tx_list_lock; refcount_t users; struct work_struct work; struct xdp_buff_xsk *free_heads[]; @@ -69,6 +72,8 @@ void xp_get_pool(struct xsk_buff_pool *pool); void xp_put_pool(struct xsk_buff_pool *pool); void xp_clear_dev(struct xsk_buff_pool *pool); bool xp_validate_queues(struct xsk_buff_pool *pool); +void xp_add_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs); +void xp_del_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs); /* AF_XDP, and XDP core. 
*/ void xp_free(struct xdp_buff_xsk *xskb); diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index b1699d0..a871c75 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -23,30 +23,6 @@ static DEFINE_IDA(umem_ida); -void xdp_add_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs) -{ - unsigned long flags; - - if (!xs->tx) - return; - - spin_lock_irqsave(&umem->xsk_tx_list_lock, flags); - list_add_rcu(&xs->list, &umem->xsk_tx_list); - spin_unlock_irqrestore(&umem->xsk_tx_list_lock, flags); -} - -void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs) -{ - unsigned long flags; - - if (!xs->tx) - return; - - spin_lock_irqsave(&umem->xsk_tx_list_lock, flags); - list_del_rcu(&xs->list); - spin_unlock_irqrestore(&umem->xsk_tx_list_lock, flags); -} - static void xdp_umem_unpin_pages(struct xdp_umem *umem) { unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true); @@ -206,8 +182,6 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) umem->pgs = NULL; umem->user = NULL; umem->flags = mr->flags; - INIT_LIST_HEAD(&umem->xsk_tx_list); - spin_lock_init(&umem->xsk_tx_list_lock); refcount_set(&umem->users, 1); diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h index 67bf3f3..181fdda 100644 --- a/net/xdp/xdp_umem.h +++ b/net/xdp/xdp_umem.h @@ -10,8 +10,6 @@ void xdp_get_umem(struct xdp_umem *umem); void xdp_put_umem(struct xdp_umem *umem); -void xdp_add_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs); -void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs); struct xdp_umem *xdp_umem_create(struct xdp_umem_reg *mr); #endif /* XDP_UMEM_H_ */ diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index b02ed96..4d0028c 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -57,7 +57,7 @@ void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool) return; rcu_read_lock(); - list_for_each_entry_rcu(xs, &xs->umem->xsk_tx_list, list) { + list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) { xs->tx->ring->flags |= XDP_RING_NEED_WAKEUP; } 
rcu_read_unlock(); @@ -84,7 +84,7 @@ void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool) return; rcu_read_lock(); - list_for_each_entry_rcu(xs, &xs->umem->xsk_tx_list, list) { + list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) { xs->tx->ring->flags &= ~XDP_RING_NEED_WAKEUP; } rcu_read_unlock(); @@ -300,7 +300,7 @@ void xsk_tx_release(struct xsk_buff_pool *pool) struct xdp_sock *xs; rcu_read_lock(); - list_for_each_entry_rcu(xs, &pool->umem->xsk_tx_list, list) { + list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) { __xskq_cons_release(xs->tx); xs->sk.sk_write_space(&xs->sk); } @@ -310,11 +310,10 @@ EXPORT_SYMBOL(xsk_tx_release); bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc) { - struct xdp_umem *umem = pool->umem; struct xdp_sock *xs; rcu_read_lock(); - list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) { + list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) { if (!xskq_cons_peek_desc(xs->tx, desc, pool)) continue; @@ -518,7 +517,7 @@ static void xsk_unbind_dev(struct xdp_sock *xs) WRITE_ONCE(xs->state, XSK_UNBOUND); /* Wait for driver to stop using the xdp socket. 
*/ - xdp_del_sk_umem(xs->umem, xs); + xp_del_xsk(xs->pool, xs); xs->dev = NULL; synchronize_net(); dev_put(dev); @@ -736,7 +735,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) xs->dev = dev; xs->zc = xs->umem->zc; xs->queue_id = qid; - xdp_add_sk_umem(xs->umem, xs); + xp_add_xsk(xs->pool, xs); out_unlock: if (err) { diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index e0a49fc..31dd337 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -11,6 +11,30 @@ #include "xdp_umem.h" #include "xsk.h" +void xp_add_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs) +{ + unsigned long flags; + + if (!xs->tx) + return; + + spin_lock_irqsave(&pool->xsk_tx_list_lock, flags); + list_add_rcu(&xs->tx_list, &pool->xsk_tx_list); + spin_unlock_irqrestore(&pool->xsk_tx_list_lock, flags); +} + +void xp_del_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs) +{ + unsigned long flags; + + if (!xs->tx) + return; + + spin_lock_irqsave(&pool->xsk_tx_list_lock, flags); + list_del_rcu(&xs->tx_list); + spin_unlock_irqrestore(&pool->xsk_tx_list_lock, flags); +} + static void xp_addr_unmap(struct xsk_buff_pool *pool) { vunmap(pool->addrs); @@ -71,6 +95,8 @@ struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool_old, XDP_PACKET_HEADROOM; pool->umem = umem; INIT_LIST_HEAD(&pool->free_list); + INIT_LIST_HEAD(&pool->xsk_tx_list); + spin_lock_init(&pool->xsk_tx_list_lock); refcount_set(&pool->users, 1); for (i = 0; i < pool->free_heads_cnt; i++) { From patchwork Thu Jul 2 12:19:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Magnus Karlsson X-Patchwork-Id: 1321368 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; 
helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49yHGf4fLBz9sR4 for ; Thu, 2 Jul 2020 22:19:54 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729057AbgGBMTw (ORCPT ); Thu, 2 Jul 2020 08:19:52 -0400 Received: from mga12.intel.com ([192.55.52.136]:6897 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728832AbgGBMTv (ORCPT ); Thu, 2 Jul 2020 08:19:51 -0400 IronPort-SDR: wJ75QKrJUx5k24q6rwHy5pok6qrHwCw4WBwieilxkh/ytopu4tA4jks3TZ8ENi7Hkp7E59c8Xi /3QA0BmL67WA== X-IronPort-AV: E=McAfee;i="6000,8403,9669"; a="126486098" X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="126486098" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jul 2020 05:19:51 -0700 IronPort-SDR: UCcaUdaiBYXFVwxorX42RwcWbC3Sr6e7BHPGJ3a36P7aRHuCSSTrdDqDxGeBNET9YOTSjpTVOR 5xwmDTV7dSzw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="425933334" Received: from mkarlsso-mobl.ger.corp.intel.com (HELO localhost.localdomain) ([10.252.39.242]) by orsmga004.jf.intel.com with ESMTP; 02 Jul 2020 05:19:47 -0700 From: Magnus Karlsson To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com Subject: [PATCH bpf-next 07/14] xsk: move addrs from buffer pool to umem Date: Thu, 2 Jul 2020 14:19:06 +0200 Message-Id: <1593692353-15102-8-git-send-email-magnus.karlsson@intel.com> X-Mailer: 
git-send-email 2.7.4 In-Reply-To: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> References: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Replicate the addrs pointer in the buffer pool to the umem. This mapping will be the same for all buffer pools sharing the same umem. In the buffer pool we leave the addrs pointer for performance reasons. Signed-off-by: Magnus Karlsson --- include/net/xdp_sock.h | 1 + net/xdp/xdp_umem.c | 22 ++++++++++++++++++++++ net/xdp/xsk_buff_pool.c | 21 ++------------------- 3 files changed, 25 insertions(+), 19 deletions(-) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 471719d..d2fddf2 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -18,6 +18,7 @@ struct xsk_queue; struct xdp_buff; struct xdp_umem { + void *addrs; u64 size; u32 headroom; u32 chunk_size; diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index a871c75..372998d 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -39,11 +39,27 @@ static void xdp_umem_unaccount_pages(struct xdp_umem *umem) } } +static void xdp_umem_addr_unmap(struct xdp_umem *umem) +{ + vunmap(umem->addrs); + umem->addrs = NULL; +} + +static int xdp_umem_addr_map(struct xdp_umem *umem, struct page **pages, + u32 nr_pages) +{ + umem->addrs = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL); + if (!umem->addrs) + return -ENOMEM; + return 0; +} + static void xdp_umem_release(struct xdp_umem *umem) { umem->zc = false; ida_simple_remove(&umem_ida, umem->id); + xdp_umem_addr_unmap(umem); xdp_umem_unpin_pages(umem); xdp_umem_unaccount_pages(umem); @@ -193,8 +209,14 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) if (err) goto out_account; + err = xdp_umem_addr_map(umem, umem->pgs, umem->npgs); + if (err) + goto out_unpin; + return 0; +out_unpin: + xdp_umem_unpin_pages(umem); out_account: 
xdp_umem_unaccount_pages(umem); return err; diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index 31dd337..ae27664 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -35,26 +35,11 @@ void xp_del_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs) spin_unlock_irqrestore(&pool->xsk_tx_list_lock, flags); } -static void xp_addr_unmap(struct xsk_buff_pool *pool) -{ - vunmap(pool->addrs); -} - -static int xp_addr_map(struct xsk_buff_pool *pool, - struct page **pages, u32 nr_pages) -{ - pool->addrs = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL); - if (!pool->addrs) - return -ENOMEM; - return 0; -} - void xp_destroy(struct xsk_buff_pool *pool) { if (!pool) return; - xp_addr_unmap(pool); kvfree(pool->heads); kvfree(pool); } @@ -69,7 +54,6 @@ struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool_old, { struct xsk_buff_pool *pool; struct xdp_buff_xsk *xskb; - int err; u32 i; pool = kvzalloc(struct_size(pool, free_heads, umem->chunks), @@ -94,6 +78,7 @@ struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool_old, pool->frame_len = umem->chunk_size - umem->headroom - XDP_PACKET_HEADROOM; pool->umem = umem; + pool->addrs = umem->addrs; INIT_LIST_HEAD(&pool->free_list); INIT_LIST_HEAD(&pool->xsk_tx_list); spin_lock_init(&pool->xsk_tx_list_lock); @@ -106,9 +91,7 @@ struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool_old, pool->free_heads[i] = xskb; } - err = xp_addr_map(pool, umem->pgs, umem->npgs); - if (!err) - return pool; + return pool; out: xp_destroy(pool); From patchwork Thu Jul 2 12:19:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Magnus Karlsson X-Patchwork-Id: 1321369 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org 
(client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49yHGl3lcRz9sR4 for ; Thu, 2 Jul 2020 22:19:59 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728919AbgGBMT5 (ORCPT ); Thu, 2 Jul 2020 08:19:57 -0400 Received: from mga12.intel.com ([192.55.52.136]:6897 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728906AbgGBMTz (ORCPT ); Thu, 2 Jul 2020 08:19:55 -0400 IronPort-SDR: FnFqz9/RRuQj338clEBeNKtO+r0Yi4/fF2z8b4N1mIwI94J9G9Cat2DMqaEjN6kMNkZJWRy0Hc SCUojVmbfa4Q== X-IronPort-AV: E=McAfee;i="6000,8403,9669"; a="126486108" X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="126486108" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jul 2020 05:19:54 -0700 IronPort-SDR: 81IknkhrmGAvjTJa+7ITokXw7xpvVezWAbZhNwzgmMIRtYaBa1eN4sml1r5KKLtVN2TBWDAV6z jt4UKJs32+Gw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="425933345" Received: from mkarlsso-mobl.ger.corp.intel.com (HELO localhost.localdomain) ([10.252.39.242]) by orsmga004.jf.intel.com with ESMTP; 02 Jul 2020 05:19:51 -0700 From: Magnus Karlsson To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com Subject: [PATCH bpf-next 08/14] xsk: net: enable sharing of dma mappings Date: Thu, 2 Jul 2020 14:19:07 +0200 Message-Id: 
<1593692353-15102-9-git-send-email-magnus.karlsson@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> References: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Enable the sharing of dma mappings by moving them out of the umem structure. Instead we put each dma mapped umem region in a list in the netdev structure. If dma has already been mapped for this umem and device, it is not mapped again and the existing dma mappings are reused. Signed-off-by: Magnus Karlsson --- include/linux/netdevice.h | 3 ++ include/net/xsk_buff_pool.h | 7 +++ net/core/dev.c | 3 ++ net/xdp/xsk_buff_pool.c | 112 ++++++++++++++++++++++++++++++++++++-------- 4 files changed, 106 insertions(+), 19 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index e5acc3b..fd794aa 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2006,6 +2006,9 @@ struct net_device { unsigned int real_num_rx_queues; struct bpf_prog __rcu *xdp_prog; +#ifdef CONFIG_XDP_SOCKETS + struct list_head xsk_dma_list; +#endif unsigned long gro_flush_timeout; int napi_defer_hard_irqs; rx_handler_func_t __rcu *rx_handler; diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index 6158a47..197cca8 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -28,6 +28,13 @@ struct xdp_buff_xsk { struct list_head free_list_node; }; +struct xsk_dma_map { + dma_addr_t *dma_pages; + struct xdp_umem *umem; + refcount_t users; + struct list_head list; /* Protected by the RTNL_LOCK */ +}; + struct xsk_buff_pool { struct xsk_queue *fq; struct xsk_queue *cq; diff --git a/net/core/dev.c b/net/core/dev.c index 6bc2388..fe8a72f 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -9959,6 +9959,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, 
INIT_LIST_HEAD(&dev->ptype_all); INIT_LIST_HEAD(&dev->ptype_specific); INIT_LIST_HEAD(&dev->net_notifier_list); +#ifdef CONFIG_XDP_SOCKETS + INIT_LIST_HEAD(&dev->xsk_dma_list); +#endif #ifdef CONFIG_NET_SCHED hash_init(dev->qdisc_hash); #endif diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index ae27664..3c58d76 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -107,6 +107,25 @@ void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq) } EXPORT_SYMBOL(xp_set_rxq_info); +static void xp_disable_drv_zc(struct xsk_buff_pool *pool) +{ + struct netdev_bpf bpf; + int err; + + ASSERT_RTNL(); + + if (pool->umem->zc) { + bpf.command = XDP_SETUP_XSK_POOL; + bpf.xsk.pool = NULL; + bpf.xsk.queue_id = pool->queue_id; + + err = pool->netdev->netdev_ops->ndo_bpf(pool->netdev, &bpf); + + if (err) + WARN(1, "Failed to disable zero-copy!\n"); + } +} + int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs, struct net_device *netdev, u16 queue_id, u16 flags) { @@ -125,6 +144,8 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs, if (xsk_get_pool_from_qid(netdev, queue_id)) return -EBUSY; + pool->netdev = netdev; + pool->queue_id = queue_id; err = xsk_reg_pool_at_qid(netdev, pool, queue_id); if (err) return err; @@ -158,11 +179,15 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs, if (err) goto err_unreg_pool; - pool->netdev = netdev; - pool->queue_id = queue_id; + if (!pool->dma_pages) { + WARN(1, "Driver did not DMA map zero-copy buffers"); + goto err_unreg_xsk; + } pool->umem->zc = true; return 0; +err_unreg_xsk: + xp_disable_drv_zc(pool); err_unreg_pool: if (!force_zc) err = 0; /* fallback to copy mode */ @@ -173,25 +198,10 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs, void xp_clear_dev(struct xsk_buff_pool *pool) { - struct netdev_bpf bpf; - int err; - - ASSERT_RTNL(); - if (!pool->netdev) return; - if (pool->umem->zc) { - bpf.command = XDP_SETUP_XSK_POOL; - 
bpf.xsk.pool = NULL; - bpf.xsk.queue_id = pool->queue_id; - - err = pool->netdev->netdev_ops->ndo_bpf(pool->netdev, &bpf); - - if (err) - WARN(1, "Failed to disable zero-copy!\n"); - } - + xp_disable_drv_zc(pool); xsk_clear_pool_at_qid(pool->netdev, pool->queue_id); dev_put(pool->netdev); pool->netdev = NULL; @@ -241,14 +251,61 @@ bool xp_validate_queues(struct xsk_buff_pool *pool) return pool->fq && pool->cq; } +static struct xsk_dma_map *xp_find_dma_map(struct xsk_buff_pool *pool) +{ + struct xsk_dma_map *dma_map; + + list_for_each_entry(dma_map, &pool->netdev->xsk_dma_list, list) { + if (dma_map->umem == pool->umem) + return dma_map; + } + + return NULL; +} + +static void xp_destroy_dma_map(struct xsk_dma_map *dma_map) +{ + list_del(&dma_map->list); + kfree(dma_map); +} + +static void xp_put_dma_map(struct xsk_dma_map *dma_map) +{ + if (!refcount_dec_and_test(&dma_map->users)) + return; + + xp_destroy_dma_map(dma_map); +} + +static struct xsk_dma_map *xp_create_dma_map(struct xsk_buff_pool *pool) +{ + struct xsk_dma_map *dma_map; + + dma_map = kzalloc(sizeof(*dma_map), GFP_KERNEL); + if (!dma_map) + return NULL; + + dma_map->umem = pool->umem; + refcount_set(&dma_map->users, 1); + list_add(&dma_map->list, &pool->netdev->xsk_dma_list); + return dma_map; +} + void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs) { + struct xsk_dma_map *dma_map; dma_addr_t *dma; u32 i; if (pool->dma_pages_cnt == 0) return; + dma_map = xp_find_dma_map(pool); + if (!dma_map) { + WARN(1, "Could not find dma_map for device"); + return; + } + for (i = 0; i < pool->dma_pages_cnt; i++) { dma = &pool->dma_pages[i]; if (*dma) { @@ -258,6 +315,7 @@ void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs) } } + xp_put_dma_map(dma_map); kvfree(pool->dma_pages); pool->dma_pages_cnt = 0; pool->dev = NULL; @@ -321,14 +379,29 @@ static bool xp_check_cheap_dma(struct xsk_buff_pool *pool) int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev, unsigned long attrs, 
struct page **pages, u32 nr_pages) { + struct xsk_dma_map *dma_map; dma_addr_t dma; u32 i; + dma_map = xp_find_dma_map(pool); + if (dma_map) { + pool->dma_pages = dma_map->dma_pages; + refcount_inc(&dma_map->users); + return 0; + } + + dma_map = xp_create_dma_map(pool); + if (!dma_map) + return -ENOMEM; + pool->dma_pages = kvcalloc(nr_pages, sizeof(*pool->dma_pages), GFP_KERNEL); - if (!pool->dma_pages) + if (!pool->dma_pages) { + xp_destroy_dma_map(dma_map); return -ENOMEM; + } + dma_map->dma_pages = pool->dma_pages; pool->dev = dev; pool->dma_pages_cnt = nr_pages; @@ -337,6 +410,7 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev, DMA_BIDIRECTIONAL, attrs); if (dma_mapping_error(dev, dma)) { xp_dma_unmap(pool, attrs); + xp_destroy_dma_map(dma_map); return -ENOMEM; } pool->dma_pages[i] = dma; From patchwork Thu Jul 2 12:19:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Magnus Karlsson X-Patchwork-Id: 1321372 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49yHGp3b9kz9sTV for ; Thu, 2 Jul 2020 22:20:02 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729074AbgGBMUA (ORCPT ); Thu, 2 Jul 2020 08:20:00 -0400 Received: from mga12.intel.com ([192.55.52.136]:6897 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728906AbgGBMT7 (ORCPT ); Thu, 2 Jul 2020 08:19:59 -0400 IronPort-SDR: 
5o2km7MDkltmLyDFagxhEkCQDE8htYIjzXYcuRF7GlfoY7fcoF9LlvoDTqsX5Bcs7kjC2CMvuU FsUL/kxZIpLg== X-IronPort-AV: E=McAfee;i="6000,8403,9669"; a="126486112" X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="126486112" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jul 2020 05:19:58 -0700 IronPort-SDR: y1Yikr3Yzpd4OVO3yEK5g4+1pVHxTOJiOzH4eUYXlAjFmHd3r5houy3UX64SBTgTXzD2FHNBs3 hnopKwtIWmHw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="425933351" Received: from mkarlsso-mobl.ger.corp.intel.com (HELO localhost.localdomain) ([10.252.39.242]) by orsmga004.jf.intel.com with ESMTP; 02 Jul 2020 05:19:55 -0700 From: Magnus Karlsson To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com Subject: [PATCH bpf-next 09/14] xsk: rearrange internal structs for better performance Date: Thu, 2 Jul 2020 14:19:08 +0200 Message-Id: <1593692353-15102-10-git-send-email-magnus.karlsson@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> References: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Rearrange the xdp_sock, xdp_umem and xsk_buff_pool structures so that they get smaller and align better to the cache lines. In the previous commits of this patch set, these structs have been reordered with the focus on functionality and simplicity, not performance. This patch improves throughput performance by around 3%. 
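The layout idea can be illustrated with a rough userspace sketch. It does not use the kernel structs themselves: the demo_pool name, its members and the 64-byte CACHELINE value are assumptions made for this example only. Control-path members are grouped first, and the data-path members start on a fresh cache line so that the hot fields stay close together:

```c
/* Userspace sketch of the layout idea, NOT the kernel structs.
 * CACHELINE and every demo_* name are assumptions for illustration.
 */
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64 /* assumed cache line size in bytes */

struct demo_pool {
	/* Control-path members, touched only at setup and teardown. */
	void *dev;
	void *netdev;
	uint32_t heads_cnt;
	uint16_t queue_id;

	/* Data-path members start on their own cache line. */
	alignas(CACHELINE) void *fq;
	void *cq;
	uint64_t chunk_mask;
	uint32_t free_heads_cnt;
};
```

In the kernel the same effect is requested with ____cacheline_aligned_in_smp, as the diff does for fq, rx and tx; the C11 alignas() above is only a portable stand-in.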
Signed-off-by: Magnus Karlsson --- include/net/xdp_sock.h | 14 +++++++------- include/net/xsk_buff_pool.h | 27 +++++++++++++++------------ 2 files changed, 22 insertions(+), 19 deletions(-) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index d2fddf2..6c14d48 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -23,13 +23,13 @@ struct xdp_umem { u32 headroom; u32 chunk_size; u32 chunks; + u32 npgs; struct user_struct *user; refcount_t users; - struct page **pgs; - u32 npgs; u8 flags; - int id; bool zc; + struct page **pgs; + int id; }; struct xsk_map { @@ -41,7 +41,7 @@ struct xsk_map { struct xdp_sock { /* struct sock must be the first member of struct xdp_sock */ struct sock sk; - struct xsk_queue *rx; + struct xsk_queue *rx ____cacheline_aligned_in_smp; struct net_device *dev; struct xdp_umem *umem; struct list_head flush_node; @@ -53,8 +53,7 @@ struct xdp_sock { XSK_BOUND, XSK_UNBOUND, } state; - /* Protects multiple processes in the control path */ - struct mutex mutex; + u64 rx_dropped; struct xsk_queue *tx ____cacheline_aligned_in_smp; struct list_head tx_list; /* Mutual exclusion of NAPI TX thread and sendmsg error paths @@ -63,10 +62,11 @@ struct xdp_sock { spinlock_t tx_completion_lock; /* Protects generic receive. */ spinlock_t rx_lock; - u64 rx_dropped; struct list_head map_list; /* Protects map_list */ spinlock_t map_list_lock; + /* Protects multiple processes in the control path */ + struct mutex mutex; }; #ifdef CONFIG_XDP_SOCKETS diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index 197cca8..7513a17 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -36,34 +36,37 @@ struct xsk_dma_map { }; struct xsk_buff_pool { - struct xsk_queue *fq; - struct xsk_queue *cq; + /* Members only used in the control path first. 
*/ + struct device *dev; + struct net_device *netdev; + struct list_head xsk_tx_list; + /* Protects modifications to the xsk_tx_list */ + spinlock_t xsk_tx_list_lock; + refcount_t users; + struct xdp_umem *umem; + struct work_struct work; struct list_head free_list; + u32 heads_cnt; + u16 queue_id; + + /* Data path members as close to free_heads at the end as possible. */ + struct xsk_queue *fq ____cacheline_aligned_in_smp; + struct xsk_queue *cq; dma_addr_t *dma_pages; struct xdp_buff_xsk *heads; u64 chunk_mask; u64 addrs_cnt; u32 free_list_cnt; u32 dma_pages_cnt; - u32 heads_cnt; u32 free_heads_cnt; u32 headroom; u32 chunk_size; u32 frame_len; - u16 queue_id; u8 cached_need_wakeup; bool uses_need_wakeup; bool cheap_dma; bool unaligned; - struct xdp_umem *umem; void *addrs; - struct device *dev; - struct net_device *netdev; - struct list_head xsk_tx_list; - /* Protects modifications to the xsk_tx_list */ - spinlock_t xsk_tx_list_lock; - refcount_t users; - struct work_struct work; struct xdp_buff_xsk *free_heads[]; }; From patchwork Thu Jul 2 12:19:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Magnus Karlsson X-Patchwork-Id: 1321374 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49yHGr61YQz9sR4 for ; Thu, 2 Jul 2020 22:20:04 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728906AbgGBMUE (ORCPT ); Thu, 2 Jul 2020 08:20:04 -0400 Received: from mga12.intel.com 
([192.55.52.136]:6897 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728812AbgGBMUE (ORCPT ); Thu, 2 Jul 2020 08:20:04 -0400 IronPort-SDR: TYYyw9tJpuBiIEe7F79SKtYuvEOQZXMgVIm5iayhN4BHw+7q2koH88yReTv9vUg7hGCgWT6GjN 2lXUhzsF4nyw== X-IronPort-AV: E=McAfee;i="6000,8403,9669"; a="126486118" X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="126486118" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jul 2020 05:20:02 -0700 IronPort-SDR: FwU//7qtj0Et4wVU8AQ2QIezjHvpL2b88enQvA6wGF6R6IBakYgwxEDKMPnLS40YMf7Q3NlCXO EiKJsHe6B9iA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="425933369" Received: from mkarlsso-mobl.ger.corp.intel.com (HELO localhost.localdomain) ([10.252.39.242]) by orsmga004.jf.intel.com with ESMTP; 02 Jul 2020 05:19:58 -0700 From: Magnus Karlsson To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com Subject: [PATCH bpf-next 10/14] xsk: add shared umem support between queue ids Date: Thu, 2 Jul 2020 14:19:09 +0200 Message-Id: <1593692353-15102-11-git-send-email-magnus.karlsson@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> References: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Add support to share a umem between queue ids on the same device. This mode can be invoked with the XDP_SHARED_UMEM bind flag. 
Previously, sharing was only supported within the same queue id and device, and you shared one set of fill and completion rings. However, note that when sharing a umem between queue ids, you need to create a fill ring and a completion ring and tie them to the socket before you do the bind with the XDP_SHARED_UMEM flag. This is so that the single-producer single-consumer semantics of the rings can be upheld. Signed-off-by: Magnus Karlsson --- include/net/xsk_buff_pool.h | 3 +++ net/xdp/xsk.c | 51 +++++++++++++++++++++++++++++---------------- net/xdp/xsk_buff_pool.c | 27 ++++++++++++++++++++++-- 3 files changed, 61 insertions(+), 20 deletions(-) diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index 7513a17..844901c 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -76,6 +76,9 @@ struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool, struct xdp_umem *umem); int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs, struct net_device *dev, u16 queue_id, u16 flags); +int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_sock *xs, + struct xdp_umem *umem, struct net_device *dev, + u16 queue_id); void xp_destroy(struct xsk_buff_pool *pool); void xp_release(struct xdp_buff_xsk *xskb); void xp_get_pool(struct xsk_buff_pool *pool); diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 4d0028c..1abc222 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -627,6 +627,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr; struct sock *sk = sock->sk; struct xdp_sock *xs = xdp_sk(sk); + struct xsk_buff_pool *new_pool; struct net_device *dev; u32 flags, qid; int err = 0; @@ -679,12 +680,6 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) goto out_unlock; } - if (xs->pool->fq || xs->pool->cq) { - /* Do not allow setting your own fq or cq. 
*/ - err = -EINVAL; - goto out_unlock; - } - sock = xsk_lookup_xsk_from_fd(sxdp->sxdp_shared_umem_fd); if (IS_ERR(sock)) { err = PTR_ERR(sock); @@ -697,17 +692,43 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) sockfd_put(sock); goto out_unlock; } - if (umem_xs->dev != dev || umem_xs->queue_id != qid) { + if (umem_xs->dev != dev) { err = -EINVAL; sockfd_put(sock); goto out_unlock; } - /* Share the buffer pool with the other socket. */ - xp_get_pool(umem_xs->pool); - curr_pool = xs->pool; - xs->pool = umem_xs->pool; - xp_destroy(curr_pool); + if (umem_xs->queue_id != qid) { + /* Share the umem with another socket on another qid */ + new_pool = xp_assign_umem(xs->pool, umem_xs->umem); + if (!new_pool) { + sockfd_put(sock); + goto out_unlock; + } + + err = xp_assign_dev_shared(new_pool, xs, umem_xs->umem, + dev, qid); + if (err) { + xp_destroy(new_pool); + sockfd_put(sock); + goto out_unlock; + } + xs->pool = new_pool; + } else { + /* Share the buffer pool with the other socket. */ + if (xs->pool->fq || xs->pool->cq) { + /* Do not allow setting your own fq or cq. */ + err = -EINVAL; + sockfd_put(sock); + goto out_unlock; + } + + xp_get_pool(umem_xs->pool); + curr_pool = xs->pool; + xs->pool = umem_xs->pool; + xp_destroy(curr_pool); + } + xdp_get_umem(umem_xs->umem); WRITE_ONCE(xs->umem, umem_xs->umem); sockfd_put(sock); @@ -715,8 +736,6 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) err = -EINVAL; goto out_unlock; } else { - struct xsk_buff_pool *new_pool; - /* This xsk has its own umem. */ new_pool = xp_assign_umem(xs->pool, xs->umem); if (!new_pool) { @@ -841,10 +860,6 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname, mutex_unlock(&xs->mutex); return -EBUSY; } - if (!xs->umem) { - mutex_unlock(&xs->mutex); - return -EINVAL; - } q = (optname == XDP_UMEM_FILL_RING) ? 
&xs->pool->fq : &xs->pool->cq; diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index 3c58d76..7987c17 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -126,8 +126,8 @@ static void xp_disable_drv_zc(struct xsk_buff_pool *pool) } } -int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs, - struct net_device *netdev, u16 queue_id, u16 flags) +static int __xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs, + struct net_device *netdev, u16 queue_id, u16 flags) { bool force_zc, force_copy; struct netdev_bpf bpf; @@ -196,6 +196,29 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs, return err; } +int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs, + struct net_device *dev, u16 queue_id, u16 flags) +{ + return __xp_assign_dev(pool, xs, dev, queue_id, flags); +} + +int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_sock *xs, + struct xdp_umem *umem, struct net_device *dev, + u16 queue_id) +{ + u16 flags; + + /* One fill and completion ring required for each queue id. */ + if (!pool->fq || !pool->cq) + return -EINVAL; + + flags = umem->zc ? 
XDP_ZEROCOPY : XDP_COPY; + if (pool->uses_need_wakeup) + flags |= XDP_USE_NEED_WAKEUP; + + return __xp_assign_dev(pool, xs, dev, queue_id, flags); +} + void xp_clear_dev(struct xsk_buff_pool *pool) { if (!pool->netdev) From patchwork Thu Jul 2 12:19:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Magnus Karlsson X-Patchwork-Id: 1321376 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49yHGv3FLJz9sR4 for ; Thu, 2 Jul 2020 22:20:07 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729081AbgGBMUG (ORCPT ); Thu, 2 Jul 2020 08:20:06 -0400 Received: from mga12.intel.com ([192.55.52.136]:6897 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728812AbgGBMUF (ORCPT ); Thu, 2 Jul 2020 08:20:05 -0400 IronPort-SDR: ay+MW/Ebgtuwo8JEiNpHGVl88r2i4Pxeb8dxyS3wzCDNpaCd+69+jbNbrXegRvYX0EdRsc1D6n Wh/L6MGgiG+g== X-IronPort-AV: E=McAfee;i="6000,8403,9669"; a="126486138" X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="126486138" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jul 2020 05:20:05 -0700 IronPort-SDR: SQgjYFJ9ELTM428SdkHWWDY/ZgfcZFPJCSm/iDLjvAyQgI60CCQsAuMHTsXctSrqg5t+PCKyXN of7W2cV8XVww== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="425933401" 
Received: from mkarlsso-mobl.ger.corp.intel.com (HELO localhost.localdomain) ([10.252.39.242]) by orsmga004.jf.intel.com with ESMTP; 02 Jul 2020 05:20:02 -0700 From: Magnus Karlsson To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com Subject: [PATCH bpf-next 11/14] xsk: add shared umem support between devices Date: Thu, 2 Jul 2020 14:19:10 +0200 Message-Id: <1593692353-15102-12-git-send-email-magnus.karlsson@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> References: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Add support to share a umem between different devices. This mode can be invoked with the XDP_SHARED_UMEM bind flag. Previously, sharing was only supported within the same device. Note that when sharing a umem between devices, just as in the case of sharing a umem between queue ids, you need to create a fill ring and a completion ring and tie them to the socket (with two setsockopts, one for each ring) before you do the bind with the XDP_SHARED_UMEM flag. This is so that the single-producer single-consumer semantics of the rings can be upheld. 
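The ordering described above (fill ring and completion ring via two setsockopts, then the bind with XDP_SHARED_UMEM) might look roughly as follows from user space when done without libbpf. This is only a sketch: the demo_* function names and the single ring_size parameter are hypothetical, error handling is minimal, and the Rx ring setsockopt is included on the assumption that bind requires at least one of the Rx/Tx rings to exist:

```c
/* Sketch only: demo_* names are hypothetical, error handling is minimal. */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>

#ifndef AF_XDP
#define AF_XDP 44	/* fallback for older libc headers */
#endif
#ifndef SOL_XDP
#define SOL_XDP 283	/* fallback for older libc headers */
#endif

/* Build the bind address for a socket that reuses the umem owned by umem_fd. */
void demo_fill_shared_addr(struct sockaddr_xdp *sxdp, uint32_t ifindex,
			   uint32_t queue_id, int umem_fd)
{
	assert(sxdp);
	memset(sxdp, 0, sizeof(*sxdp));
	sxdp->sxdp_family = AF_XDP;
	sxdp->sxdp_flags = XDP_SHARED_UMEM;
	sxdp->sxdp_ifindex = ifindex;
	sxdp->sxdp_queue_id = queue_id;
	sxdp->sxdp_shared_umem_fd = (uint32_t)umem_fd;
}

/* Returns 0 on success or a negative errno value. */
int demo_bind_shared(int fd, int umem_fd, uint32_t ifindex,
		     uint32_t queue_id, uint32_t ring_size)
{
	struct sockaddr_xdp sxdp;

	/* This socket gets its own fill and completion rings first,
	 * one setsockopt for each ring.
	 */
	if (setsockopt(fd, SOL_XDP, XDP_UMEM_FILL_RING,
		       &ring_size, sizeof(ring_size)))
		return -errno;
	if (setsockopt(fd, SOL_XDP, XDP_UMEM_COMPLETION_RING,
		       &ring_size, sizeof(ring_size)))
		return -errno;

	/* Assumption: an Rx (or Tx) ring must exist before the bind. */
	if (setsockopt(fd, SOL_XDP, XDP_RX_RING,
		       &ring_size, sizeof(ring_size)))
		return -errno;

	/* Only now bind, pointing at the socket that registered the umem. */
	demo_fill_shared_addr(&sxdp, ifindex, queue_id, umem_fd);
	if (bind(fd, (struct sockaddr *)&sxdp, sizeof(sxdp)))
		return -errno;

	return 0;
}
```

Here fd would come from socket(AF_XDP, SOCK_RAW, 0) and umem_fd is the socket on which the umem was registered. The rings still have to be mmap()ed before use, which is left out of the sketch.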
Signed-off-by: Magnus Karlsson --- net/xdp/xsk.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 1abc222..b240221 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -692,14 +692,11 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) sockfd_put(sock); goto out_unlock; } - if (umem_xs->dev != dev) { - err = -EINVAL; - sockfd_put(sock); - goto out_unlock; - } - if (umem_xs->queue_id != qid) { - /* Share the umem with another socket on another qid */ + if (umem_xs->queue_id != qid || umem_xs->dev != dev) { + /* Share the umem with another socket on another qid + * and/or device. + */ new_pool = xp_assign_umem(xs->pool, umem_xs->umem); if (!new_pool) { sockfd_put(sock); From patchwork Thu Jul 2 12:19:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Magnus Karlsson X-Patchwork-Id: 1321378 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49yHH01j3Pz9sR4 for ; Thu, 2 Jul 2020 22:20:12 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728890AbgGBMUL (ORCPT ); Thu, 2 Jul 2020 08:20:11 -0400 Received: from mga12.intel.com ([192.55.52.136]:6897 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729084AbgGBMUL (ORCPT ); Thu, 2 Jul 2020 08:20:11 -0400 IronPort-SDR: iS6ZEDNtX0V5TRr/HvUG2eGrsW/HpbYsEQbvvOo81adBG/JRiVfqQIqMyN6AxGrEV3GdBFJv/3 TREJ1tSAvRyw== 
X-IronPort-AV: E=McAfee;i="6000,8403,9669"; a="126486149" X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="126486149" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jul 2020 05:20:09 -0700 IronPort-SDR: Pbwau3Z3tQ85+kXIoNJwdrQ52B9pG5rkN/LtHp4+gE1eow8ukPB5PUOoV6FEd61l12zF+IN3vu yhcQVQV38HvA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="425933447" Received: from mkarlsso-mobl.ger.corp.intel.com (HELO localhost.localdomain) ([10.252.39.242]) by orsmga004.jf.intel.com with ESMTP; 02 Jul 2020 05:20:05 -0700 From: Magnus Karlsson To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com Subject: [PATCH bpf-next 12/14] libbpf: support shared umems between queues and devices Date: Thu, 2 Jul 2020 14:19:11 +0200 Message-Id: <1593692353-15102-13-git-send-email-magnus.karlsson@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> References: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Add support for shared umems between hardware queues and devices to the AF_XDP part of libbpf. This is so that zero-copy can be achieved in applications that want to send and receive packets between HW queues on one device or between different devices/netdevs. In order to create sockets that share a umem between hardware queues and devices, a new function has been added called xsk_socket__create_shared(). 
It takes the same arguments as xsk_socket__create() plus references to a fill ring and a completion ring. So for every socket that shares a umem, you need to have one more set of fill and completion rings. This is in order to maintain the single-producer single-consumer semantics of the rings. You can create all the sockets via the new xsk_socket__create_shared() call, or create the first one with xsk_socket__create() and the rest with xsk_socket__create_shared(). Both methods work. Signed-off-by: Magnus Karlsson --- tools/lib/bpf/libbpf.map | 1 + tools/lib/bpf/xsk.c | 376 ++++++++++++++++++++++++++++++----------------- tools/lib/bpf/xsk.h | 9 ++ 3 files changed, 254 insertions(+), 132 deletions(-) diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map index 6544d2c..eb8065b 100644 --- a/tools/lib/bpf/libbpf.map +++ b/tools/lib/bpf/libbpf.map @@ -288,4 +288,5 @@ LIBBPF_0.1.0 { bpf_map__value_size; bpf_program__autoload; bpf_program__set_autoload; + xsk_socket__create_shared; } LIBBPF_0.0.9; diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c index f7f4efb..86ad4f7 100644 --- a/tools/lib/bpf/xsk.c +++ b/tools/lib/bpf/xsk.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -48,26 +49,35 @@ #endif struct xsk_umem { - struct xsk_ring_prod *fill; - struct xsk_ring_cons *comp; + struct xsk_ring_prod *fill_save; + struct xsk_ring_cons *comp_save; char *umem_area; struct xsk_umem_config config; int fd; int refcount; + struct list_head ctx_list; +}; + +struct xsk_ctx { + struct xsk_ring_prod *fill; + struct xsk_ring_cons *comp; + __u32 queue_id; + struct xsk_umem *umem; + int refcount; + int ifindex; + struct list_head list; + int prog_fd; + int xsks_map_fd; + char ifname[IFNAMSIZ]; }; struct xsk_socket { struct xsk_ring_cons *rx; struct xsk_ring_prod *tx; __u64 outstanding_tx; - struct xsk_umem *umem; + struct xsk_ctx *ctx; struct xsk_socket_config config; int fd; - int ifindex; - int prog_fd; - int xsks_map_fd; - __u32 queue_id; 
- char ifname[IFNAMSIZ]; }; struct xsk_nl_info { @@ -203,15 +213,73 @@ static int xsk_get_mmap_offsets(int fd, struct xdp_mmap_offsets *off) return -EINVAL; } +static int xsk_create_umem_rings(struct xsk_umem *umem, int fd, + struct xsk_ring_prod *fill, + struct xsk_ring_cons *comp) +{ + struct xdp_mmap_offsets off; + void *map; + int err; + + err = setsockopt(fd, SOL_XDP, XDP_UMEM_FILL_RING, + &umem->config.fill_size, + sizeof(umem->config.fill_size)); + if (err) + return -errno; + + err = setsockopt(fd, SOL_XDP, XDP_UMEM_COMPLETION_RING, + &umem->config.comp_size, + sizeof(umem->config.comp_size)); + if (err) + return -errno; + + err = xsk_get_mmap_offsets(fd, &off); + if (err) + return -errno; + + map = mmap(NULL, off.fr.desc + umem->config.fill_size * sizeof(__u64), + PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd, + XDP_UMEM_PGOFF_FILL_RING); + if (map == MAP_FAILED) + return -errno; + + fill->mask = umem->config.fill_size - 1; + fill->size = umem->config.fill_size; + fill->producer = map + off.fr.producer; + fill->consumer = map + off.fr.consumer; + fill->flags = map + off.fr.flags; + fill->ring = map + off.fr.desc; + fill->cached_cons = umem->config.fill_size; + + map = mmap(NULL, off.cr.desc + umem->config.comp_size * sizeof(__u64), + PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd, + XDP_UMEM_PGOFF_COMPLETION_RING); + if (map == MAP_FAILED) { + err = -errno; + goto out_mmap; + } + + comp->mask = umem->config.comp_size - 1; + comp->size = umem->config.comp_size; + comp->producer = map + off.cr.producer; + comp->consumer = map + off.cr.consumer; + comp->flags = map + off.cr.flags; + comp->ring = map + off.cr.desc; + + return 0; + +out_mmap: + munmap(map, off.fr.desc + umem->config.fill_size * sizeof(__u64)); + return err; +} + int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area, __u64 size, struct xsk_ring_prod *fill, struct xsk_ring_cons *comp, const struct xsk_umem_config *usr_config) { - struct xdp_mmap_offsets off; struct 
xdp_umem_reg mr; struct xsk_umem *umem; - void *map; int err; if (!umem_area || !umem_ptr || !fill || !comp) @@ -230,6 +298,7 @@ int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area, } umem->umem_area = umem_area; + INIT_LIST_HEAD(&umem->ctx_list); xsk_set_umem_config(&umem->config, usr_config); memset(&mr, 0, sizeof(mr)); @@ -244,71 +313,16 @@ int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area, err = -errno; goto out_socket; } - err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_FILL_RING, - &umem->config.fill_size, - sizeof(umem->config.fill_size)); - if (err) { - err = -errno; - goto out_socket; - } - err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_COMPLETION_RING, - &umem->config.comp_size, - sizeof(umem->config.comp_size)); - if (err) { - err = -errno; - goto out_socket; - } - err = xsk_get_mmap_offsets(umem->fd, &off); - if (err) { - err = -errno; - goto out_socket; - } - - map = mmap(NULL, off.fr.desc + umem->config.fill_size * sizeof(__u64), - PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, umem->fd, - XDP_UMEM_PGOFF_FILL_RING); - if (map == MAP_FAILED) { - err = -errno; + err = xsk_create_umem_rings(umem, umem->fd, fill, comp); + if (err) goto out_socket; - } - - umem->fill = fill; - fill->mask = umem->config.fill_size - 1; - fill->size = umem->config.fill_size; - fill->producer = map + off.fr.producer; - fill->consumer = map + off.fr.consumer; - fill->flags = map + off.fr.flags; - fill->ring = map + off.fr.desc; - fill->cached_prod = *fill->producer; - /* cached_cons is "size" bigger than the real consumer pointer - * See xsk_prod_nb_free - */ - fill->cached_cons = *fill->consumer + umem->config.fill_size; - - map = mmap(NULL, off.cr.desc + umem->config.comp_size * sizeof(__u64), - PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, umem->fd, - XDP_UMEM_PGOFF_COMPLETION_RING); - if (map == MAP_FAILED) { - err = -errno; - goto out_mmap; - } - - umem->comp = comp; - comp->mask = umem->config.comp_size - 1; - comp->size = 
umem->config.comp_size; - comp->producer = map + off.cr.producer; - comp->consumer = map + off.cr.consumer; - comp->flags = map + off.cr.flags; - comp->ring = map + off.cr.desc; - comp->cached_prod = *comp->producer; - comp->cached_cons = *comp->consumer; + umem->fill_save = fill; + umem->comp_save = comp; *umem_ptr = umem; return 0; -out_mmap: - munmap(map, off.fr.desc + umem->config.fill_size * sizeof(__u64)); out_socket: close(umem->fd); out_umem_alloc: @@ -342,6 +356,7 @@ DEFAULT_VERSION(xsk_umem__create_v0_0_4, xsk_umem__create, LIBBPF_0.0.4) static int xsk_load_xdp_prog(struct xsk_socket *xsk) { static const int log_buf_size = 16 * 1024; + struct xsk_ctx *ctx = xsk->ctx; char log_buf[log_buf_size]; int err, prog_fd; @@ -369,7 +384,7 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk) /* *(u32 *)(r10 - 4) = r2 */ BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -4), /* r1 = xskmap[] */ - BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd), + BPF_LD_MAP_FD(BPF_REG_1, ctx->xsks_map_fd), /* r3 = XDP_PASS */ BPF_MOV64_IMM(BPF_REG_3, 2), /* call bpf_redirect_map */ @@ -381,7 +396,7 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk) /* r2 += -4 */ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r1 = xskmap[] */ - BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd), + BPF_LD_MAP_FD(BPF_REG_1, ctx->xsks_map_fd), /* call bpf_map_lookup_elem */ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem), /* r1 = r0 */ @@ -393,7 +408,7 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk) /* r2 = *(u32 *)(r10 - 4) */ BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_10, -4), /* r1 = xskmap[] */ - BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd), + BPF_LD_MAP_FD(BPF_REG_1, ctx->xsks_map_fd), /* r3 = 0 */ BPF_MOV64_IMM(BPF_REG_3, 0), /* call bpf_redirect_map */ @@ -411,19 +426,21 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk) return prog_fd; } - err = bpf_set_link_xdp_fd(xsk->ifindex, prog_fd, xsk->config.xdp_flags); + err = bpf_set_link_xdp_fd(xsk->ctx->ifindex, prog_fd, + xsk->config.xdp_flags); if (err) { 
close(prog_fd); return err; } - xsk->prog_fd = prog_fd; + ctx->prog_fd = prog_fd; return 0; } static int xsk_get_max_queues(struct xsk_socket *xsk) { struct ethtool_channels channels = { .cmd = ETHTOOL_GCHANNELS }; + struct xsk_ctx *ctx = xsk->ctx; struct ifreq ifr = {}; int fd, err, ret; @@ -432,7 +449,7 @@ static int xsk_get_max_queues(struct xsk_socket *xsk) return -errno; ifr.ifr_data = (void *)&channels; - memcpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ - 1); + memcpy(ifr.ifr_name, ctx->ifname, IFNAMSIZ - 1); ifr.ifr_name[IFNAMSIZ - 1] = '\0'; err = ioctl(fd, SIOCETHTOOL, &ifr); if (err && errno != EOPNOTSUPP) { @@ -460,6 +477,7 @@ static int xsk_get_max_queues(struct xsk_socket *xsk) static int xsk_create_bpf_maps(struct xsk_socket *xsk) { + struct xsk_ctx *ctx = xsk->ctx; int max_queues; int fd; @@ -472,15 +490,17 @@ static int xsk_create_bpf_maps(struct xsk_socket *xsk) if (fd < 0) return fd; - xsk->xsks_map_fd = fd; + ctx->xsks_map_fd = fd; return 0; } static void xsk_delete_bpf_maps(struct xsk_socket *xsk) { - bpf_map_delete_elem(xsk->xsks_map_fd, &xsk->queue_id); - close(xsk->xsks_map_fd); + struct xsk_ctx *ctx = xsk->ctx; + + bpf_map_delete_elem(ctx->xsks_map_fd, &ctx->queue_id); + close(ctx->xsks_map_fd); } static int xsk_lookup_bpf_maps(struct xsk_socket *xsk) @@ -488,10 +508,11 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk) __u32 i, *map_ids, num_maps, prog_len = sizeof(struct bpf_prog_info); __u32 map_len = sizeof(struct bpf_map_info); struct bpf_prog_info prog_info = {}; + struct xsk_ctx *ctx = xsk->ctx; struct bpf_map_info map_info; int fd, err; - err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len); + err = bpf_obj_get_info_by_fd(ctx->prog_fd, &prog_info, &prog_len); if (err) return err; @@ -505,11 +526,11 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk) prog_info.nr_map_ids = num_maps; prog_info.map_ids = (__u64)(unsigned long)map_ids; - err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len); + err = 
bpf_obj_get_info_by_fd(ctx->prog_fd, &prog_info, &prog_len); if (err) goto out_map_ids; - xsk->xsks_map_fd = -1; + ctx->xsks_map_fd = -1; for (i = 0; i < prog_info.nr_map_ids; i++) { fd = bpf_map_get_fd_by_id(map_ids[i]); @@ -523,7 +544,7 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk) } if (!strcmp(map_info.name, "xsks_map")) { - xsk->xsks_map_fd = fd; + ctx->xsks_map_fd = fd; continue; } @@ -531,7 +552,7 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk) } err = 0; - if (xsk->xsks_map_fd == -1) + if (ctx->xsks_map_fd == -1) err = -ENOENT; out_map_ids: @@ -541,16 +562,19 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk) static int xsk_set_bpf_maps(struct xsk_socket *xsk) { - return bpf_map_update_elem(xsk->xsks_map_fd, &xsk->queue_id, + struct xsk_ctx *ctx = xsk->ctx; + + return bpf_map_update_elem(ctx->xsks_map_fd, &ctx->queue_id, &xsk->fd, 0); } static int xsk_setup_xdp_prog(struct xsk_socket *xsk) { + struct xsk_ctx *ctx = xsk->ctx; __u32 prog_id = 0; int err; - err = bpf_get_link_xdp_id(xsk->ifindex, &prog_id, + err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags); if (err) return err; @@ -566,12 +590,12 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk) return err; } } else { - xsk->prog_fd = bpf_prog_get_fd_by_id(prog_id); - if (xsk->prog_fd < 0) + ctx->prog_fd = bpf_prog_get_fd_by_id(prog_id); + if (ctx->prog_fd < 0) return -errno; err = xsk_lookup_bpf_maps(xsk); if (err) { - close(xsk->prog_fd); + close(ctx->prog_fd); return err; } } @@ -580,25 +604,110 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk) err = xsk_set_bpf_maps(xsk); if (err) { xsk_delete_bpf_maps(xsk); - close(xsk->prog_fd); + close(ctx->prog_fd); return err; } return 0; } -int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, - __u32 queue_id, struct xsk_umem *umem, - struct xsk_ring_cons *rx, struct xsk_ring_prod *tx, - const struct xsk_socket_config *usr_config) +static struct xsk_ctx *xsk_get_ctx(struct xsk_umem 
*umem, int ifindex, + __u32 queue_id) +{ + struct xsk_ctx *ctx; + + if (list_empty(&umem->ctx_list)) + return NULL; + + list_for_each_entry(ctx, &umem->ctx_list, list) { + if (ctx->ifindex == ifindex && ctx->queue_id == queue_id) { + ctx->refcount++; + return ctx; + } + } + + return NULL; +} + +static void xsk_put_ctx(struct xsk_ctx *ctx) +{ + struct xsk_umem *umem = ctx->umem; + struct xdp_mmap_offsets off; + int err; + + if (--ctx->refcount == 0) { + err = xsk_get_mmap_offsets(umem->fd, &off); + if (!err) { + munmap(ctx->fill->ring - off.fr.desc, + off.fr.desc + umem->config.fill_size * + sizeof(__u64)); + munmap(ctx->comp->ring - off.cr.desc, + off.cr.desc + umem->config.comp_size * + sizeof(__u64)); + } + + list_del(&ctx->list); + free(ctx); + } +} + +static struct xsk_ctx *xsk_create_ctx(struct xsk_socket *xsk, + struct xsk_umem *umem, int ifindex, + const char *ifname, __u32 queue_id, + struct xsk_ring_prod *fill, + struct xsk_ring_cons *comp) +{ + struct xsk_ctx *ctx; + int err; + + ctx = calloc(1, sizeof(*ctx)); + if (!ctx) + return NULL; + + if (!umem->fill_save) { + err = xsk_create_umem_rings(umem, xsk->fd, fill, comp); + if (err) { + free(ctx); + return NULL; + } + } else if (umem->fill_save != fill || umem->comp_save != comp) { + /* Copy over rings to new structs. 
*/ + memcpy(fill, umem->fill_save, sizeof(*fill)); + memcpy(comp, umem->comp_save, sizeof(*comp)); + } + + ctx->ifindex = ifindex; + ctx->refcount = 1; + ctx->umem = umem; + ctx->queue_id = queue_id; + memcpy(ctx->ifname, ifname, IFNAMSIZ - 1); + ctx->ifname[IFNAMSIZ - 1] = '\0'; + + umem->fill_save = NULL; + umem->comp_save = NULL; + ctx->fill = fill; + ctx->comp = comp; + list_add(&ctx->list, &umem->ctx_list); + return ctx; +} + +int xsk_socket__create_shared(struct xsk_socket **xsk_ptr, + const char *ifname, + __u32 queue_id, struct xsk_umem *umem, + struct xsk_ring_cons *rx, + struct xsk_ring_prod *tx, + struct xsk_ring_prod *fill, + struct xsk_ring_cons *comp, + const struct xsk_socket_config *usr_config) { void *rx_map = NULL, *tx_map = NULL; struct sockaddr_xdp sxdp = {}; struct xdp_mmap_offsets off; struct xsk_socket *xsk; - int err; + struct xsk_ctx *ctx; + int err, ifindex; - if (!umem || !xsk_ptr || !(rx || tx)) + if (!umem || !xsk_ptr || !(rx || tx) || !fill || !comp) return -EFAULT; xsk = calloc(1, sizeof(*xsk)); @@ -609,10 +718,10 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, if (err) goto out_xsk_alloc; - if (umem->refcount && - !(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) { - pr_warn("Error: shared umems not supported by libbpf supplied XDP program.\n"); - err = -EBUSY; + xsk->outstanding_tx = 0; + ifindex = if_nametoindex(ifname); + if (!ifindex) { + err = -errno; goto out_xsk_alloc; } @@ -626,16 +735,16 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, xsk->fd = umem->fd; } - xsk->outstanding_tx = 0; - xsk->queue_id = queue_id; - xsk->umem = umem; - xsk->ifindex = if_nametoindex(ifname); - if (!xsk->ifindex) { - err = -errno; - goto out_socket; + ctx = xsk_get_ctx(umem, ifindex, queue_id); + if (!ctx) { + ctx = xsk_create_ctx(xsk, umem, ifindex, ifname, queue_id, + fill, comp); + if (!ctx) { + err = -ENOMEM; + goto out_socket; + } } - memcpy(xsk->ifname, ifname, IFNAMSIZ - 
1); - xsk->ifname[IFNAMSIZ - 1] = '\0'; + xsk->ctx = ctx; if (rx) { err = setsockopt(xsk->fd, SOL_XDP, XDP_RX_RING, @@ -643,7 +752,7 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, sizeof(xsk->config.rx_size)); if (err) { err = -errno; - goto out_socket; + goto out_put_ctx; } } if (tx) { @@ -652,14 +761,14 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, sizeof(xsk->config.tx_size)); if (err) { err = -errno; - goto out_socket; + goto out_put_ctx; } } err = xsk_get_mmap_offsets(xsk->fd, &off); if (err) { err = -errno; - goto out_socket; + goto out_put_ctx; } if (rx) { @@ -669,7 +778,7 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, xsk->fd, XDP_PGOFF_RX_RING); if (rx_map == MAP_FAILED) { err = -errno; - goto out_socket; + goto out_put_ctx; } rx->mask = xsk->config.rx_size - 1; @@ -708,10 +817,10 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, xsk->tx = tx; sxdp.sxdp_family = PF_XDP; - sxdp.sxdp_ifindex = xsk->ifindex; - sxdp.sxdp_queue_id = xsk->queue_id; + sxdp.sxdp_ifindex = ctx->ifindex; + sxdp.sxdp_queue_id = ctx->queue_id; if (umem->refcount > 1) { - sxdp.sxdp_flags = XDP_SHARED_UMEM; + sxdp.sxdp_flags |= XDP_SHARED_UMEM; sxdp.sxdp_shared_umem_fd = umem->fd; } else { sxdp.sxdp_flags = xsk->config.bind_flags; @@ -723,7 +832,7 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, goto out_mmap_tx; } - xsk->prog_fd = -1; + ctx->prog_fd = -1; if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) { err = xsk_setup_xdp_prog(xsk); @@ -742,6 +851,8 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, if (rx) munmap(rx_map, off.rx.desc + xsk->config.rx_size * sizeof(struct xdp_desc)); +out_put_ctx: + xsk_put_ctx(ctx); out_socket: if (--umem->refcount) close(xsk->fd); @@ -750,25 +861,24 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, return err; } -int xsk_umem__delete(struct 
xsk_umem *umem) +int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, + __u32 queue_id, struct xsk_umem *umem, + struct xsk_ring_cons *rx, struct xsk_ring_prod *tx, + const struct xsk_socket_config *usr_config) { - struct xdp_mmap_offsets off; - int err; + return xsk_socket__create_shared(xsk_ptr, ifname, queue_id, umem, + rx, tx, umem->fill_save, + umem->comp_save, usr_config); +} +int xsk_umem__delete(struct xsk_umem *umem) +{ if (!umem) return 0; if (umem->refcount) return -EBUSY; - err = xsk_get_mmap_offsets(umem->fd, &off); - if (!err) { - munmap(umem->fill->ring - off.fr.desc, - off.fr.desc + umem->config.fill_size * sizeof(__u64)); - munmap(umem->comp->ring - off.cr.desc, - off.cr.desc + umem->config.comp_size * sizeof(__u64)); - } - close(umem->fd); free(umem); @@ -778,15 +888,16 @@ int xsk_umem__delete(struct xsk_umem *umem) void xsk_socket__delete(struct xsk_socket *xsk) { size_t desc_sz = sizeof(struct xdp_desc); + struct xsk_ctx *ctx = xsk->ctx; struct xdp_mmap_offsets off; int err; if (!xsk) return; - if (xsk->prog_fd != -1) { + if (ctx->prog_fd != -1) { xsk_delete_bpf_maps(xsk); - close(xsk->prog_fd); + close(ctx->prog_fd); } err = xsk_get_mmap_offsets(xsk->fd, &off); @@ -799,14 +910,15 @@ void xsk_socket__delete(struct xsk_socket *xsk) munmap(xsk->tx->ring - off.tx.desc, off.tx.desc + xsk->config.tx_size * desc_sz); } - } - xsk->umem->refcount--; + xsk_put_ctx(ctx); + + ctx->umem->refcount--; /* Do not close an fd that also has an associated umem connected * to it. 
*/ - if (xsk->fd != xsk->umem->fd) + if (xsk->fd != ctx->umem->fd) close(xsk->fd); free(xsk); } diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h index 584f682..1069c46 100644 --- a/tools/lib/bpf/xsk.h +++ b/tools/lib/bpf/xsk.h @@ -234,6 +234,15 @@ LIBBPF_API int xsk_socket__create(struct xsk_socket **xsk, struct xsk_ring_cons *rx, struct xsk_ring_prod *tx, const struct xsk_socket_config *config); +LIBBPF_API int +xsk_socket__create_shared(struct xsk_socket **xsk_ptr, + const char *ifname, + __u32 queue_id, struct xsk_umem *umem, + struct xsk_ring_cons *rx, + struct xsk_ring_prod *tx, + struct xsk_ring_prod *fill, + struct xsk_ring_cons *comp, + const struct xsk_socket_config *config); /* Returns 0 for success and -EBUSY if the umem is still in use. */ LIBBPF_API int xsk_umem__delete(struct xsk_umem *umem); From patchwork Thu Jul 2 12:19:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Magnus Karlsson X-Patchwork-Id: 1321380 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49yHH40CFQz9sR4 for ; Thu, 2 Jul 2020 22:20:16 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729091AbgGBMUP (ORCPT ); Thu, 2 Jul 2020 08:20:15 -0400 Received: from mga12.intel.com ([192.55.52.136]:6897 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728921AbgGBMUO (ORCPT ); Thu, 2 Jul 2020 08:20:14 -0400 IronPort-SDR: 
IX0tl24Jxlvu+SaVBFYdYpIG8C59W5Ond/6M8pty1ySuX0Y2Ajo+Owwloanc/SXjm/eGaSSere hYL/wcOfmLeA== X-IronPort-AV: E=McAfee;i="6000,8403,9669"; a="126486164" X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="126486164" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jul 2020 05:20:14 -0700 IronPort-SDR: va1GS/Ygs/12gQVDPPkgS88hxJ/f/RmLlLOXeYhVxLiXi8qqd4W09cxhxwQFphaRZzVQSmqsnx OjR7ycXQpfNQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,304,1589266800"; d="scan'208";a="425933477" Received: from mkarlsso-mobl.ger.corp.intel.com (HELO localhost.localdomain) ([10.252.39.242]) by orsmga004.jf.intel.com with ESMTP; 02 Jul 2020 05:20:09 -0700 From: Magnus Karlsson To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com Cc: Cristian Dumitrescu , bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com Subject: [PATCH bpf-next 13/14] samples/bpf: add new sample xsk_fwd.c Date: Thu, 2 Jul 2020 14:19:12 +0200 Message-Id: <1593692353-15102-14-git-send-email-magnus.karlsson@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> References: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org From: Cristian Dumitrescu This sample code illustrates the packet forwarding between multiple AF_XDP sockets in multi-threading environment. All the threads and sockets are sharing a common buffer pool, with each socket having its own private buffer cache. The sockets are created with the xsk_socket__create_shared() function, which allows multiple AF_XDP sockets to share the same UMEM object. 
Example 1: Single thread handling two sockets. Packets received from socket A (on top of interface IFA, queue QA) are forwarded to socket B (on top of interface IFB, queue QB) and vice-versa. The thread is affinitized to CPU core C: ./xsk_fwd -i IFA -q QA -i IFB -q QB -c C Example 2: Two threads, each handling two sockets. Packets from socket A are sent to socket B (by thread X), packets from socket B are sent to socket A (by thread X); packets from socket C are sent to socket D (by thread Y), packets from socket D are sent to socket C (by thread Y). The two threads are bound to CPU cores CX and CY: ./xsk_fwd -i IFA -q QA -i IFB -q QB -i IFC -q QC -i IFD -q QD -c CX -c CY Signed-off-by: Cristian Dumitrescu --- samples/bpf/Makefile | 3 + samples/bpf/xsk_fwd.c | 1075 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 1078 insertions(+) create mode 100644 samples/bpf/xsk_fwd.c diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index 8403e47..92a3cfb 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -48,6 +48,7 @@ tprogs-y += syscall_tp tprogs-y += cpustat tprogs-y += xdp_adjust_tail tprogs-y += xdpsock +tprogs-y += xsk_fwd tprogs-y += xdp_fwd tprogs-y += task_fd_query tprogs-y += xdp_sample_pkts @@ -104,6 +105,7 @@ syscall_tp-objs := bpf_load.o syscall_tp_user.o cpustat-objs := bpf_load.o cpustat_user.o xdp_adjust_tail-objs := xdp_adjust_tail_user.o xdpsock-objs := xdpsock_user.o +xsk_fwd-objs := xsk_fwd.o xdp_fwd-objs := xdp_fwd_user.o task_fd_query-objs := bpf_load.o task_fd_query_user.o $(TRACE_HELPERS) xdp_sample_pkts-objs := xdp_sample_pkts_user.o $(TRACE_HELPERS) @@ -203,6 +205,7 @@ TPROGLDLIBS_trace_output += -lrt TPROGLDLIBS_map_perf_test += -lrt TPROGLDLIBS_test_overhead += -lrt TPROGLDLIBS_xdpsock += -pthread +TPROGLDLIBS_xsk_fwd += -pthread # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline: # make M=samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang diff --git
a/samples/bpf/xsk_fwd.c b/samples/bpf/xsk_fwd.c new file mode 100644 index 0000000..3af105e --- /dev/null +++ b/samples/bpf/xsk_fwd.c @@ -0,0 +1,1075 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2020 Intel Corporation. */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include +#include +#include + +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) + +typedef __u64 u64; +typedef __u32 u32; +typedef __u16 u16; +typedef __u8 u8; + +/* This program illustrates the packet forwarding between multiple AF_XDP + * sockets in multi-threaded environment. All threads are sharing a common + * buffer pool, with each socket having its own private buffer cache. + * + * Example 1: Single thread handling two sockets. The packets received by socket + * A (interface IFA, queue QA) are forwarded to socket B (interface IFB, queue + * QB), while the packets received by socket B are forwarded to socket A. The + * thread is running on CPU core X: + * + * ./xsk_fwd -i IFA -q QA -i IFB -q QB -c X + * + * Example 2: Two threads, each handling two sockets. The thread running on CPU + * core X forwards all the packets received by socket A to socket B, and all the + * packets received by socket B to socket A. The thread running on CPU core Y is + * performing the same packet forwarding between sockets C and D: + * + * ./xsk_fwd -i IFA -q QA -i IFB -q QB -i IFC -q QC -i IFD -q QD + * -c CX -c CY + */ + +/* + * Buffer pool and buffer cache + * + * For packet forwarding, the packet buffers are typically allocated from the + * pool for packet reception and freed back to the pool for further reuse once + * the packet transmission is completed. + * + * The buffer pool is shared between multiple threads. 
In order to minimize the + * access latency to the shared buffer pool, each thread creates one (or + * several) buffer caches, which, unlike the buffer pool, are private to the + * thread that creates them and therefore cannot be shared with other threads. + * The access to the shared pool is only needed either (A) when the cache gets + * empty due to repeated buffer allocations and it needs to be replenished from + * the pool, or (B) when the cache gets full due to repeated buffer free and it + * needs to be flushed back to the pull. + * + * In a packet forwarding system, a packet received on any input port can + * potentially be transmitted on any output port, depending on the forwarding + * configuration. For AF_XDP sockets, for this to work with zero-copy of the + * packet buffers when, it is required that the buffer pool memory fits into the + * UMEM area shared by all the sockets. + */ + +struct bpool_params { + u32 n_buffers; + u32 buffer_size; + int mmap_flags; + + u32 n_users_max; + u32 n_buffers_per_slab; +}; + +/* This buffer pool implementation organizes the buffers into equally sized + * slabs of *n_buffers_per_slab*. Initially, there are *n_slabs* slabs in the + * pool that are completely filled with buffer pointers (full slabs). + * + * Each buffer cache has a slab for buffer allocation and a slab for buffer + * free, with both of these slabs initially empty. When the cache's allocation + * slab goes empty, it is swapped with one of the available full slabs from the + * pool, if any is available. When the cache's free slab goes full, it is + * swapped for one of the empty slabs from the pool, which is guaranteed to + * succeed. + * + * Partially filled slabs never get traded between the cache and the pool + * (except when the cache itself is destroyed), which enables fast operation + * through pointer swapping. 
+ */ +struct bpool { + struct bpool_params params; + pthread_mutex_t lock; + void *addr; + + u64 **slabs; + u64 **slabs_reserved; + u64 *buffers; + u64 *buffers_reserved; + + u64 n_slabs; + u64 n_slabs_reserved; + u64 n_buffers; + + u64 n_slabs_available; + u64 n_slabs_reserved_available; + + struct xsk_umem_config umem_cfg; + struct xsk_ring_prod umem_fq; + struct xsk_ring_cons umem_cq; + struct xsk_umem *umem; +}; + +static struct bpool * +bpool_init(struct bpool_params *params, + struct xsk_umem_config *umem_cfg) +{ + struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; + u64 n_slabs, n_slabs_reserved, n_buffers, n_buffers_reserved; + u64 slabs_size, slabs_reserved_size; + u64 buffers_size, buffers_reserved_size; + u64 total_size, i; + struct bpool *bp; + u8 *p; + int status; + + /* mmap prep. */ + if (setrlimit(RLIMIT_MEMLOCK, &r)) + return NULL; + + /* bpool internals dimensioning. */ + n_slabs = (params->n_buffers + params->n_buffers_per_slab - 1) / + params->n_buffers_per_slab; + n_slabs_reserved = params->n_users_max * 2; + n_buffers = n_slabs * params->n_buffers_per_slab; + n_buffers_reserved = n_slabs_reserved * params->n_buffers_per_slab; + + slabs_size = n_slabs * sizeof(u64 *); + slabs_reserved_size = n_slabs_reserved * sizeof(u64 *); + buffers_size = n_buffers * sizeof(u64); + buffers_reserved_size = n_buffers_reserved * sizeof(u64); + + total_size = sizeof(struct bpool) + + slabs_size + slabs_reserved_size + + buffers_size + buffers_reserved_size; + + /* bpool memory allocation. */ + p = calloc(total_size, sizeof(u8)); + if (!p) + return NULL; + + /* bpool memory initialization. 
*/ + bp = (struct bpool *)p; + memcpy(&bp->params, params, sizeof(*params)); + bp->params.n_buffers = n_buffers; + + bp->slabs = (u64 **)&p[sizeof(struct bpool)]; + bp->slabs_reserved = (u64 **)&p[sizeof(struct bpool) + + slabs_size]; + bp->buffers = (u64 *)&p[sizeof(struct bpool) + + slabs_size + slabs_reserved_size]; + bp->buffers_reserved = (u64 *)&p[sizeof(struct bpool) + + slabs_size + slabs_reserved_size + buffers_size]; + + bp->n_slabs = n_slabs; + bp->n_slabs_reserved = n_slabs_reserved; + bp->n_buffers = n_buffers; + + for (i = 0; i < n_slabs; i++) + bp->slabs[i] = &bp->buffers[i * params->n_buffers_per_slab]; + bp->n_slabs_available = n_slabs; + + for (i = 0; i < n_slabs_reserved; i++) + bp->slabs_reserved[i] = &bp->buffers_reserved[i * + params->n_buffers_per_slab]; + bp->n_slabs_reserved_available = n_slabs_reserved; + + for (i = 0; i < n_buffers; i++) + bp->buffers[i] = i * params->buffer_size; + + /* lock. */ + status = pthread_mutex_init(&bp->lock, NULL); + if (status) { + free(p); + return NULL; + } + + /* mmap. */ + bp->addr = mmap(NULL, + n_buffers * params->buffer_size, + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | params->mmap_flags, + -1, + 0); + if (bp->addr == MAP_FAILED) { + pthread_mutex_destroy(&bp->lock); + free(p); + return NULL; + } + + /* umem. 
*/ + status = xsk_umem__create(&bp->umem, + bp->addr, + bp->params.n_buffers * bp->params.buffer_size, + &bp->umem_fq, + &bp->umem_cq, + umem_cfg); + if (status) { + munmap(bp->addr, bp->params.n_buffers * bp->params.buffer_size); + pthread_mutex_destroy(&bp->lock); + free(p); + return NULL; + } + memcpy(&bp->umem_cfg, umem_cfg, sizeof(*umem_cfg)); + + return bp; +} + +static void +bpool_free(struct bpool *bp) +{ + if (!bp) + return; + + xsk_umem__delete(bp->umem); + munmap(bp->addr, bp->params.n_buffers * bp->params.buffer_size); + pthread_mutex_destroy(&bp->lock); + free(bp); +} + +struct bcache { + struct bpool *bp; + + u64 *slab_cons; + u64 *slab_prod; + + u64 n_buffers_cons; + u64 n_buffers_prod; +}; + +static u32 +bcache_slab_size(struct bcache *bc) +{ + struct bpool *bp = bc->bp; + + return bp->params.n_buffers_per_slab; +} + +static struct bcache * +bcache_init(struct bpool *bp) +{ + struct bcache *bc; + + bc = calloc(1, sizeof(struct bcache)); + if (!bc) + return NULL; + + bc->bp = bp; + bc->n_buffers_cons = 0; + bc->n_buffers_prod = 0; + + pthread_mutex_lock(&bp->lock); + if (bp->n_slabs_reserved_available == 0) { + pthread_mutex_unlock(&bp->lock); + free(bc); + return NULL; + } + + bc->slab_cons = bp->slabs_reserved[bp->n_slabs_reserved_available - 1]; + bc->slab_prod = bp->slabs_reserved[bp->n_slabs_reserved_available - 2]; + bp->n_slabs_reserved_available -= 2; + pthread_mutex_unlock(&bp->lock); + + return bc; +} + +static void +bcache_free(struct bcache *bc) +{ + struct bpool *bp; + + if (!bc) + return; + + /* In order to keep this example simple, the case of freeing any + * existing buffers from the cache back to the pool is ignored. 
+ */ + + bp = bc->bp; + pthread_mutex_lock(&bp->lock); + bp->slabs_reserved[bp->n_slabs_reserved_available] = bc->slab_prod; + bp->slabs_reserved[bp->n_slabs_reserved_available + 1] = bc->slab_cons; + bp->n_slabs_reserved_available += 2; + pthread_mutex_unlock(&bp->lock); + + free(bc); +} + +/* To work correctly, the implementation requires that the *n_buffers* input + * argument is never greater than the buffer pool's *n_buffers_per_slab*. This + * is typically the case, with one exception taking place when a large number of + * buffers are allocated at init time (e.g. for the UMEM fill queue setup). + */ +static inline u32 +bcache_cons_check(struct bcache *bc, u32 n_buffers) +{ + struct bpool *bp = bc->bp; + u64 n_buffers_per_slab = bp->params.n_buffers_per_slab; + u64 n_buffers_cons = bc->n_buffers_cons; + u64 n_slabs_available; + u64 *slab_full; + + /* + * Consumer slab is not empty: Use what's available locally. Do not + * look for more buffers from the pool when the ask can only be + * partially satisfied. + */ + if (n_buffers_cons) + return (n_buffers_cons < n_buffers) ? + n_buffers_cons : + n_buffers; + + /* + * Consumer slab is empty: look to trade the current consumer slab + * (empty) for a full slab from the pool, if any is available.
+ */ + pthread_mutex_lock(&bp->lock); + n_slabs_available = bp->n_slabs_available; + if (!n_slabs_available) { + pthread_mutex_unlock(&bp->lock); + return 0; + } + + n_slabs_available--; + slab_full = bp->slabs[n_slabs_available]; + bp->slabs[n_slabs_available] = bc->slab_cons; + bp->n_slabs_available = n_slabs_available; + pthread_mutex_unlock(&bp->lock); + + bc->slab_cons = slab_full; + bc->n_buffers_cons = n_buffers_per_slab; + return n_buffers; +} + +static inline u64 +bcache_cons(struct bcache *bc) +{ + u64 n_buffers_cons = bc->n_buffers_cons - 1; + u64 buffer; + + buffer = bc->slab_cons[n_buffers_cons]; + bc->n_buffers_cons = n_buffers_cons; + return buffer; +} + +static inline void +bcache_prod(struct bcache *bc, u64 buffer) +{ + struct bpool *bp = bc->bp; + u64 n_buffers_per_slab = bp->params.n_buffers_per_slab; + u64 n_buffers_prod = bc->n_buffers_prod; + u64 n_slabs_available; + u64 *slab_empty; + + /* + * Producer slab is not yet full: store the current buffer to it. + */ + if (n_buffers_prod < n_buffers_per_slab) { + bc->slab_prod[n_buffers_prod] = buffer; + bc->n_buffers_prod = n_buffers_prod + 1; + return; + } + + /* + * Producer slab is full: trade the cache's current producer slab + * (full) for an empty slab from the pool, then store the current + * buffer to the new producer slab. As one full slab exists in the + * cache, it is guaranteed that there is at least one empty slab + * available in the pool. + */ + pthread_mutex_lock(&bp->lock); + n_slabs_available = bp->n_slabs_available; + slab_empty = bp->slabs[n_slabs_available]; + bp->slabs[n_slabs_available] = bc->slab_prod; + bp->n_slabs_available = n_slabs_available + 1; + pthread_mutex_unlock(&bp->lock); + + slab_empty[0] = buffer; + bc->slab_prod = slab_empty; + bc->n_buffers_prod = 1; +} + +/* + * Port + * + * Each of the forwarding ports sits on top of an AF_XDP socket. 
In order for + * packet forwarding to happen with no packet buffer copy, all the sockets need + * to share the same UMEM area, which is used as the buffer pool memory. + */ +#ifndef MAX_BURST_RX +#define MAX_BURST_RX 64 +#endif + +#ifndef MAX_BURST_TX +#define MAX_BURST_TX 64 +#endif + +struct burst_rx { + u64 addr[MAX_BURST_RX]; + u32 len[MAX_BURST_RX]; +}; + +struct burst_tx { + u64 addr[MAX_BURST_TX]; + u32 len[MAX_BURST_TX]; + u32 n_pkts; +}; + +struct port_params { + struct xsk_socket_config xsk_cfg; + struct bpool *bp; + const char *iface; + u32 iface_queue; +}; + +struct port { + struct port_params params; + + struct bcache *bc; + + struct xsk_ring_cons rxq; + struct xsk_ring_prod txq; + struct xsk_ring_prod umem_fq; + struct xsk_ring_cons umem_cq; + struct xsk_socket *xsk; + int umem_fq_initialized; + + u64 n_pkts_rx; + u64 n_pkts_tx; +}; + +static void +port_free(struct port *p) +{ + if (!p) + return; + + /* To keep this example simple, the code to free the buffers from the + * socket's receive and transmit queues, as well as from the UMEM fill + * and completion queues, is not included. + */ + + if (p->xsk) + xsk_socket__delete(p->xsk); + + bcache_free(p->bc); + + free(p); +} + +static struct port * +port_init(struct port_params *params) +{ + struct port *p; + u32 umem_fq_size, pos = 0; + int status, i; + + /* Memory allocation and initialization. */ + p = calloc(sizeof(struct port), 1); + if (!p) + return NULL; + + memcpy(&p->params, params, sizeof(p->params)); + umem_fq_size = params->bp->umem_cfg.fill_size; + + /* bcache. */ + p->bc = bcache_init(params->bp); + if (!p->bc || + (bcache_slab_size(p->bc) < umem_fq_size) || + (bcache_cons_check(p->bc, umem_fq_size) < umem_fq_size)) { + port_free(p); + return NULL; + } + + /* xsk socket. 
*/ + status = xsk_socket__create_shared(&p->xsk, + params->iface, + params->iface_queue, + params->bp->umem, + &p->rxq, + &p->txq, + &p->umem_fq, + &p->umem_cq, + &params->xsk_cfg); + if (status) { + port_free(p); + return NULL; + } + + /* umem fq. */ + xsk_ring_prod__reserve(&p->umem_fq, umem_fq_size, &pos); + + for (i = 0; i < umem_fq_size; i++) + *xsk_ring_prod__fill_addr(&p->umem_fq, pos + i) = + bcache_cons(p->bc); + + xsk_ring_prod__submit(&p->umem_fq, umem_fq_size); + p->umem_fq_initialized = 1; + + return p; +} + +static inline u32 +port_rx_burst(struct port *p, struct burst_rx *b) +{ + u32 n_pkts, pos, i; + + /* Free buffers for FQ replenish. */ + n_pkts = ARRAY_SIZE(b->addr); + + n_pkts = bcache_cons_check(p->bc, n_pkts); + if (!n_pkts) + return 0; + + /* RXQ. */ + n_pkts = xsk_ring_cons__peek(&p->rxq, n_pkts, &pos); + if (!n_pkts) { + if (xsk_ring_prod__needs_wakeup(&p->umem_fq)) { + struct pollfd pollfd = { + .fd = xsk_socket__fd(p->xsk), + .events = POLLIN, + }; + + poll(&pollfd, 1, 0); + } + return 0; + } + + for (i = 0; i < n_pkts; i++) { + b->addr[i] = xsk_ring_cons__rx_desc(&p->rxq, pos + i)->addr; + b->len[i] = xsk_ring_cons__rx_desc(&p->rxq, pos + i)->len; + } + + xsk_ring_cons__release(&p->rxq, n_pkts); + p->n_pkts_rx += n_pkts; + + /* UMEM FQ. */ + for ( ; ; ) { + int status; + + status = xsk_ring_prod__reserve(&p->umem_fq, n_pkts, &pos); + if (status == n_pkts) + break; + + if (xsk_ring_prod__needs_wakeup(&p->umem_fq)) { + struct pollfd pollfd = { + .fd = xsk_socket__fd(p->xsk), + .events = POLLIN, + }; + + poll(&pollfd, 1, 0); + } + } + + for (i = 0; i < n_pkts; i++) + *xsk_ring_prod__fill_addr(&p->umem_fq, pos + i) = + bcache_cons(p->bc); + + xsk_ring_prod__submit(&p->umem_fq, n_pkts); + + return n_pkts; +} + +static inline void +port_tx_burst(struct port *p, struct burst_tx *b) +{ + u32 n_pkts, pos, i; + int status; + + /* UMEM CQ.
*/ + n_pkts = p->params.bp->umem_cfg.comp_size; + + n_pkts = xsk_ring_cons__peek(&p->umem_cq, n_pkts, &pos); + + for (i = 0; i < n_pkts; i++) { + u64 addr = *xsk_ring_cons__comp_addr(&p->umem_cq, pos + i); + + bcache_prod(p->bc, addr); + } + + xsk_ring_cons__release(&p->umem_cq, n_pkts); + + /* TXQ. */ + n_pkts = b->n_pkts; + + for ( ; ; ) { + status = xsk_ring_prod__reserve(&p->txq, n_pkts, &pos); + if (status == n_pkts) + break; + + if (xsk_ring_prod__needs_wakeup(&p->txq)) + sendto(xsk_socket__fd(p->xsk), NULL, 0, MSG_DONTWAIT, + NULL, 0); + } + + for (i = 0; i < n_pkts; i++) { + xsk_ring_prod__tx_desc(&p->txq, pos + i)->addr = b->addr[i]; + xsk_ring_prod__tx_desc(&p->txq, pos + i)->len = b->len[i]; + } + + xsk_ring_prod__submit(&p->txq, n_pkts); + if (xsk_ring_prod__needs_wakeup(&p->txq)) + sendto(xsk_socket__fd(p->xsk), NULL, 0, MSG_DONTWAIT, NULL, 0); + p->n_pkts_tx += n_pkts; +} + +/* + * Thread + * + * Packet forwarding threads. + */ +#ifndef MAX_PORTS_PER_THREAD +#define MAX_PORTS_PER_THREAD 16 +#endif + +struct thread_data { + struct port *ports_rx[MAX_PORTS_PER_THREAD]; + struct port *ports_tx[MAX_PORTS_PER_THREAD]; + u32 n_ports_rx; + struct burst_rx burst_rx; + struct burst_tx burst_tx[MAX_PORTS_PER_THREAD]; + u32 cpu_core_id; + int quit; +}; + +static void swap_mac_addresses(void *data) +{ + struct ether_header *eth = (struct ether_header *)data; + struct ether_addr *src_addr = (struct ether_addr *)&eth->ether_shost; + struct ether_addr *dst_addr = (struct ether_addr *)&eth->ether_dhost; + struct ether_addr tmp; + + tmp = *src_addr; + *src_addr = *dst_addr; + *dst_addr = tmp; +} + +static void * +thread_func(void *arg) +{ + struct thread_data *t = arg; + cpu_set_t cpu_cores; + u32 i; + + CPU_ZERO(&cpu_cores); + CPU_SET(t->cpu_core_id, &cpu_cores); + pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpu_cores); + + for (i = 0; !t->quit; i = (i + 1) & (t->n_ports_rx - 1)) { + struct port *port_rx = t->ports_rx[i]; + struct port *port_tx =
t->ports_tx[i]; + struct burst_rx *brx = &t->burst_rx; + struct burst_tx *btx = &t->burst_tx[i]; + u32 n_pkts, j; + + /* RX. */ + n_pkts = port_rx_burst(port_rx, brx); + if (!n_pkts) + continue; + + /* Process & TX. */ + for (j = 0; j < n_pkts; j++) { + u64 addr = xsk_umem__add_offset_to_addr(brx->addr[j]); + u8 *pkt = xsk_umem__get_data(port_rx->params.bp->addr, + addr); + + swap_mac_addresses(pkt); + + btx->addr[btx->n_pkts] = brx->addr[j]; + btx->len[btx->n_pkts] = brx->len[j]; + btx->n_pkts++; + + if (btx->n_pkts == MAX_BURST_TX) { + port_tx_burst(port_tx, btx); + btx->n_pkts = 0; + } + } + } + + return NULL; +} + +/* + * Process + */ +static const struct bpool_params bpool_params_default = { + .n_buffers = 64 * 1024, + .buffer_size = XSK_UMEM__DEFAULT_FRAME_SIZE, + .mmap_flags = 0, + + .n_users_max = 16, + .n_buffers_per_slab = XSK_RING_PROD__DEFAULT_NUM_DESCS, +}; + +static const struct xsk_umem_config umem_cfg_default = { + .fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS, + .comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS, + .frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE, + .frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM, + .flags = 0, +}; + +static const struct port_params port_params_default = { + .xsk_cfg = { + .rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS, + .tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS, + .libbpf_flags = 0, + .xdp_flags = 0, + .bind_flags = 0, + }, + + .bp = NULL, + .iface = NULL, + .iface_queue = 0, +}; + +#ifndef MAX_PORTS +#define MAX_PORTS 64 +#endif + +#ifndef MAX_THREADS +#define MAX_THREADS 64 +#endif + +static struct bpool_params bpool_params; +static struct xsk_umem_config umem_cfg; +static struct bpool *bp; + +static struct port_params port_params[MAX_PORTS]; +static struct port *ports[MAX_PORTS]; +static u64 n_pkts_rx[MAX_PORTS]; +static u64 n_pkts_tx[MAX_PORTS]; +static int n_ports; + +static pthread_t threads[MAX_THREADS]; +static struct thread_data thread_data[MAX_THREADS]; +static int n_threads; + +static void +print_usage(char 
*prog_name) +{ + const char *usage = + "Usage:\n" + "\t%s [ -b SIZE ] -c CORE -i INTERFACE [ -q QUEUE ]\n" + "\n" + "-c CORE CPU core to run a packet forwarding thread\n" + " on. May be invoked multiple times.\n" + "\n" + "-b SIZE Number of buffers in the buffer pool shared\n" + " by all the forwarding threads. Default: %u.\n" + "\n" + "-i INTERFACE Network interface. Each (INTERFACE, QUEUE)\n" + " pair specifies one forwarding port. May be\n" + " invoked multiple times.\n" + "\n" + "-q QUEUE Network interface queue for RX and TX. Each\n" + " (INTERFACE, QUEUE) pair specifies one\n" + " forwarding port. Default: %u. May be invoked\n" + " multiple times.\n" + "\n"; + printf(usage, + prog_name, + bpool_params_default.n_buffers, + port_params_default.iface_queue); +} + +static int +parse_args(int argc, char **argv) +{ + struct option lgopts[] = { + { NULL, 0, 0, 0 } + }; + int opt, option_index; + + /* Parse the input arguments. */ + for ( ; ;) { + opt = getopt_long(argc, argv, "b:c:i:q:", lgopts, &option_index); + if (opt == EOF) + break; + + switch (opt) { + case 'b': + bpool_params.n_buffers = atoi(optarg); + break; + + case 'c': + if (n_threads == MAX_THREADS) { + printf("Max number of threads (%d) reached.\n", + MAX_THREADS); + return -1; + } + + thread_data[n_threads].cpu_core_id = atoi(optarg); + n_threads++; + break; + + case 'i': + if (n_ports == MAX_PORTS) { + printf("Max number of ports (%d) reached.\n", + MAX_PORTS); + return -1; + } + + port_params[n_ports].iface = optarg; + port_params[n_ports].iface_queue = 0; + n_ports++; + break; + + case 'q': + if (n_ports == 0) { + printf("No port specified for queue.\n"); + return -1; + } + port_params[n_ports - 1].iface_queue = atoi(optarg); + break; + + default: + printf("Illegal argument.\n"); + return -1; + } + } + + optind = 1; /* reset getopt lib */ + + /* Check the input arguments. 
*/ + if (!n_ports) { + printf("No ports specified.\n"); + return -1; + } + + if (!n_threads) { + printf("No threads specified.\n"); + return -1; + } + + if (n_ports % n_threads) { + printf("Ports cannot be evenly distributed to threads.\n"); + return -1; + } + + return 0; +} + +static void +print_port(u32 port_id) +{ + struct port *port = ports[port_id]; + + printf("Port %u: interface = %s, queue = %u\n", + port_id, port->params.iface, port->params.iface_queue); +} + +static void +print_thread(u32 thread_id) +{ + struct thread_data *t = &thread_data[thread_id]; + u32 i; + + printf("Thread %u (CPU core %u): ", + thread_id, t->cpu_core_id); + + for (i = 0; i < t->n_ports_rx; i++) { + struct port *port_rx = t->ports_rx[i]; + struct port *port_tx = t->ports_tx[i]; + + printf("(%s, %u) -> (%s, %u), ", + port_rx->params.iface, + port_rx->params.iface_queue, + port_tx->params.iface, + port_tx->params.iface_queue); + } + + printf("\n"); +} + +static void +print_port_stats_separator(void) +{ + printf("+-%4s-+-%12s-+-%13s-+-%12s-+-%13s-+\n", + "----", + "------------", + "-------------", + "------------", + "-------------"); +} + +static void +print_port_stats_header(void) +{ + print_port_stats_separator(); + printf("| %4s | %12s | %13s | %12s | %13s |\n", + "Port", + "RX packets", + "RX rate (pps)", + "TX packets", + "TX rate (pps)"); + print_port_stats_separator(); +} + +static void +print_port_stats_trailer(void) +{ + print_port_stats_separator(); + printf("\n"); +} + +static void +print_port_stats(int port_id, u64 ns_diff) +{ + struct port *p = ports[port_id]; + double rx_pps, tx_pps; + + rx_pps = (p->n_pkts_rx - n_pkts_rx[port_id]) * 1000000000. / ns_diff; + tx_pps = (p->n_pkts_tx - n_pkts_tx[port_id]) * 1000000000. 
/ ns_diff; + + printf("| %4d | %12llu | %13.0f | %12llu | %13.0f |\n", + port_id, + p->n_pkts_rx, + rx_pps, + p->n_pkts_tx, + tx_pps); + + n_pkts_rx[port_id] = p->n_pkts_rx; + n_pkts_tx[port_id] = p->n_pkts_tx; +} + +static void +print_port_stats_all(u64 ns_diff) +{ + int i; + + print_port_stats_header(); + for (i = 0; i < n_ports; i++) + print_port_stats(i, ns_diff); + print_port_stats_trailer(); +} + +static int quit; + +static void +signal_handler(int sig) +{ + quit = 1; +} + +int main(int argc, char **argv) +{ + struct timespec time; + u64 ns0; + int i; + + /* Parse args. */ + memcpy(&bpool_params, &bpool_params_default, + sizeof(struct bpool_params)); + memcpy(&umem_cfg, &umem_cfg_default, + sizeof(struct xsk_umem_config)); + for (i = 0; i < MAX_PORTS; i++) + memcpy(&port_params[i], &port_params_default, + sizeof(struct port_params)); + + if (parse_args(argc, argv)) { + print_usage(argv[0]); + return -1; + } + + /* Buffer pool initialization. */ + bp = bpool_init(&bpool_params, &umem_cfg); + if (!bp) { + printf("Buffer pool initialization failed.\n"); + return -1; + } + printf("Buffer pool created successfully.\n"); + + /* Ports initialization. */ + for (i = 0; i < MAX_PORTS; i++) + port_params[i].bp = bp; + + for (i = 0; i < n_ports; i++) { + ports[i] = port_init(&port_params[i]); + if (!ports[i]) { + printf("Port %d initialization failed.\n", i); + return -1; + } + print_port(i); + } + printf("All ports created successfully.\n"); + + /* Threads. 
*/ + for (i = 0; i < n_threads; i++) { + struct thread_data *t = &thread_data[i]; + u32 n_ports_per_thread = n_ports / n_threads, j; + + for (j = 0; j < n_ports_per_thread; j++) { + t->ports_rx[j] = ports[i * n_ports_per_thread + j]; + t->ports_tx[j] = ports[i * n_ports_per_thread + + (j + 1) % n_ports_per_thread]; + } + + t->n_ports_rx = n_ports_per_thread; + + print_thread(i); + } + + for (i = 0; i < n_threads; i++) { + int status; + + status = pthread_create(&threads[i], + NULL, + thread_func, + &thread_data[i]); + if (status) { + printf("Thread %d creation failed.\n", i); + return -1; + } + } + printf("All threads created successfully.\n"); + + /* Print statistics. */ + signal(SIGINT, signal_handler); + signal(SIGTERM, signal_handler); + signal(SIGABRT, signal_handler); + + clock_gettime(CLOCK_MONOTONIC, &time); + ns0 = time.tv_sec * 1000000000UL + time.tv_nsec; + for ( ; !quit; ) { + u64 ns1, ns_diff; + + sleep(1); + clock_gettime(CLOCK_MONOTONIC, &time); + ns1 = time.tv_sec * 1000000000UL + time.tv_nsec; + ns_diff = ns1 - ns0; + ns0 = ns1; + + print_port_stats_all(ns_diff); + } + + /* Threads completion. */ + printf("Quit.\n"); + for (i = 0; i < n_threads; i++) + thread_data[i].quit = 1; + + for (i = 0; i < n_threads; i++) + pthread_join(threads[i], NULL); + + /* Ports free. */ + for (i = 0; i < n_ports; i++) + port_free(ports[i]); + + /* Buffer pool free. 
*/ + bpool_free(bp); + + return 0; +} From patchwork Thu Jul 2 12:19:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Magnus Karlsson X-Patchwork-Id: 1321382 X-Patchwork-Delegate: bpf@iogearbox.net From: Magnus Karlsson 
To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com Subject: [PATCH bpf-next 14/14] xsk: documentation for XDP_SHARED_UMEM between queues and netdevs Date: Thu, 2 Jul 2020 14:19:13 +0200 Message-Id: <1593692353-15102-15-git-send-email-magnus.karlsson@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> References: <1593692353-15102-1-git-send-email-magnus.karlsson@intel.com> Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Add documentation for the XDP_SHARED_UMEM feature when a UMEM is shared between different queues and/or netdevs. Signed-off-by: Magnus Karlsson --- Documentation/networking/af_xdp.rst | 68 +++++++++++++++++++++++++++++++------ 1 file changed, 58 insertions(+), 10 deletions(-) diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst index 5bc55a4..2ccc564 100644 --- a/Documentation/networking/af_xdp.rst +++ b/Documentation/networking/af_xdp.rst @@ -258,14 +258,21 @@ socket into zero-copy mode or fail. XDP_SHARED_UMEM bind flag ------------------------- -This flag enables you to bind multiple sockets to the same UMEM, but -only if they share the same queue id. In this mode, each socket has -their own RX and TX rings, but the UMEM (tied to the fist socket -created) only has a single FILL ring and a single COMPLETION -ring. To use this mode, create the first socket and bind it in the normal -way. Create a second socket and create an RX and a TX ring, or at -least one of them, but no FILL or COMPLETION rings as the ones from -the first socket will be used. In the bind call, set he +This flag enables you to bind multiple sockets to the same UMEM. 
It +works on the same queue id, between queue ids and between +netdevs/devices. In this mode, each socket has its own RX and TX +rings as usual, but you are going to have one or more FILL and +COMPLETION ring pairs. You have to create one of these pairs per +unique netdev and queue id tuple that you bind to. + +We start with the case where we would like to share a UMEM between +sockets bound to the same netdev and queue id. The UMEM (tied to the +first socket created) will only have a single FILL ring and a single +COMPLETION ring as there is only one unique netdev,queue_id tuple that +we have bound to. To use this mode, create the first socket and bind +it in the normal way. Create a second socket and create an RX and a TX +ring, or at least one of them, but no FILL or COMPLETION rings as the +ones from the first socket will be used. In the bind call, set the XDP_SHARED_UMEM option and provide the initial socket's fd in the sxdp_shared_umem_fd field. You can attach an arbitrary number of extra sockets this way. @@ -305,11 +312,41 @@ concurrently. There are no synchronization primitives in the libbpf code that protects multiple users at this point in time. Libbpf uses this mode if you create more than one socket tied to the -same umem. However, note that you need to supply the +same UMEM. However, note that you need to supply the XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD libbpf_flag with the xsk_socket__create calls and load your own XDP program as there is no built in one in libbpf that will route the traffic for you. +The second case is when you share a UMEM between sockets that are +bound to different queue ids and/or netdevs. In this case you have to +create one FILL ring and one COMPLETION ring for each unique +netdev,queue_id pair. Let us say you want to create two sockets bound +to two different queue ids on the same netdev. Create the first socket +and bind it in the normal way. 
Create a second socket and create an RX +and a TX ring, or at least one of them, and then one FILL and +COMPLETION ring for this socket. Then in the bind call, set the +XDP_SHARED_UMEM option and provide the initial socket's fd in the +sxdp_shared_umem_fd field as you registered the UMEM on that +socket. These two sockets will now share one and the same UMEM. + +There is no need to supply an XDP program like the one in the previous +case where sockets were bound to the same queue id and +device. Instead, use the NIC's packet steering capabilities to steer +the packets to the right queue. In the previous example, there is only +one queue shared among sockets, so the NIC cannot do this steering. It +can only steer between queues. + +In libbpf, you need to use the xsk_socket__create_shared() API as it +takes a reference to a FILL ring and a COMPLETION ring that will be +created for you and bound to the shared UMEM. You can use this +function for all the sockets you create, or you can use it for the +second and following ones and use xsk_socket__create() for the first +one. Both methods yield the same result. + +Note that a UMEM can be shared between sockets on the same queue id +and device, as well as between queues on the same device and between +devices at the same time. + XDP_USE_NEED_WAKEUP bind flag ----------------------------- @@ -364,7 +401,7 @@ resources by only setting up one of them. Both the FILL ring and the COMPLETION ring are mandatory as you need to have a UMEM tied to your socket. But if the XDP_SHARED_UMEM flag is used, any socket after the first one does not have a UMEM and should in that case not have any -FILL or COMPLETION rings created as the ones from the shared umem will +FILL or COMPLETION rings created as the ones from the shared UMEM will be used. Note, that the rings are single-producer single-consumer, so do not try to access them from multiple processes at the same time. See the XDP_SHARED_UMEM section. 
@@ -567,6 +604,17 @@ A: The short answer is no, that is not supported at the moment. The switch, or other distribution mechanism, in your NIC to direct traffic to the correct queue id and socket. +Q: My packets are sometimes corrupted. What is wrong? + +A: Care has to be taken not to feed the same buffer in the UMEM into + more than one ring at the same time. If you, for example, feed the + same buffer into the FILL ring and the TX ring at the same time, the + NIC might receive data into the buffer at the same time it is + sending it. This will cause some packets to become corrupted. The same + thing goes for feeding the same buffer into the FILL rings + belonging to different queue ids or netdevs bound with the + XDP_SHARED_UMEM flag. + Credits =======
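The documentation added above states that one FILL ring and one COMPLETION ring are needed per unique netdev,queue_id tuple that sockets sharing a UMEM are bound to. As a minimal sketch of that counting rule only, the following plain C helper counts the ring pairs a set of bindings would need. The names struct xsk_binding and count_ring_pairs() are hypothetical and made up for this illustration; they are not part of libbpf, the kernel UAPI, or this patch set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one socket's bind target (not a libbpf type). */
struct xsk_binding {
	unsigned int ifindex;	/* netdev the socket is bound to */
	unsigned int queue_id;	/* queue id on that netdev */
};

/*
 * Count the unique (ifindex, queue_id) tuples among the n bindings in b.
 * Following the rule in the text, this is the number of FILL/COMPLETION
 * ring pairs needed when all of these sockets share one UMEM.
 */
static size_t count_ring_pairs(const struct xsk_binding *b, size_t n)
{
	size_t pairs = 0, i, j;

	assert(b != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* Look for the same tuple at a lower index. */
		for (j = 0; j < i; j++)
			if (b[j].ifindex == b[i].ifindex &&
			    b[j].queue_id == b[i].queue_id)
				break;

		/* No earlier match: this tuple needs its own ring pair. */
		if (j == i)
			pairs++;
	}

	return pairs;
}
```

For instance, two sockets on the same netdev and queue plus one socket on a second queue of that netdev and one socket on another netdev give three unique tuples, so three ring pairs. The quadratic scan is chosen for brevity, which seems acceptable for the small socket counts typical of such setups.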