From patchwork Wed Nov 4 14:08:57 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Magnus Karlsson
X-Patchwork-Id: 1393972
X-Patchwork-Delegate: bpf@iogearbox.net
From: Magnus Karlsson
To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org,
 daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com
Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com,
 anthony.l.nguyen@intel.com, maciej.fijalkowski@intel.com,
 maciejromanfijalkowski@gmail.com, intel-wired-lan@lists.osuosl.org
Subject: [PATCH bpf-next 1/6] i40e: introduce lazy Tx completions for AF_XDP zero-copy
Date: Wed, 4 Nov 2020 15:08:57 +0100
Message-Id: <1604498942-24274-2-git-send-email-magnus.karlsson@gmail.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1604498942-24274-1-git-send-email-magnus.karlsson@gmail.com>
References: <1604498942-24274-1-git-send-email-magnus.karlsson@gmail.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Magnus Karlsson

Introduce lazy Tx completions when a queue is used for AF_XDP
zero-copy. In the current design, each time we get into the NAPI poll
loop we try to complete as many Tx packets as possible from the NIC.
This is performed by reading the head pointer register in the NIC that
tells us how many packets have been completed. Reading this register is
expensive as it is across PCIe, so let us try to limit the number of
times it is read by only completing Tx packets to user-space when the
number of available descriptors in the Tx HW ring is below some
threshold. This decreases the number of reads issued to the NIC and
improves performance by 1.5% - 2% for the l2fwd xdpsock microbenchmark.

The threshold is set to the minimum possible size that the HW ring can
have. This is so that we do not run into a scenario where the threshold
is higher than the configured number of descriptors in the HW ring.

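The threshold check described above can be sketched in isolation. This is
a minimal user-space model, not the driver code: the struct, the function
names and the threshold value of 64 are hypothetical stand-ins chosen for
illustration, and the "register read" is just a counter so the saving can
be observed.

```c
#include <stdint.h>

/* Assumed threshold for illustration only; the patch ties the real one
 * to the minimum ring size the hardware supports.
 */
#define SKETCH_COMPLETION_THRESHOLD 64u

/* Hypothetical stand-in for the driver's Tx ring state. */
struct sketch_tx_ring {
	uint32_t count;         /* number of descriptors in the ring */
	uint32_t next_to_use;   /* next slot software will fill */
	uint32_t next_to_clean; /* oldest slot not yet completed */
	uint32_t hw_head;       /* models the NIC head pointer */
	uint32_t head_reads;    /* how often the "register" was read */
};

/* Free descriptors, keeping one slot empty to tell full from empty. */
static uint32_t sketch_desc_unused(const struct sketch_tx_ring *r)
{
	uint32_t base = r->next_to_clean > r->next_to_use ? 0 : r->count;

	return base + r->next_to_clean - r->next_to_use - 1;
}

/* Models the expensive read of the head pointer across PCIe. */
static uint32_t sketch_read_head(struct sketch_tx_ring *r)
{
	r->head_reads++;
	return r->hw_head;
}

/* Returns the number of frames completed in this poll. */
static uint32_t sketch_clean_tx(struct sketch_tx_ring *r)
{
	uint32_t head, completed;

	/* Lazy part: enough free descriptors remain, so skip the read. */
	if (sketch_desc_unused(r) >= SKETCH_COMPLETION_THRESHOLD)
		return 0;

	head = sketch_read_head(r);
	if (head < r->next_to_clean)
		head += r->count;	/* unwrap the ring index */
	completed = head - r->next_to_clean;

	r->next_to_clean = (r->next_to_clean + completed) % r->count;
	return completed;
}
```

With a 512-entry ring, polls made while fewer than about 450 descriptors
are outstanding return without touching sketch_read_head(), which is the
saving the commit message is after.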
Signed-off-by: Magnus Karlsson
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 6acede0..f8815b3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -9,6 +9,8 @@
 #include "i40e_txrx_common.h"
 #include "i40e_xsk.h"
 
+#define I40E_TX_COMPLETION_THRESHOLD I40E_MIN_NUM_DESCRIPTORS
+
 int i40e_alloc_rx_bi_zc(struct i40e_ring *rx_ring)
 {
 	unsigned long sz = sizeof(*rx_ring->rx_bi_zc) * rx_ring->count;
@@ -460,12 +462,15 @@ static void i40e_clean_xdp_tx_buffer(struct i40e_ring *tx_ring,
  **/
 bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
 {
+	u32 i, completed_frames, xsk_frames = 0, head_idx;
 	struct xsk_buff_pool *bp = tx_ring->xsk_pool;
-	u32 i, completed_frames, xsk_frames = 0;
-	u32 head_idx = i40e_get_head(tx_ring);
 	struct i40e_tx_buffer *tx_bi;
 	unsigned int ntc;
 
+	if (I40E_DESC_UNUSED(tx_ring) >= I40E_TX_COMPLETION_THRESHOLD)
+		goto out_xmit;
+
+	head_idx = i40e_get_head(tx_ring);
 	if (head_idx < tx_ring->next_to_clean)
 		head_idx += tx_ring->count;
 	completed_frames = head_idx - tx_ring->next_to_clean;

From patchwork Wed Nov 4 14:08:58 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Magnus Karlsson
X-Patchwork-Id: 1393974
X-Patchwork-Delegate: bpf@iogearbox.net
From: Magnus Karlsson
To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org,
 daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com
Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com,
 anthony.l.nguyen@intel.com, maciej.fijalkowski@intel.com,
 maciejromanfijalkowski@gmail.com, intel-wired-lan@lists.osuosl.org
Subject: [PATCH bpf-next 2/6] samples/bpf: increment Tx stats at sending
Date: Wed, 4 Nov 2020 15:08:58 +0100
Message-Id: <1604498942-24274-3-git-send-email-magnus.karlsson@gmail.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1604498942-24274-1-git-send-email-magnus.karlsson@gmail.com>
References: <1604498942-24274-1-git-send-email-magnus.karlsson@gmail.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Magnus Karlsson

Increment the statistics over how many Tx packets have been sent at the
time of sending instead of at the time of completion. This is because a
completion event means that the buffer has been sent AND returned to
user space. The packet always gets sent shortly after sendto() is
called. The kernel might, for performance reasons, decide to not return
every single buffer to user space immediately after sending, for
example, only after a batch of packets has been transmitted.
Incrementing the number of packets sent at completion will in that case
be confusing: if you send a single packet, the counter might show zero
for a while even though the packet has been transmitted.

Signed-off-by: Magnus Karlsson
Acked-by: John Fastabend
---
 samples/bpf/xdpsock_user.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 1149e94..2567f0d 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -1146,7 +1146,6 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
 		xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
 		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
 		xsk->outstanding_tx -= rcvd;
-		xsk->ring_stats.tx_npkts += rcvd;
 	}
 }
 
@@ -1168,7 +1167,6 @@ static inline void complete_tx_only(struct xsk_socket_info *xsk,
 	if (rcvd > 0) {
 		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
 		xsk->outstanding_tx -= rcvd;
-		xsk->ring_stats.tx_npkts += rcvd;
 	}
 }
 
@@ -1260,6 +1258,7 @@ static void tx_only(struct xsk_socket_info *xsk, u32 *frame_nb, int batch_size)
 	}
 
 	xsk_ring_prod__submit(&xsk->tx, batch_size);
+	xsk->ring_stats.tx_npkts += batch_size;
 	xsk->outstanding_tx += batch_size;
 	*frame_nb += batch_size;
 	*frame_nb %= NUM_FRAMES;
@@ -1348,6 +1347,7 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
 		}
 		return;
 	}
+	xsk->ring_stats.rx_npkts += rcvd;
 
 	ret = xsk_ring_prod__reserve(&xsk->tx, rcvd, &idx_tx);
 	while (ret != rcvd) {
@@ -1379,7 +1379,7 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
 
 	xsk_ring_prod__submit(&xsk->tx, rcvd);
 	xsk_ring_cons__release(&xsk->rx, rcvd);
-	xsk->ring_stats.rx_npkts += rcvd;
+	xsk->ring_stats.tx_npkts += rcvd;
 	xsk->outstanding_tx += rcvd;
 }
 

From patchwork Wed Nov 4 14:08:59 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Magnus Karlsson
X-Patchwork-Id: 1393977
X-Patchwork-Delegate: bpf@iogearbox.net
From: Magnus Karlsson
To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org,
 daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com
Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com,
 anthony.l.nguyen@intel.com, maciej.fijalkowski@intel.com,
 maciejromanfijalkowski@gmail.com, intel-wired-lan@lists.osuosl.org
Subject: [PATCH bpf-next 3/6] i40e: remove unnecessary sw_ring access from xsk Tx
Date: Wed, 4 Nov 2020 15:08:59 +0100
Message-Id: <1604498942-24274-4-git-send-email-magnus.karlsson@gmail.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1604498942-24274-1-git-send-email-magnus.karlsson@gmail.com>
References: <1604498942-24274-1-git-send-email-magnus.karlsson@gmail.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Magnus Karlsson

Remove the unnecessary access to the software ring for the AF_XDP
zero-copy driver.
This was used to record the length of the packet so that the driver Tx
completion code could sum this up to produce the total bytes sent. This
is now performed during the transmission of the packet, so there is no
need to record this in the software ring.

Signed-off-by: Magnus Karlsson
Acked-by: John Fastabend
xsk_pool, dma, desc.len);
-		tx_bi = &xdp_ring->tx_bi[xdp_ring->next_to_use];
-		tx_bi->bytecount = desc.len;
-
 		tx_desc = I40E_TX_DESC(xdp_ring, xdp_ring->next_to_use);
 		tx_desc->buffer_addr = cpu_to_le64(dma);
 		tx_desc->cmd_type_offset_bsz =
@@ -417,7 +413,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 				   0, desc.len, 0);
 
 		sent_frames++;
-		total_bytes += tx_bi->bytecount;
+		total_bytes += desc.len;
 
 		xdp_ring->next_to_use++;
 		if (xdp_ring->next_to_use == xdp_ring->count)

From patchwork Wed Nov 4 14:09:00 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Magnus Karlsson
X-Patchwork-Id: 1393980
X-Patchwork-Delegate: bpf@iogearbox.net
From: Magnus Karlsson
To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org,
 daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com
Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com,
 anthony.l.nguyen@intel.com, maciej.fijalkowski@intel.com,
 maciejromanfijalkowski@gmail.com, intel-wired-lan@lists.osuosl.org
Subject: [PATCH bpf-next 4/6] xsk: introduce padding between more ring pointers
Date: Wed, 4 Nov 2020 15:09:00 +0100
Message-Id: <1604498942-24274-5-git-send-email-magnus.karlsson@gmail.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1604498942-24274-1-git-send-email-magnus.karlsson@gmail.com>
References: <1604498942-24274-1-git-send-email-magnus.karlsson@gmail.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Magnus Karlsson

Introduce one cache line worth of padding between the consumer pointer
and the flags field as well as between the flags field and the start of
the descriptors in all the lockless rings. This is so that the x86 HW
adjacency prefetcher will not prefetch the adjacent pointer/field when
only one pointer/field is going to be used. This improves throughput
performance for the l2fwd sample app by 1% on my machine with HW
prefetching turned on in the BIOS.

Signed-off-by: Magnus Karlsson
Acked-by: John Fastabend
---
 net/xdp/xsk_queue.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index cdb9cf3..74fac80 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -18,9 +18,11 @@ struct xdp_ring {
 	/* Hinder the adjacent cache prefetcher to prefetch the consumer
 	 * pointer if the producer pointer is touched and vice versa.
 	 */
-	u32 pad ____cacheline_aligned_in_smp;
+	u32 pad1 ____cacheline_aligned_in_smp;
 	u32 consumer ____cacheline_aligned_in_smp;
+	u32 pad2 ____cacheline_aligned_in_smp;
 	u32 flags;
+	u32 pad3 ____cacheline_aligned_in_smp;
 };
 
 /* Used for the RX and TX queues for packets */

From patchwork Wed Nov 4 14:09:01 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Magnus Karlsson
X-Patchwork-Id: 1393981
X-Patchwork-Delegate: bpf@iogearbox.net
From: Magnus Karlsson
To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org,
 daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com
Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com,
 anthony.l.nguyen@intel.com, maciej.fijalkowski@intel.com,
 maciejromanfijalkowski@gmail.com, intel-wired-lan@lists.osuosl.org
Subject: [PATCH bpf-next 5/6] xsk: introduce batched Tx descriptor interfaces
Date: Wed, 4 Nov 2020 15:09:01 +0100
Message-Id: <1604498942-24274-6-git-send-email-magnus.karlsson@gmail.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1604498942-24274-1-git-send-email-magnus.karlsson@gmail.com>
References: <1604498942-24274-1-git-send-email-magnus.karlsson@gmail.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Magnus Karlsson

Introduce batched descriptor interfaces in the xsk core code for the Tx
path to be used in the driver to write a code path with higher
performance. This interface will be used by the i40e driver in the next
patch, though other drivers would likely benefit from this new
interface too.

Note that batching is only implemented for the common case when there
is only one socket bound to the same device and queue id. When this is
not the case, we fall back to the old non-batched version of the
function.

Signed-off-by: Magnus Karlsson
---
 include/net/xdp_sock_drv.h |  7 ++++
 net/xdp/xsk.c              | 43 ++++++++++++++++++++++
 net/xdp/xsk_queue.h        | 89 +++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 126 insertions(+), 13 deletions(-)

diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 5b1ee8a..4e295541 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -13,6 +13,7 @@
 
 void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries);
 bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc);
+u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, struct xdp_desc *desc, u32 max);
 void xsk_tx_release(struct xsk_buff_pool *pool);
 struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev,
 					    u16 queue_id);
@@ -128,6 +129,12 @@ static inline bool xsk_tx_peek_desc(struct xsk_buff_pool *pool,
 	return false;
 }
 
+static inline u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, struct xdp_desc *desc,
+						 u32 max)
+{
+	return 0;
+}
+
 static inline void xsk_tx_release(struct xsk_buff_pool *pool)
 {
 }
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index b71a32e..dd75b5f 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -332,6 +332,49 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
 }
 EXPORT_SYMBOL(xsk_tx_peek_desc);
 
+u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, struct xdp_desc *descs,
+				   u32 max_entries)
+{
+	struct xdp_sock *xs;
+	u32 nb_pkts;
+
+	rcu_read_lock();
+	if (!list_is_singular(&pool->xsk_tx_list)) {
+		/* Fallback to the non-batched version */
+		rcu_read_unlock();
+		return xsk_tx_peek_desc(pool, &descs[0]) ? 1 : 0;
+	}
+
+	xs = list_first_or_null_rcu(&pool->xsk_tx_list, struct xdp_sock, tx_list);
+
+	nb_pkts = xskq_cons_peek_desc_batch(xs->tx, descs, pool, max_entries);
+	if (!nb_pkts) {
+		xs->tx->queue_empty_descs++;
+		goto out;
+	}
+
+	/* This is the backpressure mechanism for the Tx path. Try to
+	 * reserve space in the completion queue for all packets, but
+	 * if there are fewer slots available, just process that many
+	 * packets. This avoids having to implement any buffering in
+	 * the Tx path.
+	 */
+	nb_pkts = xskq_prod_reserve_addr_batch(pool->cq, descs, nb_pkts);
+	if (!nb_pkts)
+		goto out;
+
+	xskq_cons_release_n(xs->tx, nb_pkts);
+	__xskq_cons_release(xs->tx);
+	xs->sk.sk_write_space(&xs->sk);
+	rcu_read_unlock();
+	return nb_pkts;
+
+out:
+	rcu_read_unlock();
+	return 0;
+}
+EXPORT_SYMBOL(xsk_tx_peek_release_desc_batch);
+
 static int xsk_wakeup(struct xdp_sock *xs, u8 flags)
 {
 	struct net_device *dev = xs->dev;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 74fac80..a85c7e9 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -199,6 +199,33 @@ static inline bool xskq_cons_read_desc(struct xsk_queue *q,
 	return false;
 }
 
+static inline u32 xskq_cons_read_desc_batch(struct xsk_queue *q,
+					    struct xdp_desc *descs,
+					    struct xsk_buff_pool *pool, u32 max)
+{
+	u32 cached_cons = q->cached_cons, nb_entries = 0;
+
+	while (cached_cons != q->cached_prod && nb_entries < max) {
+		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
+		u32 idx = cached_cons & q->ring_mask;
+
+		descs[nb_entries] = ring->desc[idx];
+		if (unlikely(!xskq_cons_is_valid_desc(q, &descs[nb_entries], pool))) {
+			if (nb_entries) {
+				/* Invalid entry detected. Return what we have. */
+				return nb_entries;
+			}
+			/* Use non-batch version to progress beyond invalid entry/entries */
+			return xskq_cons_read_desc(q, descs, pool) ? 1 : 0;
+		}
+
+		nb_entries++;
+		cached_cons++;
+	}
+
+	return nb_entries;
+}
+
 /* Functions for consumers */
 
 static inline void __xskq_cons_release(struct xsk_queue *q)
@@ -220,17 +247,22 @@ static inline void xskq_cons_get_entries(struct xsk_queue *q)
 	__xskq_cons_peek(q);
 }
 
-static inline bool xskq_cons_has_entries(struct xsk_queue *q, u32 cnt)
+static inline u32 xskq_cons_nb_entries(struct xsk_queue *q, u32 max)
 {
 	u32 entries = q->cached_prod - q->cached_cons;
 
-	if (entries >= cnt)
-		return true;
+	if (entries >= max)
+		return max;
 
 	__xskq_cons_peek(q);
 	entries = q->cached_prod - q->cached_cons;
 
-	return entries >= cnt;
+	return entries >= max ? max : entries;
+}
+
+static inline bool xskq_cons_has_entries(struct xsk_queue *q, u32 cnt)
+{
+	return xskq_cons_nb_entries(q, cnt) >= cnt ? true : false;
 }
 
 static inline bool xskq_cons_peek_addr_unchecked(struct xsk_queue *q, u64 *addr)
@@ -249,16 +281,28 @@ static inline bool xskq_cons_peek_desc(struct xsk_queue *q,
 	return xskq_cons_read_desc(q, desc, pool);
 }
 
+static inline u32 xskq_cons_peek_desc_batch(struct xsk_queue *q, struct xdp_desc *descs,
+					    struct xsk_buff_pool *pool, u32 max)
+{
+	u32 entries = xskq_cons_nb_entries(q, max);
+
+	return xskq_cons_read_desc_batch(q, descs, pool, entries);
+}
+
+/* To improve performance in the xskq_cons_release functions, only update local state here.
+ * Reflect this to global state when we get new entries from the ring in
+ * xskq_cons_get_entries() and whenever Rx or Tx processing are completed in the NAPI loop.
+ */
 static inline void xskq_cons_release(struct xsk_queue *q)
 {
-	/* To improve performance, only update local state here.
-	 * Reflect this to global state when we get new entries
-	 * from the ring in xskq_cons_get_entries() and whenever
-	 * Rx or Tx processing are completed in the NAPI loop.
-	 */
 	q->cached_cons++;
 }
 
+static inline void xskq_cons_release_n(struct xsk_queue *q, u32 cnt)
+{
+	q->cached_cons += cnt;
+}
+
 static inline bool xskq_cons_is_full(struct xsk_queue *q)
 {
 	/* No barriers needed since data is not accessed */
@@ -268,18 +312,23 @@ static inline bool xskq_cons_is_full(struct xsk_queue *q)
 
 /* Functions for producers */
 
-static inline bool xskq_prod_is_full(struct xsk_queue *q)
+static inline u32 xskq_prod_nb_free(struct xsk_queue *q, u32 max)
 {
 	u32 free_entries = q->nentries - (q->cached_prod - q->cached_cons);
 
-	if (free_entries)
-		return false;
+	if (free_entries >= max)
+		return max;
 
 	/* Refresh the local tail pointer */
 	q->cached_cons = READ_ONCE(q->ring->consumer);
 	free_entries = q->nentries - (q->cached_prod - q->cached_cons);
 
-	return !free_entries;
+	return free_entries >= max ? max : free_entries;
+}
+
+static inline bool xskq_prod_is_full(struct xsk_queue *q)
+{
+	return xskq_prod_nb_free(q, 1) ? false : true;
 }
 
 static inline int xskq_prod_reserve(struct xsk_queue *q)
@@ -304,6 +353,20 @@ static inline int xskq_prod_reserve_addr(struct xsk_queue *q, u64 addr)
 	return 0;
 }
 
+static inline u32 xskq_prod_reserve_addr_batch(struct xsk_queue *q, struct xdp_desc *descs,
+					       u32 max)
+{
+	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
+	u32 nb_entries, i;
+
+	nb_entries = xskq_prod_nb_free(q, max);
+
+	/* A, matches D */
+	for (i = 0; i < nb_entries; i++)
+		ring->desc[q->cached_prod++ & q->ring_mask] = descs[i].addr;
+	return nb_entries;
+}
+
 static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
 					 u64 addr, u32 len)
 {

From patchwork Wed Nov 4 14:09:02 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Magnus Karlsson
X-Patchwork-Id: 1393984
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Original-To: patchwork-incoming-netdev@ozlabs.org
Delivered-To: patchwork-incoming-netdev@ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (sender SPF
authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=)
From: Magnus Karlsson To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, anthony.l.nguyen@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, intel-wired-lan@lists.osuosl.org Subject: [PATCH bpf-next 6/6] i40e: use batched xsk Tx interfaces to increase performance Date: Wed, 4 Nov 2020 15:09:02 +0100 Message-Id: <1604498942-24274-7-git-send-email-magnus.karlsson@gmail.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1604498942-24274-1-git-send-email-magnus.karlsson@gmail.com> References: <1604498942-24274-1-git-send-email-magnus.karlsson@gmail.com> Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Magnus Karlsson Use the new batched xsk interfaces for the Tx path in the i40e driver to improve performance. On my machine, this yields a throughput increase of 4% for the l2fwd sample app in xdpsock.
If we instead just look at the Tx part, this patch set increases Tx throughput by more than 20%. Note that I had to explicitly unroll the inner loop to get to this performance level, by using a pragma. It is honored by both clang and gcc and should be ignored by versions that do not support it. Using the -funroll-loops compiler command line switch on the source file resulted in loop unrolling at a higher level that led to a performance decrease instead of an increase. Signed-off-by: Magnus Karlsson Acked-by: John Fastabend --- drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 2 +- drivers/net/ethernet/intel/i40e/i40e_main.c | 4 +- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 14 ++- drivers/net/ethernet/intel/i40e/i40e_txrx.h | 3 +- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 127 ++++++++++++++++++------- 5 files changed, 110 insertions(+), 40 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c index 26ba1f3..dc34867 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c @@ -2025,7 +2025,7 @@ static int i40e_set_ringparam(struct net_device *netdev, */ tx_rings[i].desc = NULL; tx_rings[i].rx_bi = NULL; - err = i40e_setup_tx_descriptors(&tx_rings[i]); + err = i40e_setup_tx_descriptors(&tx_rings[i], false); if (err) { while (i) { i--; diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 4f8a2154..c93774a 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -3030,13 +3030,13 @@ static int i40e_vsi_setup_tx_resources(struct i40e_vsi *vsi) int i, err = 0; for (i = 0; i < vsi->num_queue_pairs && !err; i++) - err = i40e_setup_tx_descriptors(vsi->tx_rings[i]); + err = i40e_setup_tx_descriptors(vsi->tx_rings[i], false); if (!i40e_enabled_xdp_vsi(vsi)) return err; for (i = 0; i < vsi->num_queue_pairs && !err; i++) - err = 
i40e_setup_tx_descriptors(vsi->xdp_rings[i]); + err = i40e_setup_tx_descriptors(vsi->xdp_rings[i], true); return err; } diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index d43ce13..3e13e0e 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -676,6 +676,8 @@ void i40e_free_tx_resources(struct i40e_ring *tx_ring) i40e_clean_tx_ring(tx_ring); kfree(tx_ring->tx_bi); tx_ring->tx_bi = NULL; + kfree(tx_ring->xsk_descs); + tx_ring->xsk_descs = NULL; if (tx_ring->desc) { dma_free_coherent(tx_ring->dev, tx_ring->size, @@ -1259,10 +1261,11 @@ void i40e_clean_programming_status(struct i40e_ring *rx_ring, u64 qword0_raw, /** * i40e_setup_tx_descriptors - Allocate the Tx descriptors * @tx_ring: the tx ring to set up + * @xdp_ring: true if this is an XDP Tx ring * * Return 0 on success, negative on error **/ -int i40e_setup_tx_descriptors(struct i40e_ring *tx_ring) +int i40e_setup_tx_descriptors(struct i40e_ring *tx_ring, bool xdp_ring) { struct device *dev = tx_ring->dev; int bi_size; @@ -1277,6 +1280,13 @@ int i40e_setup_tx_descriptors(struct i40e_ring *tx_ring) if (!tx_ring->tx_bi) goto err; + if (xdp_ring) { + tx_ring->xsk_descs = kcalloc(I40E_MAX_NUM_DESCRIPTORS, sizeof(*tx_ring->xsk_descs), + GFP_KERNEL); + if (!tx_ring->xsk_descs) + goto err; + } + u64_stats_init(&tx_ring->syncp); /* round up to nearest 4K */ @@ -1300,6 +1310,8 @@ int i40e_setup_tx_descriptors(struct i40e_ring *tx_ring) return 0; err: + kfree(tx_ring->xsk_descs); + tx_ring->xsk_descs = NULL; kfree(tx_ring->tx_bi); tx_ring->tx_bi = NULL; return -ENOMEM; diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h index 2feed92..628d5d7 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h @@ -389,6 +389,7 @@ struct i40e_ring { struct i40e_channel *ch; struct xdp_rxq_info xdp_rxq; struct xsk_buff_pool 
*xsk_pool; + struct xdp_desc *xsk_descs; /* For storing descriptors in the AF_XDP ZC path */ } ____cacheline_internodealigned_in_smp; static inline bool ring_uses_build_skb(struct i40e_ring *ring) @@ -451,7 +452,7 @@ bool i40e_alloc_rx_buffers(struct i40e_ring *rxr, u16 cleaned_count); netdev_tx_t i40e_lan_xmit_frame(struct sk_buff *skb, struct net_device *netdev); void i40e_clean_tx_ring(struct i40e_ring *tx_ring); void i40e_clean_rx_ring(struct i40e_ring *rx_ring); -int i40e_setup_tx_descriptors(struct i40e_ring *tx_ring); +int i40e_setup_tx_descriptors(struct i40e_ring *tx_ring, bool xdp_ring); int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring); void i40e_free_tx_resources(struct i40e_ring *tx_ring); void i40e_free_rx_resources(struct i40e_ring *rx_ring); diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index eabe1a3..515d856 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -383,6 +383,78 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget) return failure ? budget : (int)total_rx_packets; } +static void i40e_xmit_pkt(struct i40e_ring *xdp_ring, struct xdp_desc *desc, + unsigned int *total_bytes) +{ + struct i40e_tx_desc *tx_desc; + dma_addr_t dma; + + dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc->addr); + xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, desc->len); + + tx_desc = I40E_TX_DESC(xdp_ring, xdp_ring->next_to_use++); + tx_desc->buffer_addr = cpu_to_le64(dma); + tx_desc->cmd_type_offset_bsz = build_ctob(I40E_TX_DESC_CMD_ICRC | I40E_TX_DESC_CMD_EOP, + 0, desc->len, 0); + + *total_bytes += desc->len; +} + +/* This value should match the pragma below. Why 4? It is strictly + * empirical. It seems to be a good compromise between the advantage + * of having simultaneous outstanding reads to the DMA array that can + * hide each other's latency and the disadvantage of having a larger + * code path. 
+ */ +#define PKTS_PER_BATCH 4 + +static void i40e_xmit_pkt_batch(struct i40e_ring *xdp_ring, struct xdp_desc *desc, + unsigned int *total_bytes) +{ + u16 ntu = xdp_ring->next_to_use; + struct i40e_tx_desc *tx_desc; + dma_addr_t dma; + u32 i; + +#pragma GCC unroll 4 + for (i = 0; i < PKTS_PER_BATCH; i++) { + dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc[i].addr); + xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, desc[i].len); + + tx_desc = I40E_TX_DESC(xdp_ring, ntu++); + tx_desc->buffer_addr = cpu_to_le64(dma); + tx_desc->cmd_type_offset_bsz = build_ctob(I40E_TX_DESC_CMD_ICRC | + I40E_TX_DESC_CMD_EOP, + 0, desc[i].len, 0); + + *total_bytes += desc[i].len; + } + + xdp_ring->next_to_use = ntu; +} + +static void i40e_fill_tx_hw_ring(struct i40e_ring *xdp_ring, struct xdp_desc *descs, u32 nb_pkts, + unsigned int *total_bytes) +{ + u32 batched, leftover, i; + + batched = nb_pkts & ~(PKTS_PER_BATCH - 1); + leftover = nb_pkts & (PKTS_PER_BATCH - 1); + for (i = 0; i < batched; i += PKTS_PER_BATCH) + i40e_xmit_pkt_batch(xdp_ring, &descs[i], total_bytes); + for (i = batched; i < batched + leftover; i++) + i40e_xmit_pkt(xdp_ring, &descs[i], total_bytes); +} + +static void i40e_set_rs_bit(struct i40e_ring *xdp_ring) +{ + u16 ntu = xdp_ring->next_to_use ? 
xdp_ring->next_to_use - 1 : xdp_ring->count - 1; + struct i40e_tx_desc *tx_desc; + + tx_desc = I40E_TX_DESC(xdp_ring, ntu); + tx_desc->cmd_type_offset_bsz |= (I40E_TX_DESC_CMD_RS << I40E_TXD_QW1_CMD_SHIFT); +} + /** * i40e_xmit_zc - Performs zero-copy Tx AF_XDP * @xdp_ring: XDP Tx ring @@ -392,45 +464,30 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget) **/ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget) { - unsigned int sent_frames = 0, total_bytes = 0; - struct i40e_tx_desc *tx_desc = NULL; - struct xdp_desc desc; - dma_addr_t dma; - - while (budget-- > 0) { - if (!xsk_tx_peek_desc(xdp_ring->xsk_pool, &desc)) - break; - - dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc.addr); - xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, - desc.len); - - tx_desc = I40E_TX_DESC(xdp_ring, xdp_ring->next_to_use); - tx_desc->buffer_addr = cpu_to_le64(dma); - tx_desc->cmd_type_offset_bsz = - build_ctob(I40E_TX_DESC_CMD_ICRC - | I40E_TX_DESC_CMD_EOP, - 0, desc.len, 0); - - sent_frames++; - total_bytes += desc.len; - - xdp_ring->next_to_use++; - if (xdp_ring->next_to_use == xdp_ring->count) - xdp_ring->next_to_use = 0; + struct xdp_desc *descs = xdp_ring->xsk_descs; + u32 nb_pkts, nb_processed = 0; + unsigned int total_bytes = 0; + + nb_pkts = xsk_tx_peek_release_desc_batch(xdp_ring->xsk_pool, descs, budget); + if (!nb_pkts) + return false; + + if (xdp_ring->next_to_use + nb_pkts >= xdp_ring->count) { + nb_processed = xdp_ring->count - xdp_ring->next_to_use; + i40e_fill_tx_hw_ring(xdp_ring, descs, nb_processed, &total_bytes); + xdp_ring->next_to_use = 0; } - if (tx_desc) { - /* Request an interrupt for the last frame and bump tail ptr. 
*/ - tx_desc->cmd_type_offset_bsz |= (I40E_TX_DESC_CMD_RS << - I40E_TXD_QW1_CMD_SHIFT); - i40e_xdp_ring_update_tail(xdp_ring); + i40e_fill_tx_hw_ring(xdp_ring, &descs[nb_processed], nb_pkts - nb_processed, + &total_bytes); - xsk_tx_release(xdp_ring->xsk_pool); - i40e_update_tx_stats(xdp_ring, sent_frames, total_bytes); - } + /* Request an interrupt for the last frame and bump tail ptr. */ + i40e_set_rs_bit(xdp_ring); + i40e_xdp_ring_update_tail(xdp_ring); + + i40e_update_tx_stats(xdp_ring, nb_pkts, total_bytes); - return !!budget; + return true; } /**