{"id":2221427,"url":"http://patchwork.ozlabs.org/api/1.0/patches/2221427/?format=json","project":{"id":26,"url":"http://patchwork.ozlabs.org/api/1.0/projects/26/?format=json","name":"Netfilter Development","link_name":"netfilter-devel","list_id":"netfilter-devel.vger.kernel.org","list_email":"netfilter-devel@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null},"msgid":"<28eadbf14db58dd6e402325b62658a86d240e0f9.1775739840.git.daniel@makrotopia.org>","date":"2026-04-09T13:07:44","name":"[RFC,net-next,3/4] nf_flow_table: convert hw byte counts and update sub-interface stats","commit_ref":null,"pull_url":null,"state":"new","archived":false,"hash":"996f1127acbbdb4485b05a3c4b3686ea9c40835c","submitter":{"id":64091,"url":"http://patchwork.ozlabs.org/api/1.0/people/64091/?format=json","name":"Daniel Golle","email":"daniel@makrotopia.org"},"delegate":null,"mbox":"http://patchwork.ozlabs.org/project/netfilter-devel/patch/28eadbf14db58dd6e402325b62658a86d240e0f9.1775739840.git.daniel@makrotopia.org/mbox/","series":[{"id":499290,"url":"http://patchwork.ozlabs.org/api/1.0/series/499290/?format=json","date":"2026-04-09T13:07:22","name":"improve hw flow offload byte accounting","version":1,"mbox":"http://patchwork.ozlabs.org/series/499290/mbox/"}],"check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/2221427/checks/","tags":{},"headers":{"Return-Path":"\n <netfilter-devel+bounces-11766-incoming=patchwork.ozlabs.org@vger.kernel.org>","X-Original-To":["incoming@patchwork.ozlabs.org","netfilter-devel@vger.kernel.org"],"Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=2600:3c04:e001:36c::12fc:5321; helo=tor.lore.kernel.org;\n envelope-from=netfilter-devel+bounces-11766-incoming=patchwork.ozlabs.org@vger.kernel.org;\n receiver=patchwork.ozlabs.org)","smtp.subspace.kernel.org;\n arc=none 
smtp.client-ip=185.142.180.65","smtp.subspace.kernel.org;\n dmarc=none (p=none dis=none) header.from=makrotopia.org","smtp.subspace.kernel.org;\n spf=pass smtp.mailfrom=makrotopia.org"],"Received":["from tor.lore.kernel.org (tor.lore.kernel.org\n [IPv6:2600:3c04:e001:36c::12fc:5321])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fs0hb3KRcz1yD3\n\tfor <incoming@patchwork.ozlabs.org>; Thu, 09 Apr 2026 23:12:15 +1000 (AEST)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby tor.lore.kernel.org (Postfix) with ESMTP id A521B306AA51\n\tfor <incoming@patchwork.ozlabs.org>; Thu,  9 Apr 2026 13:08:08 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 22BA53D0930;\n\tThu,  9 Apr 2026 13:07:55 +0000 (UTC)","from pidgin.makrotopia.org (pidgin.makrotopia.org [185.142.180.65])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id 5DA073CF03E;\n\tThu,  9 Apr 2026 13:07:53 +0000 (UTC)","from local\n\tby pidgin.makrotopia.org with esmtpsa (TLS1.3:TLS_AES_256_GCM_SHA384:256)\n\t (Exim 4.99)\n\t(envelope-from <daniel@makrotopia.org>)\n\tid 1wAp6p-000000001km-2eEu;\n\tThu, 09 Apr 2026 13:07:47 +0000"],"ARC-Seal":"i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1775740074; cv=none;\n b=jFDZgv9jNEBO2aBGHI3/XQmOEQ38X4Ot400rBaFk6HtY99ZhG62uXLwdQD1OdJiuFLdF8GLxmAmn0AfciyfeF8orftXH0EUkMekkDFfBvLMNDFwXm2qDUNBliaZ7724JXQkFH8Vr9etVKMM6UU79h+LQsMS3zBkgPwzBt3ijrPE=","ARC-Message-Signature":"i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1775740074; 
c=relaxed/simple;\n\tbh=qgHyUF6g4crc16uF8ArziTxB513q2/NNff1YStmKGOY=;\n\th=Date:From:To:Subject:Message-ID:References:MIME-Version:\n\t Content-Type:Content-Disposition:In-Reply-To;\n b=gqG88ufibPn3bUWaF5sCWGSCDbJzjiFzB48ud6wQ5MAdufofo9le3/jplBPmdWGflKuAaGo+z5/y05gduNDKLCcn8u5pEhEZgVKyVqmZN1XwYCEABQ8LI6vjIQ29Qvhiaw468j08hL9bCL3EDYMXSY9HC3I6i63XC/6sLklhSTg=","ARC-Authentication-Results":"i=1; smtp.subspace.kernel.org;\n dmarc=none (p=none dis=none) header.from=makrotopia.org;\n spf=pass smtp.mailfrom=makrotopia.org; arc=none smtp.client-ip=185.142.180.65","Date":"Thu, 9 Apr 2026 14:07:44 +0100","From":"Daniel Golle <daniel@makrotopia.org>","To":"Felix Fietkau <nbd@nbd.name>, John Crispin <john@phrozen.org>,\n\tLorenzo Bianconi <lorenzo@kernel.org>,\n\tAndrew Lunn <andrew+netdev@lunn.ch>,\n\t\"David S. Miller\" <davem@davemloft.net>,\n\tEric Dumazet <edumazet@google.com>,\n\tJakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,\n\tMatthias Brugger <matthias.bgg@gmail.com>,\n\tAngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>,\n\tSimon Horman <horms@kernel.org>,\n\tPablo Neira Ayuso <pablo@netfilter.org>,\n\tFlorian Westphal <fw@strlen.de>, Phil Sutter <phil@nwl.cc>,\n\tnetdev@vger.kernel.org, linux-kernel@vger.kernel.org,\n\tlinux-arm-kernel@lists.infradead.org,\n\tlinux-mediatek@lists.infradead.org, netfilter-devel@vger.kernel.org,\n\tcoreteam@netfilter.org","Subject":"[PATCH RFC net-next 3/4] nf_flow_table: convert hw byte counts and\n update sub-interface stats","Message-ID":"\n <28eadbf14db58dd6e402325b62658a86d240e0f9.1775739840.git.daniel@makrotopia.org>","References":"<cover.1775739840.git.daniel@makrotopia.org>","Precedence":"bulk","X-Mailing-List":"netfilter-devel@vger.kernel.org","List-Id":"<netfilter-devel.vger.kernel.org>","List-Subscribe":"<mailto:netfilter-devel+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:netfilter-devel+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","Content-Type":"text/plain; 
charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<cover.1775739840.git.daniel@makrotopia.org>"},"content":"Hardware flow offload counters may report L2 frame bytes while\nconntrack expects L3 (IP) bytes. When a driver sets byte_type\nto INGRESS_L2 or EGRESS_L2, subtract the appropriate per-direction\nencap and tunnel overhead to derive L3 byte counts for conntrack.\n\nAdditionally, propagate per-flow stats to bridge, VLAN and PPPoE\nsub-interfaces that are bypassed by hardware offloading. Each\nsub-interface gets the L3 byte count plus the overhead of any\ninner encap layers below it, matching what the software path\nwould count. Both RX and TX directions are updated.\n\nSigned-off-by: Daniel Golle <daniel@makrotopia.org>\n---\n net/netfilter/nf_flow_table_offload.c | 174 +++++++++++++++++++++++++-\n 1 file changed, 172 insertions(+), 2 deletions(-)","diff":"diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c\nindex 002ec15d988bd..67452da487c94 100644\n--- a/net/netfilter/nf_flow_table_offload.c\n+++ b/net/netfilter/nf_flow_table_offload.c\n@@ -5,6 +5,8 @@\n #include <linux/netfilter.h>\n #include <linux/rhashtable.h>\n #include <linux/netdevice.h>\n+#include <linux/if_vlan.h>\n+#include <linux/if_pppox.h>\n #include <linux/tc_act/tc_csum.h>\n #include <net/flow_offload.h>\n #include <net/ip_tunnels.h>\n@@ -1008,10 +1010,135 @@ static void flow_offload_tuple_stats(struct flow_offload_work *offload,\n \t\t\t      &offload->flowtable->flow_block.cb_list);\n }\n \n+static int flow_offload_encap_hlen(const struct flow_offload_tuple *tuple,\n+\t\t\t\t   int idx)\n+{\n+\tswitch (tuple->encap[idx].proto) {\n+\tcase htons(ETH_P_8021Q):\n+\tcase htons(ETH_P_8021AD):\n+\t\treturn VLAN_HLEN;\n+\tcase htons(ETH_P_PPP_SES):\n+\t\treturn PPPOE_SES_HLEN;\n+\t}\n+\treturn 0;\n+}\n+\n+static void flow_offload_encap_netstats(struct net_device *dev,\n+\t\t\t\t\t__be16 encap_proto,\n+\t\t\t\t\tbool rx, u64 pkts, u64 
bytes)\n+{\n+\tstruct pcpu_sw_netstats *tstats;\n+\tstruct vlan_pcpu_stats *vstats;\n+\n+\tif (encap_proto == htons(ETH_P_8021Q) ||\n+\t    encap_proto == htons(ETH_P_8021AD)) {\n+\t\tvstats = this_cpu_ptr(vlan_dev_priv(dev)->vlan_pcpu_stats);\n+\t\tu64_stats_update_begin(&vstats->syncp);\n+\t\tif (rx) {\n+\t\t\tu64_stats_add(&vstats->rx_packets, pkts);\n+\t\t\tu64_stats_add(&vstats->rx_bytes, bytes);\n+\t\t} else {\n+\t\t\tu64_stats_add(&vstats->tx_packets, pkts);\n+\t\t\tu64_stats_add(&vstats->tx_bytes, bytes);\n+\t\t}\n+\t\tu64_stats_update_end(&vstats->syncp);\n+\t} else if (dev->tstats) {\n+\t\ttstats = this_cpu_ptr(dev->tstats);\n+\t\tu64_stats_update_begin(&tstats->syncp);\n+\t\tif (rx) {\n+\t\t\tu64_stats_add(&tstats->rx_packets, pkts);\n+\t\t\tu64_stats_add(&tstats->rx_bytes, bytes);\n+\t\t} else {\n+\t\t\tu64_stats_add(&tstats->tx_packets, pkts);\n+\t\t\tu64_stats_add(&tstats->tx_bytes, bytes);\n+\t\t}\n+\t\tu64_stats_update_end(&tstats->syncp);\n+\t}\n+}\n+\n+/* Update sub-interface (VLAN, PPPoE) stats for hw-offloaded flows.\n+ *\n+ * The caller passes L3 (IP) bytes. 
Each sub-interface in the\n+ * software path sees the frame with the headers of all layers\n+ * BELOW it still present, so we add back inner-layer overhead.\n+ *\n+ * encap[] is ordered outermost to innermost, so walk from the\n+ * innermost layer outward, accumulating overhead as we go.\n+ */\n+static void flow_offload_update_encap_stats(struct flow_offload *flow,\n+\t\t\t\t\t    struct flow_offload_tuple *tuple,\n+\t\t\t\t\t    bool rx, u64 pkts, u64 bytes)\n+{\n+\tstruct net_device *dev;\n+\tint inner_hlen = 0;\n+\tint i;\n+\n+\tfor (i = tuple->encap_num - 1; i >= 0; i--) {\n+\t\tif (tuple->in_vlan_ingress & BIT(i))\n+\t\t\tcontinue;\n+\n+\t\tdev = dev_get_by_index_rcu(dev_net(flow->ct->ct_net),\n+\t\t\t\t\t   tuple->encap_ifidx[i]);\n+\t\tif (dev)\n+\t\t\tflow_offload_encap_netstats(dev,\n+\t\t\t\t\t\t    tuple->encap[i].proto, rx,\n+\t\t\t\t\t\t    pkts,\n+\t\t\t\t\t\t    bytes + inner_hlen * pkts);\n+\n+\t\tinner_hlen += flow_offload_encap_hlen(tuple, i);\n+\t}\n+\n+\t/* Bridge device sits outside all encap layers -- it sees\n+\t * L3 bytes plus the full encap overhead.\n+\t */\n+\tif (tuple->bridge_ifidx) {\n+\t\tdev = dev_get_by_index_rcu(dev_net(flow->ct->ct_net),\n+\t\t\t\t\t   tuple->bridge_ifidx);\n+\t\tif (dev && dev->tstats)\n+\t\t\tflow_offload_encap_netstats(dev, 0, rx, pkts,\n+\t\t\t\t\t\t    bytes + inner_hlen * pkts);\n+\t}\n+}\n+\n+/* Compute per-direction input overhead from the encap and tunnel\n+ * chains. 
Hardware flow counters report L2 frame bytes but\n+ * conntrack expects L3 (inner IP) bytes -- matching what the\n+ * software path sees after stripping all encap and tunnel headers.\n+ */\n+static int flow_offload_input_l2_overhead(struct flow_offload_tuple *tuple)\n+{\n+\tint overhead = ETH_HLEN;\n+\tint i;\n+\n+\tfor (i = 0; i < tuple->encap_num; i++) {\n+\t\tif (tuple->in_vlan_ingress & BIT(i))\n+\t\t\tcontinue;\n+\n+\t\toverhead += flow_offload_encap_hlen(tuple, i);\n+\t}\n+\n+\tif (tuple->tun_num) {\n+\t\tswitch (tuple->tun.l3_proto) {\n+\t\tcase IPPROTO_IPIP:\n+\t\t\toverhead += sizeof(struct iphdr);\n+\t\t\tbreak;\n+\t\tcase IPPROTO_IPV6:\n+\t\t\toverhead += sizeof(struct ipv6hdr);\n+\t\t\tbreak;\n+\t\t}\n+\t}\n+\n+\treturn overhead;\n+}\n+\n static void flow_offload_work_stats(struct flow_offload_work *offload)\n {\n+\tstruct flow_offload_tuple *tuple;\n \tstruct flow_stats stats[FLOW_OFFLOAD_DIR_MAX] = {};\n+\tu64 l3_bytes[FLOW_OFFLOAD_DIR_MAX];\n+\tint l2_overhead;\n \tu64 lastused;\n+\tint i;\n \n \tflow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_ORIGINAL, &stats[0]);\n \tif (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))\n@@ -1022,16 +1149,59 @@ static void flow_offload_work_stats(struct flow_offload_work *offload)\n \toffload->flow->timeout = max_t(u64, offload->flow->timeout,\n \t\t\t\t       lastused + flow_offload_get_timeout(offload->flow));\n \n+\t/* Convert hardware byte counts to L3 based on what the driver\n+\t * reports.  
Drivers that already report L3 (or do not set\n+\t * byte_type) need no conversion.\n+\t */\n+\tfor (i = 0; i < FLOW_OFFLOAD_DIR_MAX; i++) {\n+\t\tl2_overhead = 0;\n+\n+\t\tswitch (stats[i].byte_type) {\n+\t\tcase FLOW_STATS_BYTES_INGRESS_L2:\n+\t\t\ttuple = &offload->flow->tuplehash[i].tuple;\n+\t\t\tl2_overhead = flow_offload_input_l2_overhead(tuple);\n+\t\t\tbreak;\n+\t\tcase FLOW_STATS_BYTES_EGRESS_L2:\n+\t\t\ttuple = &offload->flow->tuplehash[!i].tuple;\n+\t\t\tl2_overhead = flow_offload_input_l2_overhead(tuple);\n+\t\t\tbreak;\n+\t\tdefault:\n+\t\t\tbreak;\n+\t\t}\n+\t\tl3_bytes[i] = stats[i].bytes - stats[i].pkts * l2_overhead;\n+\t}\n+\n \tif (offload->flowtable->flags & NF_FLOWTABLE_COUNTER) {\n \t\tif (stats[0].pkts)\n \t\t\tnf_ct_acct_add(offload->flow->ct,\n \t\t\t\t       FLOW_OFFLOAD_DIR_ORIGINAL,\n-\t\t\t\t       stats[0].pkts, stats[0].bytes);\n+\t\t\t\t       stats[0].pkts, l3_bytes[0]);\n \t\tif (stats[1].pkts)\n \t\t\tnf_ct_acct_add(offload->flow->ct,\n \t\t\t\t       FLOW_OFFLOAD_DIR_REPLY,\n-\t\t\t\t       stats[1].pkts, stats[1].bytes);\n+\t\t\t\t       stats[1].pkts, l3_bytes[1]);\n+\t}\n+\n+\trcu_read_lock();\n+\tfor (i = 0; i < FLOW_OFFLOAD_DIR_MAX; i++) {\n+\t\ttuple = &offload->flow->tuplehash[i].tuple;\n+\t\tif (!tuple->encap_num)\n+\t\t\tcontinue;\n+\n+\t\t/* Input-side encap devices get RX stats */\n+\t\tif (stats[i].pkts)\n+\t\t\tflow_offload_update_encap_stats(offload->flow,\n+\t\t\t\t\t\t\ttuple, true,\n+\t\t\t\t\t\t\tstats[i].pkts,\n+\t\t\t\t\t\t\tl3_bytes[i]);\n+\t\t/* Same devices get TX stats from the other direction */\n+\t\tif (stats[!i].pkts)\n+\t\t\tflow_offload_update_encap_stats(offload->flow,\n+\t\t\t\t\t\t\ttuple, false,\n+\t\t\t\t\t\t\tstats[!i].pkts,\n+\t\t\t\t\t\t\tl3_bytes[!i]);\n \t}\n+\trcu_read_unlock();\n }\n \n static void flow_offload_work_handler(struct work_struct *work)\n","prefixes":["RFC","net-next","3/4"]}