X-Patchwork-Id: 1401930
From: "Ramsay, Lincoln"
To: Igor Russkikh, "David S. Miller", Jakub Kicinski, netdev@vger.kernel.org
Subject: [PATCH] aquantia: Reserve space when allocating an SKB
Date: Wed, 18 Nov 2020 01:52:49 +0000

When performing IPv6 forwarding, there is an expectation that SKBs will have some headroom. When forwarding a packet received by the aquantia driver, this does not always happen, triggering a kernel warning.

It was observed that napi_alloc_skb and other Ethernet drivers reserve (NET_SKB_PAD + NET_IP_ALIGN) bytes in new SKBs. Do this when calling build_skb as well.

Signed-off-by: Lincoln Ramsay
---

We have an Aquantia 10G Ethernet interface in one of our devices. While testing a new feature, we discovered a problem with it. However, the problem only shows up in a very specific situation.

We are using firewalld as a frontend to nftables. It sets up port forwarding (e.g. incoming port 5022 -> other_machine:22). We also use masquerading on the outgoing packet, although I'm not sure this is relevant to the issue. IPv4 works fine; IPv6 is a problem. The bug is triggered by trying to hit this forwarded port (ssh -p 5022 addr). It is 100% reproducible.

The problem is that we get a kernel warning. It is triggered by this line in neighbour.h:

    if (WARN_ON_ONCE(skb_headroom(skb) < hh_alen)) {

It seems that skb_headroom is only 14, when it is expected to be >= 16.

2020-10-19 21:24:24 DEBUG   [console] ------------[ cut here ]------------
2020-10-19 21:24:24 DEBUG   [console] WARNING: CPU: 3 PID: 0 at include/net/neighbour.h:493 ip6_finish_output2+0x538/0x580
2020-10-19 21:24:24 DEBUG   [console] Modules linked in: xt_addrtype xt_MASQUERADE iptable_filter iptable_nat ip6table_raw ip6_tables xt_CT xt_tcpudp iptable_raw ip_tables nf_nat_tftp nft_nat nft_masq nft_objref nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_chain_nat nf_nat xfrm_user nf_conntrack_tftp nf_tables_set x_tables nft_ct nf_tables nfnetlink amd_spirom_nor(O) spi_nor(O) mtd(O) atlantic nct5104_wdt(O) gpio_amd(O) nct7491(O) sch_fq_codel tun qmi_wwan usbnet mii qcserial usb_wwan qcaux nsh nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 i2c_dev cdc_wdm br_netfilter bridge stp llc [last unloaded: nft_reject]
2020-10-19 21:24:24 DEBUG   [console] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G           O      5.4.65-og #1
2020-10-19 21:24:24 DEBUG   [console] RIP: 0010:ip6_finish_output2+0x538/0x580
2020-10-19 21:24:24 DEBUG   [console] Code: 87 e9 fc ff ff 44 89 fa 48 89 74 24 20 48 29 d7 e8 2d 4f 0c 00 48 8b 74 24 20 e9 cf fc ff ff 41 bf 10 00 00 00 e9 c4 fc ff ff <0f> 0b 4c 89 ef 41 bc 01 00 00 00 e8 d8 89 f0 ff e9 ee fc ff ff e8
2020-10-19 21:24:24 DEBUG   [console] RSP: 0018:ffffac2040114ab0 EFLAGS: 00010212
2020-10-19 21:24:24 DEBUG   [console] RAX: ffff9c041a0bf00e RBX: 000000000000000e RCX: ffff9c041a0bf00e
2020-10-19 21:24:24 DEBUG   [console] RDX: 000000000000000e RSI: ffff9c03ddf606c8 RDI: 0000000000000000
2020-10-19 21:24:24 DEBUG   [console] RBP: ffffac2040114b38 R08: 00000000f2000000 R09: 0000000002ec5955
2020-10-19 21:24:24 DEBUG   [console] R10: ffff9c041e57a440 R11: 000000000000000a R12: ffff9c03ddf60600
2020-10-19 21:24:24 DEBUG   [console] R13: ffff9c03dcf24800 R14: 0000000000000000 R15: 0000000000000010
2020-10-19 21:24:24 DEBUG   [console] FS:  0000000000000000(0000) GS:ffff9c0426b80000(0000) knlGS:0000000000000000
2020-10-19 21:24:24 DEBUG   [console] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2020-10-19 21:24:24 DEBUG   [console] CR2: 0000000000a0b4d8 CR3: 0000000222054000 CR4: 00000000000406e0
2020-10-19 21:24:24 DEBUG   [console] Call Trace:
2020-10-19 21:24:24 DEBUG   [console]
2020-10-19 21:24:24 DEBUG   [console]  ? ipv6_confirm+0x85/0xf0 [nf_conntrack]
2020-10-19 21:24:24 DEBUG   [console]  ip6_output+0x67/0x130
2020-10-19 21:24:24 DEBUG   [console]  ? __ip6_finish_output+0x110/0x110
2020-10-19 21:24:24 DEBUG   [console]  ip6_forward+0x582/0x920
2020-10-19 21:24:24 DEBUG   [console]  ? ip6_frag_init+0x40/0x40
2020-10-19 21:24:24 DEBUG   [console]  ip6_sublist_rcv_finish+0x33/0x50
2020-10-19 21:24:24 DEBUG   [console]  ip6_sublist_rcv+0x212/0x240
2020-10-19 21:24:24 DEBUG   [console]  ? ip6_rcv_finish_core.isra.0+0xc0/0xc0
2020-10-19 21:24:24 DEBUG   [console]  ipv6_list_rcv+0x116/0x140
2020-10-19 21:24:24 DEBUG   [console]  __netif_receive_skb_list_core+0x1b1/0x260
2020-10-19 21:24:24 DEBUG   [console]  netif_receive_skb_list_internal+0x1ba/0x2d0
2020-10-19 21:24:24 DEBUG   [console]  ? napi_gro_receive+0x50/0x90
2020-10-19 21:24:24 DEBUG   [console]  gro_normal_list.part.0+0x14/0x30
2020-10-19 21:24:24 DEBUG   [console]  napi_complete_done+0x81/0x100
2020-10-19 21:24:24 DEBUG   [console]  aq_vec_poll+0x166/0x190 [atlantic]
2020-10-19 21:24:24 DEBUG   [console]  net_rx_action+0x12b/0x2f0
2020-10-19 21:24:24 DEBUG   [console]  __do_softirq+0xd1/0x213
2020-10-19 21:24:24 DEBUG   [console]  irq_exit+0xc8/0xd0
2020-10-19 21:24:24 DEBUG   [console]  do_IRQ+0x48/0xd0
2020-10-19 21:24:24 DEBUG   [console]  common_interrupt+0xf/0xf
2020-10-19 21:24:24 DEBUG   [console]
2020-10-19 21:24:24 DEBUG   [console] ---[ end trace c1cba758301d342f ]---

After much hunting and debugging, I think I have figured out the issue here.

aq_ring.c has this code (edited slightly for brevity):

    if (buff->is_eop && buff->len <= AQ_CFG_RX_FRAME_MAX - AQ_SKB_ALIGN) {
        skb = build_skb(aq_buf_vaddr(&buff->rxdata), AQ_CFG_RX_FRAME_MAX);
        skb_put(skb, buff->len);
    } else {
        skb = napi_alloc_skb(napi, AQ_CFG_RX_HDR_SIZE);
        ...
    }

There is a significant difference between the SKBs produced by these two code paths. When napi_alloc_skb creates an SKB, it reserves (NET_SKB_PAD + NET_IP_ALIGN) bytes of headroom. The same pattern appears to be used in all of the other Ethernet drivers I have looked at. However, the build_skb code path reserves no headroom at all: the SKB's data starts at the very beginning of the receive buffer. I believe that this is the ultimate cause of the warning we are seeing.

I have created a patch that reserves headroom in the SKB. The logic is inspired by the igb driver.

This was originally developed against Linux 5.4, then migrated to Linux 5.8. It has been tested on our product against both versions. The patch below was migrated to Linux master (some context changed, but otherwise it applied cleanly).

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
index 4f913658eea4..57150e3d3257 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
@@ -16,6 +16,8 @@
 #include
 #include
 
+#define AQ_SKB_PAD (NET_SKB_PAD + NET_IP_ALIGN)
+
 static inline void aq_free_rxpage(struct aq_rxpage *rxpage, struct device *dev)
 {
 	unsigned int len = PAGE_SIZE << rxpage->order;
@@ -47,7 +49,7 @@ static int aq_get_rxpage(struct aq_rxpage *rxpage, unsigned int order,
 	rxpage->page = page;
 	rxpage->daddr = daddr;
 	rxpage->order = order;
-	rxpage->pg_off = 0;
+	rxpage->pg_off = AQ_SKB_PAD;
 
 	return 0;
 
@@ -67,8 +69,8 @@ static int aq_get_rxpages(struct aq_ring_s *self, struct aq_ring_buff_s *rxbuf,
 		/* One means ring is the only user and can reuse */
 		if (page_ref_count(rxbuf->rxdata.page) > 1) {
 			/* Try reuse buffer */
-			rxbuf->rxdata.pg_off += AQ_CFG_RX_FRAME_MAX;
-			if (rxbuf->rxdata.pg_off + AQ_CFG_RX_FRAME_MAX <=
+			rxbuf->rxdata.pg_off += AQ_CFG_RX_FRAME_MAX + AQ_SKB_PAD;
+			if (rxbuf->rxdata.pg_off + AQ_CFG_RX_FRAME_MAX + AQ_SKB_PAD <=
 				(PAGE_SIZE << order)) {
 				u64_stats_update_begin(&self->stats.rx.syncp);
 				self->stats.rx.pg_flips++;
@@ -84,7 +86,7 @@ static int aq_get_rxpages(struct aq_ring_s *self, struct aq_ring_buff_s *rxbuf,
 				u64_stats_update_end(&self->stats.rx.syncp);
 			}
 		} else {
-			rxbuf->rxdata.pg_off = 0;
+			rxbuf->rxdata.pg_off = AQ_SKB_PAD;
 			u64_stats_update_begin(&self->stats.rx.syncp);
 			self->stats.rx.pg_reuses++;
 			u64_stats_update_end(&self->stats.rx.syncp);
@@ -416,8 +418,8 @@ int aq_ring_rx_clean(struct aq_ring_s *self,
 		/* for single fragment packets use build_skb() */
 		if (buff->is_eop &&
 		    buff->len <= AQ_CFG_RX_FRAME_MAX - AQ_SKB_ALIGN) {
-			skb = build_skb(aq_buf_vaddr(&buff->rxdata),
-					AQ_CFG_RX_FRAME_MAX);
+			skb = build_skb(aq_buf_vaddr(&buff->rxdata) - AQ_SKB_PAD,
+					AQ_CFG_RX_FRAME_MAX + AQ_SKB_PAD);
 			if (unlikely(!skb)) {
 				u64_stats_update_begin(&self->stats.rx.syncp);
 				self->stats.rx.skb_alloc_fails++;
@@ -425,6 +427,7 @@ int aq_ring_rx_clean(struct aq_ring_s *self,
 				err = -ENOMEM;
 				goto err_exit;
 			}
+			skb_reserve(skb, AQ_SKB_PAD);
 			if (is_ptp_ring)
 				buff->len -= aq_ptp_extract_ts(self->aq_nic, skb,