From patchwork Wed Mar 21 20:36:49 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yonghong Song X-Patchwork-Id: 889065 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=fb.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=fb.com header.i=@fb.com header.b="CHstE0S4"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 4061nL2glgz9s0t for ; Thu, 22 Mar 2018 07:37:10 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753324AbeCUUhG (ORCPT ); Wed, 21 Mar 2018 16:37:06 -0400 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:44506 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753029AbeCUUgw (ORCPT ); Wed, 21 Mar 2018 16:36:52 -0400 Received: from pps.filterd (m0109334.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w2LKX3Pk025311 for ; Wed, 21 Mar 2018 13:36:52 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-type; s=facebook; bh=nOb1K3LThUE6P/i10kQSu5HZE+WwPqfLKfQe9nC7jEw=; b=CHstE0S4ZtCiVUmqcVu79CNm0wE03VSXaQrheN1jDStLvklSbDA8uo2JLjn5Gbu/cpKX EWcjVmbU5HObi2yQ7ATNtVMDnragEh59RMzP12LZyw6qkQRd1+68vhRiuSk+eUbvZiMU PhNB6XIU7eLR3wj9d4RlkCdVAkVbJ+Jr8hU= Received: from mail.thefacebook.com ([199.201.64.23]) by mx0a-00082601.pphosted.com with ESMTP id 2gux7x82fd-1 (version=TLSv1 cipher=ECDHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Wed, 21 Mar 2018 13:36:52 -0700 Received: from mx-out.facebook.com (192.168.52.123) by PRN-CHUB14.TheFacebook.com (192.168.16.24) with Microsoft SMTP Server id 14.3.361.1; Wed, 21 Mar 2018 13:36:51 -0700 Received: by devbig474.prn1.facebook.com (Postfix, from userid 128203) id 2ACD6E40CFB; Wed, 21 Mar 2018 13:36:51 -0700 (PDT) Smtp-Origin-Hostprefix: devbig From: Yonghong Song Smtp-Origin-Hostname: devbig474.prn1.facebook.com To: , , , , CC: Smtp-Origin-Cluster: prn1c29 Subject: [PATCH net-next v5 1/2] net: permit skb_segment on head_frag frag_list skb Date: Wed, 21 Mar 2018 13:36:49 -0700 Message-ID: <20180321203650.1404106-2-yhs@fb.com> X-Mailer: git-send-email 2.9.5 In-Reply-To: <20180321203650.1404106-1-yhs@fb.com> References: <20180321203650.1404106-1-yhs@fb.com> X-FB-Internal: Safe MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2018-03-21_09:, , signatures=0 X-Proofpoint-Spam-Reason: safe X-FB-Internal: Safe Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at function skb_segment(), line 3667. The bpf program attaches to clsact ingress, calls bpf_skb_change_proto to change protocol from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect to send the changed packet out. 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb, 3473 netdev_features_t features) 3474 { 3475 struct sk_buff *segs = NULL; 3476 struct sk_buff *tail = NULL; ... 3665 while (pos < offset + len) { 3666 if (i >= nfrags) { 3667 BUG_ON(skb_headlen(list_skb)); 3668 3669 i = 0; 3670 nfrags = skb_shinfo(list_skb)->nr_frags; 3671 frag = skb_shinfo(list_skb)->frags; 3672 frag_skb = list_skb; ... call stack: ... #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525 #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7 #4 [ffff883ffef03668] die at ffffffff8101deb2 #5 [ffff883ffef03698] do_trap at ffffffff8101a700 #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0 #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab [exception RIP: skb_segment+3044] RIP: ffffffff817e4dd4 RSP: ffff883ffef03860 RFLAGS: 00010216 RAX: 0000000000002bf6 RBX: ffff883feb7aaa00 RCX: 0000000000000011 RDX: ffff883fb87910c0 RSI: 0000000000000011 RDI: ffff883feb7ab500 RBP: ffff883ffef03928 R8: 0000000000002ce2 R9: 00000000000027da R10: 000001ea00000000 R11: 0000000000002d82 R12: ffff883f90a1ee80 R13: ffff883fb8791120 R14: ffff883feb7abc00 R15: 0000000000002ce2 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7 --- --- ... The triggering input skb has the following properties: list_skb = skb->frag_list; skb->nfrags != NULL && skb_headlen(list_skb) != 0 and skb_segment() is not able to handle a frag_list skb if its headlen (list_skb->len - list_skb->data_len) is not 0. This patch addressed the issue by handling skb_headlen(list_skb) != 0 case properly if list_skb->head_frag is true, which is expected in most cases. The head frag is processed before list_skb->frags are processed. Reported-by: Diptanu Gon Choudhury Signed-off-by: Yonghong Song --- net/core/skbuff.c | 26 ++++++++++++++++++++------ 1 file changed, 20 insertions(+), 6 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 715c134..23b317a 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3460,6 +3460,19 @@ void *skb_pull_rcsum(struct sk_buff *skb, unsigned int len) } EXPORT_SYMBOL_GPL(skb_pull_rcsum); +static inline skb_frag_t skb_head_frag_to_page_desc(struct sk_buff *frag_skb) +{ + skb_frag_t head_frag; + struct page *page; + + page = virt_to_head_page(frag_skb->head); + head_frag.page.p = page; + head_frag.page_offset = frag_skb->data - + (unsigned char *)page_address(page); + head_frag.size = skb_headlen(frag_skb); + return head_frag; +} + /** * skb_segment - Perform protocol segmentation on skb. * @head_skb: buffer to segment @@ -3664,15 +3677,16 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, while (pos < offset + len) { if (i >= nfrags) { - BUG_ON(skb_headlen(list_skb)); - i = 0; nfrags = skb_shinfo(list_skb)->nr_frags; frag = skb_shinfo(list_skb)->frags; - frag_skb = list_skb; - - BUG_ON(!nfrags); + if (skb_headlen(list_skb)) { + BUG_ON(!list_skb->head_frag); + /* to make room for head_frag. */ + i--; frag--; + } + frag_skb = list_skb; if (skb_orphan_frags(frag_skb, GFP_ATOMIC) || skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC)) @@ -3689,7 +3703,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, goto err; } - *nskb_frag = *frag; + *nskb_frag = (i < 0) ? skb_head_frag_to_page_desc(frag_skb) : *frag; __skb_frag_ref(nskb_frag); size = skb_frag_size(nskb_frag); From patchwork Wed Mar 21 20:36:50 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yonghong Song X-Patchwork-Id: 889066 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=fb.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=fb.com header.i=@fb.com header.b="PSNzawg6"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 4061nV55H2z9s0v for ; Thu, 22 Mar 2018 07:37:18 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753338AbeCUUhQ (ORCPT ); Wed, 21 Mar 2018 16:37:16 -0400 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:40478 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752954AbeCUUhM (ORCPT ); Wed, 21 Mar 2018 16:37:12 -0400 Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w2LKY4Bm002526 for ; Wed, 21 Mar 2018 13:37:11 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-type; s=facebook; bh=3rdLPTgJNb0AER5PJWEVrYY3YAF9U/mmbgP74jEjHlc=; b=PSNzawg63AKEhOosWagYYA04U8yN/tStvHEa49IiBJjKbV/JI1bbcR7Mcxupb3dvRu5a i4FVaHMLDRyM2q3xU2tdSt0QgZdf+FkgBZRl1zAb0Z1RfyNLVR9OZj3e4rfcvKmV2MCh v96m2Uwv+n9CRMFGo54FOtNf98DIQFHrvlA= Received: from mail.thefacebook.com ([199.201.64.23]) by mx0a-00082601.pphosted.com with ESMTP id 2guwwb846t-3 (version=TLSv1 cipher=ECDHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Wed, 21 Mar 2018 13:37:11 -0700 Received: from mx-out.facebook.com (192.168.52.123) by PRN-CHUB06.TheFacebook.com (192.168.16.16) with Microsoft SMTP Server id 14.3.361.1; Wed, 21 Mar 2018 13:37:10 -0700 Received: by devbig474.prn1.facebook.com (Postfix, from userid 128203) id 1A60DE40B17; Wed, 21 Mar 2018 13:36:51 -0700 (PDT) Smtp-Origin-Hostprefix: devbig From: Yonghong Song Smtp-Origin-Hostname: devbig474.prn1.facebook.com To: , , , , CC: Smtp-Origin-Cluster: prn1c29 Subject: [PATCH net-next v5 2/2] net: bpf: add a test for skb_segment in test_bpf module Date: Wed, 21 Mar 2018 13:36:50 -0700 Message-ID: <20180321203650.1404106-3-yhs@fb.com> X-Mailer: git-send-email 2.9.5 In-Reply-To: <20180321203650.1404106-1-yhs@fb.com> References: <20180321203650.1404106-1-yhs@fb.com> X-FB-Internal: Safe MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2018-03-21_09:, , signatures=0 X-Proofpoint-Spam-Reason: safe X-FB-Internal: Safe Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Without the previous commit, "modprobe test_bpf" will have the following errors: ... [ 98.149165] ------------[ cut here ]------------ [ 98.159362] kernel BUG at net/core/skbuff.c:3667! [ 98.169756] invalid opcode: 0000 [#1] SMP PTI [ 98.179370] Modules linked in: [ 98.179371] test_bpf(+) ... which triggers the bug the previous commit intends to fix. The skbs are constructed to mimic what mlx5 may generate. The packet size/header may not mimic real cases in production. But the processing flow is similar. Signed-off-by: Yonghong Song --- lib/test_bpf.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 91 insertions(+), 2 deletions(-) diff --git a/lib/test_bpf.c b/lib/test_bpf.c index 2efb213..a468b5c 100644 --- a/lib/test_bpf.c +++ b/lib/test_bpf.c @@ -6574,6 +6574,93 @@ static bool exclude_test(int test_id) return test_id < test_range[0] || test_id > test_range[1]; } +static __init struct sk_buff *build_test_skb(void) +{ + u32 headroom = NET_SKB_PAD + NET_IP_ALIGN + ETH_HLEN; + struct sk_buff *skb[2]; + struct page *page[2]; + int i, data_size = 8; + + for (i = 0; i < 2; i++) { + page[i] = alloc_page(GFP_KERNEL); + if (!page[i]) { + if (i == 0) + goto err_page0; + else + goto err_page1; + } + + /* this will set skb[i]->head_frag */ + skb[i] = dev_alloc_skb(headroom + data_size); + if (!skb[i]) { + if (i == 0) + goto err_skb0; + else + goto err_skb1; + } + + skb_reserve(skb[i], headroom); + skb_put(skb[i], data_size); + skb[i]->protocol = htons(ETH_P_IP); + skb_reset_network_header(skb[i]); + skb_set_mac_header(skb[i], -ETH_HLEN); + + skb_add_rx_frag(skb[i], 0, page[i], 0, 64, 64); + // skb_headlen(skb[i]): 8, skb[i]->head_frag = 1 + } + + /* setup shinfo */ + skb_shinfo(skb[0])->gso_size = 1448; + skb_shinfo(skb[0])->gso_type = SKB_GSO_TCPV4; + skb_shinfo(skb[0])->gso_type |= SKB_GSO_DODGY; + skb_shinfo(skb[0])->gso_segs = 0; + skb_shinfo(skb[0])->frag_list = skb[1]; + + /* adjust skb[0]'s len */ + skb[0]->len += skb[1]->len; + skb[0]->data_len += skb[1]->data_len; + skb[0]->truesize += skb[1]->truesize; + + return skb[0]; + +err_skb1: + __free_page(page[1]); +err_page1: + kfree_skb(skb[0]); +err_skb0: + __free_page(page[0]); +err_page0: + return NULL; +} + +static __init int test_skb_segment(void) +{ + netdev_features_t features; + struct sk_buff *skb, *segs; + int ret = -1; + + features = NETIF_F_SG | NETIF_F_GSO_PARTIAL | NETIF_F_IP_CSUM | + NETIF_F_IPV6_CSUM; + features |= NETIF_F_RXCSUM; + skb = build_test_skb(); + if (!skb) { + pr_info("%s: failed to build_test_skb", __func__); + goto done; + } + + segs = skb_segment(skb, features); + if (segs) { + kfree_skb_list(segs); + ret = 0; + pr_info("%s: success in skb_segment!", __func__); + } else { + pr_info("%s: failed in skb_segment!", __func__); + } + kfree_skb(skb); +done: + return ret; +} + static __init int test_bpf(void) { int i, err_cnt = 0, pass_cnt = 0; @@ -6632,9 +6719,11 @@ static int __init test_bpf_init(void) return ret; ret = test_bpf(); - destroy_bpf_tests(); - return ret; + if (ret) + return ret; + + return test_skb_segment(); } static void __exit test_bpf_exit(void)