[ovs-dev] datapath: Backport "skbuff: Fix skb checksum flag on skb pull"

Message ID 1443223510-1810-1-git-send-email-pshelar@nicira.com
State Accepted

Commit Message

Pravin B Shelar Sept. 25, 2015, 11:25 p.m. UTC
Upstream commit:

    A VXLAN device can receive an skb with CHECKSUM_PARTIAL set, but the
    checksum offset may lie in the outer header, which is pulled on receive.
    The skb is then left with a negative checksum offset, and such an skb can
    trigger the assertion failure in skb_checksum_help(). This patch fixes the
    bug by setting CHECKSUM_NONE while pulling the outer header.

    The following kernel panic message is from an old kernel hitting the bug.

    ------------[ cut here ]------------
    kernel BUG at net/core/dev.c:1906!
    RIP: 0010:[<ffffffff81518034>] skb_checksum_help+0x144/0x150
    Call Trace:
    <IRQ>
    [<ffffffffa0164c28>] queue_userspace_packet+0x408/0x470 [openvswitch]
    [<ffffffffa016614d>] ovs_dp_upcall+0x5d/0x60 [openvswitch]
    [<ffffffffa0166236>] ovs_dp_process_packet_with_key+0xe6/0x100 [openvswitch]
    [<ffffffffa016629b>] ovs_dp_process_received_packet+0x4b/0x80 [openvswitch]
    [<ffffffffa016c51a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
    [<ffffffffa0171383>] vxlan_rcv+0x53/0x60 [openvswitch]
    [<ffffffffa01734cb>] vxlan_udp_encap_recv+0x8b/0xf0 [openvswitch]
    [<ffffffff8157addc>] udp_queue_rcv_skb+0x2dc/0x3b0
    [<ffffffff8157b56f>] __udp4_lib_rcv+0x1cf/0x6c0
    [<ffffffff8157ba7a>] udp_rcv+0x1a/0x20
    [<ffffffff8154fdbd>] ip_local_deliver_finish+0xdd/0x280
    [<ffffffff81550128>] ip_local_deliver+0x88/0x90
    [<ffffffff8154fa7d>] ip_rcv_finish+0x10d/0x370
    [<ffffffff81550365>] ip_rcv+0x235/0x300
    [<ffffffff8151ba1d>] __netif_receive_skb+0x55d/0x620
    [<ffffffff8151c360>] netif_receive_skb+0x80/0x90
    [<ffffffff81459935>] virtnet_poll+0x555/0x6f0
    [<ffffffff8151cd04>] net_rx_action+0x134/0x290
    [<ffffffff810683d8>] __do_softirq+0xa8/0x210
    [<ffffffff8162fe6c>] call_softirq+0x1c/0x30
    [<ffffffff810161a5>] do_softirq+0x65/0xa0
    [<ffffffff810687be>] irq_exit+0x8e/0xb0
    [<ffffffff81630733>] do_IRQ+0x63/0xe0
    [<ffffffff81625f2e>] common_interrupt+0x6e/0x6e

    Reported-by: Anupam Chanda <achanda@vmware.com>
    Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
    Acked-by: Tom Herbert <tom@herbertland.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Upstream: 6ae459bdaae ("skbuff: Fix skb checksum flag on skb pull")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 datapath/linux/compat/include/linux/skbuff.h |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

Comments

Jesse Gross Sept. 26, 2015, 1:05 a.m. UTC | #1
On Fri, Sep 25, 2015 at 4:25 PM, Pravin B Shelar <pshelar@nicira.com> wrote:
> Upstream: 6ae459bdaae ("skbuff: Fix skb checksum flag on skb pull")
> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

Acked-by: Jesse Gross <jesse@nicira.com>

That being said, I believe that we will still run into this bug on
kernels where we use upstream tunnel implementations (but before 4.3,
of course).

Pravin B Shelar Sept. 26, 2015, 3:21 a.m. UTC | #2
On Fri, Sep 25, 2015 at 6:05 PM, Jesse Gross <jesse@nicira.com> wrote:
> On Fri, Sep 25, 2015 at 4:25 PM, Pravin B Shelar <pshelar@nicira.com> wrote:
>> Upstream: 6ae459bdaae ("skbuff: Fix skb checksum flag on skb pull")
>> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
>
> Acked-by: Jesse Gross <jesse@nicira.com>
>

Thanks. I pushed it to master, branch-2.3, and branch-2.4.

> That being said, I believe that we will still run into this bug on
> kernels where we use upstream tunnel implementations (but before 4.3,
> of course).

OK, I have asked for the upstream patch to be queued for stable.

Patch

diff --git a/datapath/linux/compat/include/linux/skbuff.h b/datapath/linux/compat/include/linux/skbuff.h
index 1a576a0..23b13b8 100644
--- a/datapath/linux/compat/include/linux/skbuff.h
+++ b/datapath/linux/compat/include/linux/skbuff.h
@@ -372,4 +372,28 @@  int rpl_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci);
 void rpl_kfree_skb_list(struct sk_buff *segs);
 #define kfree_skb_list rpl_kfree_skb_list
 #endif
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4,3,0)
+#define skb_postpull_rcsum rpl_skb_postpull_rcsum
+static inline void skb_postpull_rcsum(struct sk_buff *skb,
+				      const void *start, unsigned int len)
+{
+	if (skb->ip_summed == CHECKSUM_COMPLETE)
+		skb->csum = csum_sub(skb->csum, csum_partial(start, len, 0));
+	else if (skb->ip_summed == CHECKSUM_PARTIAL &&
+			skb_checksum_start_offset(skb) <= len)
+		skb->ip_summed = CHECKSUM_NONE;
+}
+
+#define skb_pull_rcsum rpl_skb_pull_rcsum
+static inline unsigned char *skb_pull_rcsum(struct sk_buff *skb, unsigned int len)
+{
+	BUG_ON(len > skb->len);
+	skb->len -= len;
+	BUG_ON(skb->len < skb->data_len);
+	skb_postpull_rcsum(skb, skb->data, len);
+	return skb->data += len;
+}
+
+#endif
 #endif
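
For readers who want to see the CHECKSUM_PARTIAL decision in isolation, here is a minimal userspace sketch of the check that skb_postpull_rcsum() adds. It is not kernel code: struct model_skb, model_csum_start_offset() and model_pull_rcsum() are hypothetical names invented for this illustration, the headroom and csum_start fields only imitate how the kernel derives skb_checksum_start_offset(), and the CHECKSUM_COMPLETE branch (which adjusts skb->csum) is left out on purpose.

```c
#include <assert.h>

/* Hypothetical userspace model of the patch's logic; not kernel code. */
enum {
	CHECKSUM_NONE = 0,
	CHECKSUM_UNNECESSARY = 1,
	CHECKSUM_COMPLETE = 2,
	CHECKSUM_PARTIAL = 3
};

struct model_skb {
	int ip_summed;           /* one of the CHECKSUM_* values above */
	unsigned int headroom;   /* bytes between buffer head and data */
	unsigned int csum_start; /* checksum start, measured from head */
	unsigned int len;        /* bytes of packet data remaining */
};

/* Mirrors the idea of skb_checksum_start_offset(): the checksum start
 * relative to the current data pointer. It goes negative once the data
 * pointer has moved past the checksum start. */
static int model_csum_start_offset(const struct model_skb *skb)
{
	return (int)skb->csum_start - (int)skb->headroom;
}

/* Mirrors the shape of the backported skb_pull_rcsum(): shrink the
 * length, run the post-pull check against the old data pointer, then
 * advance the data pointer by len. */
static void model_pull_rcsum(struct model_skb *skb, unsigned int len)
{
	assert(len <= skb->len);
	skb->len -= len;

	/* The check added by the patch: if the checksum start falls in
	 * the region being pulled, drop CHECKSUM_PARTIAL so that nobody
	 * later acts on a stale (negative) offset. */
	if (skb->ip_summed == CHECKSUM_PARTIAL &&
	    model_csum_start_offset(skb) <= (int)len)
		skb->ip_summed = CHECKSUM_NONE;

	skb->headroom += len; /* equivalent of skb->data += len */
}
```

With made-up numbers: an skb with headroom 64 and csum_start 84 has a checksum offset of 20, so pulling a 50-byte outer header resets ip_summed to CHECKSUM_NONE. Without the check, the offset after the pull would be 20 - 50 = -30, which is the negative-offset state the commit message describes. An skb whose csum_start is 148 (offset 84, inside the inner packet) keeps CHECKSUM_PARTIAL across the same pull and ends up with an offset of 34.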