diff mbox series

xdp and fragments with virtio

Message ID e35c149e-cb15-bb8c-2e03-d1641d21d694@gmail.com
State RFC, archived
Delegated to: David Miller
Headers show
Series xdp and fragments with virtio | expand

Commit Message

David Ahern May 16, 2018, 3:51 a.m. UTC
Hi Jason:

I am trying to test MTU changes to the BPF fib_lookup helper and seeing
something odd. Hoping you can help.

I have a VM with multiple virtio based NICs and tap backends. I install
the xdp program on eth1 and eth2 to do forwarding. In the host I send a
large packet to eth1:

$ ping -s 1500 9.9.9.9


The tap device in the host sees 2 packets:

$ sudo tcpdump -nv -i vm02-eth1
20:44:33.943160 IP (tos 0x0, ttl 64, id 58746, offset 0, flags [+],
proto ICMP (1), length 1500)
    10.100.1.254 > 9.9.9.9: ICMP echo request, id 17917, seq 1, length 1480
20:44:33.943172 IP (tos 0x0, ttl 64, id 58746, offset 1480, flags
[none], proto ICMP (1), length 48)
    10.100.1.254 > 9.9.9.9: ip-proto-1


In the VM, the XDP program only sees the first packet, not the fragment.
I added a printk to the program (see diff below):

$ cat trace_pipe
          <idle>-0     [003] ..s2   254.436467: 0: packet length 1514


Anything come to mind in the virtio xdp implementation that affects
fragment packets? I see this with both IPv4 and v6.

Thanks,
David

[1] xdp program diff showing printk that dumps packet length:

                return XDP_DROP;

Comments

Jason Wang May 16, 2018, 7:24 a.m. UTC | #1
On 2018年05月16日 11:51, David Ahern wrote:
> Hi Jason:
>
> I am trying to test MTU changes to the BPF fib_lookup helper and seeing
> something odd. Hoping you can help.
>
> I have a VM with multiple virtio based NICs and tap backends. I install
> the xdp program on eth1 and eth2 to do forwarding. In the host I send a
> large packet to eth1:
>
> $ ping -s 1500 9.9.9.9
>
>
> The tap device in the host sees 2 packets:
>
> $ sudo tcpdump -nv -i vm02-eth1
> 20:44:33.943160 IP (tos 0x0, ttl 64, id 58746, offset 0, flags [+],
> proto ICMP (1), length 1500)
>      10.100.1.254 > 9.9.9.9: ICMP echo request, id 17917, seq 1, length 1480
> 20:44:33.943172 IP (tos 0x0, ttl 64, id 58746, offset 1480, flags
> [none], proto ICMP (1), length 48)
>      10.100.1.254 > 9.9.9.9: ip-proto-1
>
>
> In the VM, the XDP program only sees the first packet, not the fragment.
> I added a printk to the program (see diff below):
>
> $ cat trace_pipe
>            <idle>-0     [003] ..s2   254.436467: 0: packet length 1514
>
>
> Anything come to mind in the virtio xdp implementation that affects
> fragment packets? I see this with both IPv4 and v6.

Not yet. But we do turn of tap gso when virtio has XDP set, but it 
shouldn't matter this case.

Will try to see what's wrong.

Thanks

>
> Thanks,
> David
>
> [1] xdp program diff showing printk that dumps packet length:
>
> diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
> index 4a6be0f87505..f119b506e782 100644
> --- a/samples/bpf/xdp_fwd_kern.c
> +++ b/samples/bpf/xdp_fwd_kern.c
> @@ -52,6 +52,11 @@ static __always_inline int xdp_fwd_flags(struct
> xdp_md *ctx, u32 flags)
>          u16 h_proto;
>          u64 nh_off;
>
> +       {
> +               char fmt[] = "packet length %u\n";
> +
> +               bpf_trace_printk(fmt, sizeof(fmt), ctx->data_end-ctx->data);
> +       }
>          nh_off = sizeof(*eth);
>          if (data + nh_off > data_end)
>                  return XDP_DROP;
>
David Ahern May 17, 2018, 2:55 a.m. UTC | #2
On 5/16/18 1:24 AM, Jason Wang wrote:
> 
> 
> On 2018年05月16日 11:51, David Ahern wrote:
>> Hi Jason:
>>
>> I am trying to test MTU changes to the BPF fib_lookup helper and seeing
>> something odd. Hoping you can help.
>>
>> I have a VM with multiple virtio based NICs and tap backends. I install
>> the xdp program on eth1 and eth2 to do forwarding. In the host I send a
>> large packet to eth1:
>>
>> $ ping -s 1500 9.9.9.9
>>
>>
>> The tap device in the host sees 2 packets:
>>
>> $ sudo tcpdump -nv -i vm02-eth1
>> 20:44:33.943160 IP (tos 0x0, ttl 64, id 58746, offset 0, flags [+],
>> proto ICMP (1), length 1500)
>>      10.100.1.254 > 9.9.9.9: ICMP echo request, id 17917, seq 1,
>> length 1480
>> 20:44:33.943172 IP (tos 0x0, ttl 64, id 58746, offset 1480, flags
>> [none], proto ICMP (1), length 48)
>>      10.100.1.254 > 9.9.9.9: ip-proto-1
>>
>>
>> In the VM, the XDP program only sees the first packet, not the fragment.
>> I added a printk to the program (see diff below):
>>
>> $ cat trace_pipe
>>            <idle>-0     [003] ..s2   254.436467: 0: packet length 1514
>>
>>
>> Anything come to mind in the virtio xdp implementation that affects
>> fragment packets? I see this with both IPv4 and v6.
> 
> Not yet. But we do turn of tap gso when virtio has XDP set, but it
> shouldn't matter this case.
> 
> Will try to see what's wrong.
> 

I added this to the command line for the NICs and it works:

"mrg_rxbuf=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off"

XDP program sees the full size packet and the fragment.

Fun fact: only adding mrg_rxbuf=off so that mergeable_rx_bufs is false
but big_packets is true generates a panic when it receives large packets.
Jason Wang May 17, 2018, 9:24 a.m. UTC | #3
On 2018年05月17日 10:55, David Ahern wrote:
> On 5/16/18 1:24 AM, Jason Wang wrote:
>>
>> On 2018年05月16日 11:51, David Ahern wrote:
>>> Hi Jason:
>>>
>>> I am trying to test MTU changes to the BPF fib_lookup helper and seeing
>>> something odd. Hoping you can help.
>>>
>>> I have a VM with multiple virtio based NICs and tap backends. I install
>>> the xdp program on eth1 and eth2 to do forwarding. In the host I send a
>>> large packet to eth1:
>>>
>>> $ ping -s 1500 9.9.9.9
>>>
>>>
>>> The tap device in the host sees 2 packets:
>>>
>>> $ sudo tcpdump -nv -i vm02-eth1
>>> 20:44:33.943160 IP (tos 0x0, ttl 64, id 58746, offset 0, flags [+],
>>> proto ICMP (1), length 1500)
>>>       10.100.1.254 > 9.9.9.9: ICMP echo request, id 17917, seq 1,
>>> length 1480
>>> 20:44:33.943172 IP (tos 0x0, ttl 64, id 58746, offset 1480, flags
>>> [none], proto ICMP (1), length 48)
>>>       10.100.1.254 > 9.9.9.9: ip-proto-1
>>>
>>>
>>> In the VM, the XDP program only sees the first packet, not the fragment.
>>> I added a printk to the program (see diff below):
>>>
>>> $ cat trace_pipe
>>>             <idle>-0     [003] ..s2   254.436467: 0: packet length 1514
>>>
>>>
>>> Anything come to mind in the virtio xdp implementation that affects
>>> fragment packets? I see this with both IPv4 and v6.
>> Not yet. But we do turn of tap gso when virtio has XDP set, but it
>> shouldn't matter this case.
>>
>> Will try to see what's wrong.
>>
> I added this to the command line for the NICs and it works:
>
> "mrg_rxbuf=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off"
>
> XDP program sees the full size packet and the fragment.
>
> Fun fact: only adding mrg_rxbuf=off so that mergeable_rx_bufs is false
> but big_packets is true generates a panic when it receives large packets.

It looks like we wrongly drop packets after linearizing the packets 
during XDP_REDIRECT.

Please try the patch (but I do spot some other issues, will post a series):

Thanks

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f34794a..59702f9 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -800,7 +800,7 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
                         }
                         *xdp_xmit = true;
                         if (unlikely(xdp_page != page))
-                               goto err_xdp;
+                               put_page(page);
                         rcu_read_unlock();
                         goto xdp_xmit;
                 default:
David Ahern May 18, 2018, 4:42 p.m. UTC | #4
On 5/17/18 3:24 AM, Jason Wang wrote:
> It looks like we wrongly drop packets after linearizing the packets
> during XDP_REDIRECT.
> 
> Please try the patch (but I do spot some other issues, will post a series):
> 
> Thanks
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index f34794a..59702f9 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -800,7 +800,7 @@ static struct sk_buff *receive_mergeable(struct
> net_device *dev,
>                         }
>                         *xdp_xmit = true;
>                         if (unlikely(xdp_page != page))
> -                               goto err_xdp;
> +                               put_page(page);
>                         rcu_read_unlock();
>                         goto xdp_xmit;
>                 default:

Yes, that does solve the problem of fragments getting lost.
diff mbox series

Patch

diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
index 4a6be0f87505..f119b506e782 100644
--- a/samples/bpf/xdp_fwd_kern.c
+++ b/samples/bpf/xdp_fwd_kern.c
@@ -52,6 +52,11 @@  static __always_inline int xdp_fwd_flags(struct
xdp_md *ctx, u32 flags)
        u16 h_proto;
        u64 nh_off;

+       {
+               char fmt[] = "packet length %u\n";
+
+               bpf_trace_printk(fmt, sizeof(fmt), ctx->data_end-ctx->data);
+       }
        nh_off = sizeof(*eth);
        if (data + nh_off > data_end)