
[net-next,v2,3/5] virtio_net: Add XDP support

Message ID 20161128052211-mutt-send-email-mst@kernel.org
State RFC, archived
Delegated to: David Miller

Commit Message

Michael S. Tsirkin Nov. 28, 2016, 3:36 a.m. UTC
On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote:
> On 16-11-22 06:58 AM, Michael S. Tsirkin wrote:
> > On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
> >> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
> >>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
> >>>> From: Shrijeet Mukherjee <shrijeet@gmail.com>
> >>>>
> >>>> This adds XDP support to virtio_net. Some requirements must be
> >>>> met for XDP to be enabled, depending on the mode. First, it will
> >>>> only be supported with LRO disabled so that data is not pushed
> >>>> across multiple buffers. The MTU must be less than a page size
> >>>> to avoid having to handle XDP across multiple pages.
> >>>>
> >>>> If mergeable receive is enabled, this first series only supports
> >>>> the case where header and data are in the same buf, which we can
> >>>> check when a packet is received by looking at num_buf. If
> >>>> num_buf is greater than 1 and an XDP program is loaded, the
> >>>> packet is dropped and a warning is thrown. When any_header_sg is
> >>>> set this does not happen and both header and data are put in a
> >>>> single buffer as expected, so we check this when XDP programs are
> >>>> loaded. Note I have only tested this with the Linux vhost backend.
> >>>>
> >>>> If big packets mode is enabled and the MTU/LRO conditions above
> >>>> are met, then XDP is allowed.
> >>>>
> >>>> A follow-on patch can be generated to solve the mergeable receive
> >>>> case with num_bufs equal to 2. Buffers greater than two may not
> >>>> be handled as easily.
> >>>
> >>>
> >>> I would very much prefer support for other layouts without drops
> >>> before merging this.
> >>> header by itself can certainly be handled by skipping it.
> >>> People wanted to use that e.g. for zero copy.
> >>
> >> OK, fair enough. I'll do this now rather than push it out.
> >>
> 
> Hi Michael,
> 
> The header skip logic, however, complicates the xmit handling a
> fair amount. Specifically, when we release the buffers after xmit,
> both the hdr and data portions need to be released, which requires
> some tracking.

I thought you disable all checksum offloads, so why not discard the
header immediately?

> Is the header split logic actually in use somewhere today? It looks
> like it's not being used in the Linux case. And zero copy RX is, as
> best I can tell, currently not supported anywhere, so I would prefer
> not to complicate the XDP path at the moment with a possible future
> feature.

Well it's part of the documented interface so we never
know who implemented it. Normally if we want to make
restrictions we would do the reverse and add a feature.

We can do this easily, but I'd like to first look into
just handling all possible inputs as the spec asks us to.
I'm a bit too busy with other stuff next week but will
look into this a week after that if you don't beat me to it.

> >>>
> >>> Anything else can be handled by copying the packet.
> 
> Any idea how to test this? At the moment I have some code to
> linearize the data in all cases with more than a single buffer. But
> it wasn't clear to me which features I could negotiate with
> vhost/qemu to get more than a single buffer in the receive path.
> 
> Thanks,
> John

ATM you need to hack qemu. Here's a hack to make the header
completely separate.
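
To make the conditions above concrete, here is a minimal user-space
sketch of the two checks the commit message describes: an attach-time
check (LRO disabled, MTU under a page) and a receive-time check that
drops mergeable packets whose num_buf is greater than 1 while a program
is loaded. The struct and function names are hypothetical; this is an
illustration, not the patch itself.

/* Illustrative sketch only, not the actual driver code. Models the XDP
 * rules from the commit message above; all names are hypothetical. */
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE 4096

struct vnet_dev {
    bool lro_enabled;
    unsigned int mtu;
    bool xdp_prog_loaded;
    unsigned long rx_dropped;
};

/* Attach-time check: XDP is only allowed with LRO off and an MTU that
 * keeps a whole packet within a single page. */
static bool xdp_attach_allowed(const struct vnet_dev *dev)
{
    return !dev->lro_enabled && dev->mtu < PAGE_SIZE;
}

/* Receive-time check for mergeable buffers: num_buf > 1 means header
 * and data were split across buffers, which this first series drops. */
static bool xdp_rx_acceptable(struct vnet_dev *dev, unsigned int num_buf)
{
    if (dev->xdp_prog_loaded && num_buf > 1) {
        dev->rx_dropped++;
        fprintf(stderr, "XDP: dropping %u-buffer packet\n", num_buf);
        return false;
    }
    return true;
}

int main(void)
{
    struct vnet_dev dev = { .mtu = 1500, .xdp_prog_loaded = true };

    printf("attach allowed: %d\n", xdp_attach_allowed(&dev));
    printf("accept 1-buffer packet: %d\n", xdp_rx_acceptable(&dev, 1));
    printf("accept 2-buffer packet: %d\n", xdp_rx_acceptable(&dev, 2));
    return 0;
}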

Comments

John Fastabend Nov. 28, 2016, 3:56 a.m. UTC | #1
On 16-11-27 07:36 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote:
>> On 16-11-22 06:58 AM, Michael S. Tsirkin wrote:
>>> On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
>>>> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
>>>>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
>>>>>> From: Shrijeet Mukherjee <shrijeet@gmail.com>
>>>>>>
>>>>>> This adds XDP support to virtio_net. Some requirements must be
>>>>>> met for XDP to be enabled, depending on the mode. First, it will
>>>>>> only be supported with LRO disabled so that data is not pushed
>>>>>> across multiple buffers. The MTU must be less than a page size
>>>>>> to avoid having to handle XDP across multiple pages.
>>>>>>
>>>>>> If mergeable receive is enabled, this first series only supports
>>>>>> the case where header and data are in the same buf, which we can
>>>>>> check when a packet is received by looking at num_buf. If
>>>>>> num_buf is greater than 1 and an XDP program is loaded, the
>>>>>> packet is dropped and a warning is thrown. When any_header_sg is
>>>>>> set this does not happen and both header and data are put in a
>>>>>> single buffer as expected, so we check this when XDP programs are
>>>>>> loaded. Note I have only tested this with the Linux vhost backend.
>>>>>>
>>>>>> If big packets mode is enabled and the MTU/LRO conditions above
>>>>>> are met, then XDP is allowed.
>>>>>>
>>>>>> A follow-on patch can be generated to solve the mergeable receive
>>>>>> case with num_bufs equal to 2. Buffers greater than two may not
>>>>>> be handled as easily.
>>>>>
>>>>>
>>>>> I would very much prefer support for other layouts without drops
>>>>> before merging this.
>>>>> header by itself can certainly be handled by skipping it.
>>>>> People wanted to use that e.g. for zero copy.
>>>>
>>>> OK, fair enough. I'll do this now rather than push it out.
>>>>
>>
>> Hi Michael,
>>
>> The header skip logic, however, complicates the xmit handling a
>> fair amount. Specifically, when we release the buffers after xmit,
>> both the hdr and data portions need to be released, which requires
>> some tracking.
> 
> I thought you disable all checksum offloads, so why not discard the
> header immediately?

Well, in the "normal" case where the header is part of the same
buffer, we keep it so we can reuse the same space for the header on
the TX path.

If we discard it in the header-split case, we have to push the header
somewhere else. In the skb case it looks like the cb[] region is used.
In our case I guess free space at the end of the page could be used.

My thinking is that if we handle the general case of more than one
buffer being used with a copy, we can handle the case above with the
same logic and don't need to treat it as a special case. It seems to
be an odd case that doesn't really exist anyway, at least not in
qemu/Linux. I have not tested anything else.
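
To illustrate the point about reusing the header space: since the
offloads are disabled, an XDP_TX path can simply overwrite the received
header in place with a zeroed one and send header plus data as one
contiguous buffer when they share a buffer. A hedged sketch, with
hypothetical names and a hypothetical 12-byte header size:

/* Sketch of reusing the in-buffer header space for TX. The received
 * vnet header sits immediately in front of the packet data in the same
 * buffer; overwrite it with zeros (no csum/GSO to describe) and transmit
 * header + data as one region. Hypothetical names and layout. */
#include <stdint.h>
#include <string.h>

#define VNET_HDR_LEN 12 /* e.g. sizeof(struct virtio_net_hdr_mrg_rxbuf) */

/* 'data' points at the packet payload. Returns the start of the TX
 * buffer and sets *tx_len to the header-plus-data length. */
static void *prepare_xdp_tx(void *data, unsigned int len,
                            unsigned int *tx_len)
{
    uint8_t *hdr = (uint8_t *)data - VNET_HDR_LEN;

    memset(hdr, 0, VNET_HDR_LEN); /* offloads are off, zeroed header is fine */
    *tx_len = len + VNET_HDR_LEN;
    return hdr;                   /* one buffer: header followed by data */
}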

> 
>> Is the header split logic actually in use somewhere today? It looks
>> like it's not being used in the Linux case. And zero copy RX is, as
>> best I can tell, currently not supported anywhere, so I would prefer
>> not to complicate the XDP path at the moment with a possible future
>> feature.
> 
> Well it's part of the documented interface so we never
> know who implemented it. Normally if we want to make
> restrictions we would do the reverse and add a feature.
> 
> We can do this easily, but I'd like to first look into
> just handling all possible inputs as the spec asks us to.
> I'm a bit too busy with other stuff next week but will
> look into this a week after that if you don't beat me to it.
> 

Well, I've almost got it working now with some logic to copy
everything into a single page if we hit this case, so it should be OK
but slow. I'll finish testing this and hopefully send it out in the
next few days.
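
As a rough illustration of that fallback, here is a minimal sketch that
copies the fragments of a multi-buffer packet into one page, assuming
the fragments are presented as iovecs and must fit within a single
page; the names are hypothetical and the real driver would operate on
virtqueue buffers instead.

/* Hypothetical sketch of the "copy everything into a single page"
 * fallback: gather the fragments of a multi-buffer packet into one
 * contiguous page so an XDP program can inspect the whole packet. */
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

#define PAGE_SIZE 4096

/* Returns the linearized length, or -1 if the packet does not fit in
 * one page and therefore has to be dropped. */
static ssize_t linearize_into_page(void *page, const struct iovec *frags,
                                   unsigned int num_buf)
{
    size_t off = 0;
    unsigned int i;

    for (i = 0; i < num_buf; i++) {
        if (off + frags[i].iov_len > PAGE_SIZE)
            return -1; /* too big for one page */
        memcpy((uint8_t *)page + off, frags[i].iov_base, frags[i].iov_len);
        off += frags[i].iov_len;
    }
    return (ssize_t)off; /* run the XDP program over page[0..off) */
}

The copy is what makes this path slow, but it leaves the common
single-buffer path untouched.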

>>>>>
>>>>> Anything else can be handled by copying the packet.
>>
>> Any idea how to test this? At the moment I have some code to
>> linearize the data in all cases with more than a single buffer. But
>> it wasn't clear to me which features I could negotiate with
>> vhost/qemu to get more than a single buffer in the receive path.
>>
>> Thanks,
>> John
> 
> ATM you need to hack qemu. Here's a hack to make the header
> completely separate.
> 

Perfect! Hacking qemu for testing is no problem; this helps a lot,
thanks, and saves me time trying to figure out how to get qemu to do
this.

> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index b68c69d..4866144 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
>              offset = n->host_hdr_len;
>              total += n->guest_hdr_len;
>              guest_offset = n->guest_hdr_len;
> +            continue;
>          } else {
>              guest_offset = 0;
>          }
> 
> 
> 
> here's one that should cap the 1st s/g to 100 bytes:
> 
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index b68c69d..7943004 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
>              offset = n->host_hdr_len;
>              total += n->guest_hdr_len;
>              guest_offset = n->guest_hdr_len;
> +            sg.iov_len = MIN(sg.iov_len, 100);
>          } else {
>              guest_offset = 0;
>          }
>
Michael S. Tsirkin Nov. 28, 2016, 4:07 a.m. UTC | #2
On Sun, Nov 27, 2016 at 07:56:09PM -0800, John Fastabend wrote:
> On 16-11-27 07:36 PM, Michael S. Tsirkin wrote:
> > On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote:
> >> On 16-11-22 06:58 AM, Michael S. Tsirkin wrote:
> >>> On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
> >>>> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
> >>>>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
> >>>>>> From: Shrijeet Mukherjee <shrijeet@gmail.com>
> >>>>>>
> >>>>>> This adds XDP support to virtio_net. Some requirements must be
> >>>>>> met for XDP to be enabled, depending on the mode. First, it will
> >>>>>> only be supported with LRO disabled so that data is not pushed
> >>>>>> across multiple buffers. The MTU must be less than a page size
> >>>>>> to avoid having to handle XDP across multiple pages.
> >>>>>>
> >>>>>> If mergeable receive is enabled, this first series only supports
> >>>>>> the case where header and data are in the same buf, which we can
> >>>>>> check when a packet is received by looking at num_buf. If
> >>>>>> num_buf is greater than 1 and an XDP program is loaded, the
> >>>>>> packet is dropped and a warning is thrown. When any_header_sg is
> >>>>>> set this does not happen and both header and data are put in a
> >>>>>> single buffer as expected, so we check this when XDP programs are
> >>>>>> loaded. Note I have only tested this with the Linux vhost backend.
> >>>>>>
> >>>>>> If big packets mode is enabled and the MTU/LRO conditions above
> >>>>>> are met, then XDP is allowed.
> >>>>>>
> >>>>>> A follow-on patch can be generated to solve the mergeable receive
> >>>>>> case with num_bufs equal to 2. Buffers greater than two may not
> >>>>>> be handled as easily.
> >>>>>
> >>>>>
> >>>>> I would very much prefer support for other layouts without drops
> >>>>> before merging this.
> >>>>> header by itself can certainly be handled by skipping it.
> >>>>> People wanted to use that e.g. for zero copy.
> >>>>
> >>>> OK, fair enough. I'll do this now rather than push it out.
> >>>>
> >>
> >> Hi Michael,
> >>
> >> The header skip logic, however, complicates the xmit handling a
> >> fair amount. Specifically, when we release the buffers after xmit,
> >> both the hdr and data portions need to be released, which requires
> >> some tracking.
> > 
> > I thought you disable all checksum offloads, so why not discard the
> > header immediately?
> 
> Well, in the "normal" case where the header is part of the same
> buffer, we keep it so we can reuse the same space for the header on
> the TX path.
> 
> If we discard it in the header-split case, we have to push the header
> somewhere else. In the skb case it looks like the cb[] region is used.
> In our case I guess free space at the end of the page could be used.

You don't have to put the start of the page in a buffer; you can put
an offset there. It will result in some waste in the common case, but
it's just several bytes, so likely not a big deal.
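
A small sketch of that offset idea, assuming the driver records a
pointer a few bytes into the page rather than the page start, so the
header area in front needs no separate tracking; the names and the
12-byte pad are illustrative only.

/* Sketch of the "put an offset there" suggestion: hand out a token that
 * already points past a small header pad, wasting a few bytes at the
 * front of the page in the common case. Hypothetical names. */
#include <stdint.h>

#define VNET_HDR_PAD 12 /* e.g. sizeof(struct virtio_net_hdr_mrg_rxbuf) */

/* Token stored for the buffer: offset past the header pad. */
static inline void *buf_token_from_page(void *page_start)
{
    return (uint8_t *)page_start + VNET_HDR_PAD;
}

/* Recover the page start from the token, e.g. when freeing the page. */
static inline void *page_from_buf_token(void *token)
{
    return (uint8_t *)token - VNET_HDR_PAD;
}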

> My thinking is that if we handle the general case of more than one
> buffer being used with a copy, we can handle the case above with the
> same logic and don't need to treat it as a special case. It seems to
> be an odd case that doesn't really exist anyway, at least not in
> qemu/Linux. I have not tested anything else.

OK

> > 
> >> Is the header split logic actually in use somewhere today? It looks
> >> like it's not being used in the Linux case. And zero copy RX is, as
> >> best I can tell, currently not supported anywhere, so I would prefer
> >> not to complicate the XDP path at the moment with a possible future
> >> feature.
> > 
> > Well it's part of the documented interface so we never
> > know who implemented it. Normally if we want to make
> > restrictions we would do the reverse and add a feature.
> > 
> > We can do this easily, but I'd like to first look into
> > just handling all possible inputs as the spec asks us to.
> > I'm a bit too busy with other stuff next week but will
> > look into this a week after that if you don't beat me to it.
> > 
> 
> Well, I've almost got it working now with some logic to copy
> everything into a single page if we hit this case, so it should be OK
> but slow. I'll finish testing this and hopefully send it out in the
> next few days.
> 
> >>>>>
> >>>>> Anything else can be handled by copying the packet.
> >>
> >> Any idea how to test this? At the moment I have some code to
> >> linearize the data in all cases with more than a single buffer. But
> >> it wasn't clear to me which features I could negotiate with
> >> vhost/qemu to get more than a single buffer in the receive path.
> >>
> >> Thanks,
> >> John
> > 
> > ATM you need to hack qemu. Here's a hack to make the header
> > completely separate.
> > 
> 
> Perfect! Hacking qemu for testing is no problem; this helps a lot,
> thanks, and saves me time trying to figure out how to get qemu to do
> this.

Please note I didn't try this at all, so it might not work, but it
should give you the idea.

> > 
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index b68c69d..4866144 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
> >              offset = n->host_hdr_len;
> >              total += n->guest_hdr_len;
> >              guest_offset = n->guest_hdr_len;
> > +            continue;
> >          } else {
> >              guest_offset = 0;
> >          }
> > 
> > 
> > 
> > here's one that should cap the 1st s/g to 100 bytes:
> > 
> > 
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index b68c69d..7943004 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
> >              offset = n->host_hdr_len;
> >              total += n->guest_hdr_len;
> >              guest_offset = n->guest_hdr_len;
> > +            sg.iov_len = MIN(sg.iov_len, 100);
> >          } else {
> >              guest_offset = 0;
> >          }
> >

Patch

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index b68c69d..4866144 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1164,6 +1164,7 @@  static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
             offset = n->host_hdr_len;
             total += n->guest_hdr_len;
             guest_offset = n->guest_hdr_len;
+            continue;
         } else {
             guest_offset = 0;
         }



here's one that should cap the 1st s/g to 100 bytes:


diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index b68c69d..7943004 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1164,6 +1164,7 @@  static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
             offset = n->host_hdr_len;
             total += n->guest_hdr_len;
             guest_offset = n->guest_hdr_len;
+            sg.iov_len = MIN(sg.iov_len, 100);
         } else {
             guest_offset = 0;
         }