[net-next,v3,6/6] virtio_net: xdp, add slowpath case for non contiguous buffers

Message ID 20161129201133.26851.31803.stgit@john-Precision-Tower-5810
State Deferred, archived
Delegated to: David Miller

Commit Message

John Fastabend Nov. 29, 2016, 8:11 p.m. UTC
virtio_net XDP support expects receive buffers to be contiguous.
If this is not the case, we enable a slowpath to allow connectivity
to continue, but at a significant performance overhead associated with
linearizing the data. To make users painfully aware that XDP is
running in a degraded mode, we throw an xdp buffer error.

To linearize packets we allocate a page and copy the segments of
the data, including the header, into it. After this the page can be
handled by XDP code flow as normal.

Then depending on the return code the page is either freed or sent
to the XDP xmit path. There is no attempt to optimize this path.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   70 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 68 insertions(+), 2 deletions(-)

Comments

Alexei Starovoitov Nov. 30, 2016, 12:37 a.m. UTC | #1
On Tue, Nov 29, 2016 at 12:11:33PM -0800, John Fastabend wrote:
> virtio_net XDP support expects receive buffers to be contiguous.
> If this is not the case, we enable a slowpath to allow connectivity
> to continue, but at a significant performance overhead associated with
> linearizing the data. To make users painfully aware that XDP is
> running in a degraded mode, we throw an xdp buffer error.
> 
> To linearize packets we allocate a page and copy the segments of
> the data, including the header, into it. After this the page can be
> handled by XDP code flow as normal.
> 
> Then depending on the return code the page is either freed or sent
> to the XDP xmit path. There is no attempt to optimize this path.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
...
> +/* The conditions to enable XDP should preclude the underlying device from
> + * sending packets across multiple buffers (num_buf > 1). However per spec
> + * it does not appear to be illegal to do so but rather just against convention.
> + * So in order to avoid making a system unresponsive the packets are pushed
> + * into a page and the XDP program is run. This will be extremely slow and we
> + * push a warning to the user to fix this as soon as possible. Fixing this may
> + * require resolving the underlying hardware to determine why multiple buffers
> + * are being received or simply loading the XDP program in the ingress stack
> + * after the skb is built because there is no advantage to running it here
> + * anymore.
> + */
...
>  		if (num_buf > 1) {
>  			bpf_warn_invalid_xdp_buffer();
> -			goto err_xdp;
> +
> +			/* linearize data for XDP */
> +			xdp_page = xdp_linearize_page(rq, num_buf,
> +						      page, offset, &len);
> +			if (!xdp_page)
> +				goto err_xdp;

In the case when we're 'lucky', the performance will silently be bad.
Can we do warn_once here, so at least something in dmesg points out
that performance is not as expected? Am I reading it correctly that
you had to do a special kernel hack to trigger this situation, and
that it does not happen in any normal case?
John Fastabend Nov. 30, 2016, 2:50 a.m. UTC | #2
On 16-11-29 04:37 PM, Alexei Starovoitov wrote:
> On Tue, Nov 29, 2016 at 12:11:33PM -0800, John Fastabend wrote:
>> virtio_net XDP support expects receive buffers to be contiguous.
>> If this is not the case, we enable a slowpath to allow connectivity
>> to continue, but at a significant performance overhead associated with
>> linearizing the data. To make users painfully aware that XDP is
>> running in a degraded mode, we throw an xdp buffer error.
>>
>> To linearize packets we allocate a page and copy the segments of
>> the data, including the header, into it. After this the page can be
>> handled by XDP code flow as normal.
>>
>> Then depending on the return code the page is either freed or sent
>> to the XDP xmit path. There is no attempt to optimize this path.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ...
>> +/* The conditions to enable XDP should preclude the underlying device from
>> + * sending packets across multiple buffers (num_buf > 1). However per spec
>> + * it does not appear to be illegal to do so but rather just against convention.
>> + * So in order to avoid making a system unresponsive the packets are pushed
>> + * into a page and the XDP program is run. This will be extremely slow and we
>> + * push a warning to the user to fix this as soon as possible. Fixing this may
>> + * require resolving the underlying hardware to determine why multiple buffers
>> + * are being received or simply loading the XDP program in the ingress stack
>> + * after the skb is built because there is no advantage to running it here
>> + * anymore.
>> + */
> ...
>>  		if (num_buf > 1) {
>>  			bpf_warn_invalid_xdp_buffer();
			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here is the warn-once call. I made it a defined bpf warning so that we
can easily identify it if needed, and so it can be used by other drivers
as well.
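
(For reference, a minimal sketch of what such a shared warn-once helper
can look like, assuming it wraps WARN_ONCE somewhere like
net/core/filter.c; the exact message and location are illustrative, not
necessarily the code from this series:)

    /* Warn once per boot so a driver running XDP in a degraded mode
     * leaves a visible trace in dmesg without flooding the log.
     */
    void bpf_warn_invalid_xdp_buffer(void)
    {
            WARN_ONCE(1,
                      "Illegal XDP buffer encountered, expect throughput degradation\n");
    }
    EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_buffer);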

>> -			goto err_xdp;
>> +
>> +			/* linearize data for XDP */
>> +			xdp_page = xdp_linearize_page(rq, num_buf,
>> +						      page, offset, &len);
>> +			if (!xdp_page)
>> +				goto err_xdp;
> 
> In the case when we're 'lucky', the performance will silently be bad.

Were you specifically worried about the alloc failing here with no
indication? I was thinking a statistics counter could be added as a
follow-on series to catch this and other such cases in non-XDP paths
if needed.

> Can we do warn_once here, so at least something in dmesg points out
> that performance is not as expected? Am I reading it correctly that
> you had to do a special kernel hack to trigger this situation, and
> that it does not happen in any normal case?
> 

Correct, the only way to produce this with upstream qemu and Linux is to
write a kernel hack to build these types of buffers. AFAIK I caught all
the cases where it could otherwise happen in the setup xdp prog call and
required the user to configure the device driver so they cannot happen,
e.g. disable LRO, set MTU < PAGE_SIZE, etc.

If I missed anything (I don't see it now) I would call it a bug and fix
it. Again, AFAIK everything should be covered. As Michael pointed out
earlier, although unexpected, it's not strictly against the virtio spec
to build such packets, so we should handle them at least in a degraded
mode even though they are not expected in any qemu case.

.John
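
For context, the setup-time guards John describes sit in the XDP attach
path. A rough sketch of rejecting LRO-capable features and oversized MTUs
at program-load time; the function shape, messages, and exact feature
checks here are illustrative assumptions, not necessarily the v3 code:

    static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
    {
            unsigned long max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
            struct virtnet_info *vi = netdev_priv(dev);

            /* Receive offloads can merge frames into multi-buffer packets,
             * which the XDP fast path cannot handle; refuse to attach.
             */
            if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
                virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
                virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
                virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO)) {
                    netdev_warn(dev, "can't set XDP while LRO is on, disable LRO first\n");
                    return -EOPNOTSUPP;
            }

            /* Every frame, including the virtio header, must fit in one page. */
            if (dev->mtu > max_sz) {
                    netdev_warn(dev, "XDP requires MTU less than %lu\n", max_sz);
                    return -EINVAL;
            }

            /* ... attach prog to each receive queue as in the series ... */
            return 0;
    }
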
Alexei Starovoitov Nov. 30, 2016, 5:23 a.m. UTC | #3
On Tue, Nov 29, 2016 at 06:50:18PM -0800, John Fastabend wrote:
> On 16-11-29 04:37 PM, Alexei Starovoitov wrote:
> > On Tue, Nov 29, 2016 at 12:11:33PM -0800, John Fastabend wrote:
> >> virtio_net XDP support expects receive buffers to be contiguous.
> >> If this is not the case, we enable a slowpath to allow connectivity
> >> to continue, but at a significant performance overhead associated with
> >> linearizing the data. To make users painfully aware that XDP is
> >> running in a degraded mode, we throw an xdp buffer error.
> >>
> >> To linearize packets we allocate a page and copy the segments of
> >> the data, including the header, into it. After this the page can be
> >> handled by XDP code flow as normal.
> >>
> >> Then depending on the return code the page is either freed or sent
> >> to the XDP xmit path. There is no attempt to optimize this path.
> >>
> >> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> > ...
> >> +/* The conditions to enable XDP should preclude the underlying device from
> >> + * sending packets across multiple buffers (num_buf > 1). However per spec
> >> + * it does not appear to be illegal to do so but rather just against convention.
> >> + * So in order to avoid making a system unresponsive the packets are pushed
> >> + * into a page and the XDP program is run. This will be extremely slow and we
> >> + * push a warning to the user to fix this as soon as possible. Fixing this may
> >> + * require resolving the underlying hardware to determine why multiple buffers
> >> + * are being received or simply loading the XDP program in the ingress stack
> >> + * after the skb is built because there is no advantage to running it here
> >> + * anymore.
> >> + */
> > ...
> >>  		if (num_buf > 1) {
> >>  			bpf_warn_invalid_xdp_buffer();
> 			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Here is the warn-once call. I made it a defined bpf warning so that we
> can easily identify it if needed, and so it can be used by other drivers
> as well.

Ahh, I thought it had a '-' in front of it :)
and that you were deleting it in this patch.

> >> -			goto err_xdp;
> >> +
> >> +			/* linearize data for XDP */
> >> +			xdp_page = xdp_linearize_page(rq, num_buf,
> >> +						      page, offset, &len);
> >> +			if (!xdp_page)
> >> +				goto err_xdp;
> > 
> > In the case when we're 'lucky', the performance will silently be bad.
> 
> Were you specifically worried about the alloc failing here with no
> indication? I was thinking a statistics counter could be added as a
> follow-on series to catch this and other such cases in non-XDP paths
> if needed.
> 
> > Can we do warn_once here, so at least something in dmesg points out
> > that performance is not as expected? Am I reading it correctly that
> > you had to do a special kernel hack to trigger this situation, and
> > that it does not happen in any normal case?
> > 
> 
> Correct, the only way to produce this with upstream qemu and Linux is to
> write a kernel hack to build these types of buffers. AFAIK I caught all
> the cases where it could otherwise happen in the setup xdp prog call and
> required the user to configure the device driver so they cannot happen,
> e.g. disable LRO, set MTU < PAGE_SIZE, etc.
> 
> If I missed anything (I don't see it now) I would call it a bug and fix
> it. Again, AFAIK everything should be covered. As Michael pointed out
> earlier, although unexpected, it's not strictly against the virtio spec
> to build such packets, so we should handle them at least in a degraded
> mode even though they are not expected in any qemu case.

Ok. Makes sense. The above wasn't clear in the commit log.
Thanks
Jakub Kicinski Nov. 30, 2016, 2:30 p.m. UTC | #4
[add MST]

On Tue, 29 Nov 2016 12:11:33 -0800, John Fastabend wrote:
> virtio_net XDP support expects receive buffers to be contiguous.
> If this is not the case, we enable a slowpath to allow connectivity
> to continue, but at a significant performance overhead associated with
> linearizing the data. To make users painfully aware that XDP is
> running in a degraded mode, we throw an xdp buffer error.
> 
> To linearize packets we allocate a page and copy the segments of
> the data, including the header, into it. After this the page can be
> handled by XDP code flow as normal.
> 
> Then depending on the return code the page is either freed or sent
> to the XDP xmit path. There is no attempt to optimize this path.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/virtio_net.c |   70 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 68 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9604e55..b0ce4ef 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -449,6 +449,56 @@ static struct sk_buff *receive_big(struct net_device *dev,
>  	return NULL;
>  }
>  
> +/* The conditions to enable XDP should preclude the underlying device from
> + * sending packets across multiple buffers (num_buf > 1). However per spec
> + * it does not appear to be illegal to do so but rather just against convention.
> + * So in order to avoid making a system unresponsive the packets are pushed
> + * into a page and the XDP program is run. This will be extremely slow and we
> + * push a warning to the user to fix this as soon as possible. Fixing this may
> + * require resolving the underlying hardware to determine why multiple buffers
> + * are being received or simply loading the XDP program in the ingress stack
> + * after the skb is built because there is no advantage to running it here
> + * anymore.
> + */
> +static struct page *xdp_linearize_page(struct receive_queue *rq,
> +				       u16 num_buf,
> +				       struct page *p,
> +				       int offset,
> +				       unsigned int *len)
> +{
> +	struct page *page = alloc_page(GFP_ATOMIC);
> +	unsigned int page_off = 0;
> +
> +	if (!page)
> +		return NULL;
> +
> +	memcpy(page_address(page) + page_off, page_address(p) + offset, *len);
> +	while (--num_buf) {
> +		unsigned int buflen;
> +		unsigned long ctx;
> +		void *buf;
> +		int off;
> +
> +		ctx = (unsigned long)virtqueue_get_buf(rq->vq, &buflen);
> +		if (unlikely(!ctx))
> +			goto err_buf;
> +
> +		buf = mergeable_ctx_to_buf_address(ctx);
> +		p = virt_to_head_page(buf);
> +		off = buf - page_address(p);
> +
> +		memcpy(page_address(page) + page_off,
> +		       page_address(p) + off, buflen);
> +		page_off += buflen;

Could malicious user potentially submit a frame bigger than MTU?

> +	}
> +
> +	*len = page_off;
> +	return page;
> +err_buf:
> +	__free_pages(page, 0);
> +	return NULL;
> +}
> +
>  static struct sk_buff *receive_mergeable(struct net_device *dev,
>  					 struct virtnet_info *vi,
>  					 struct receive_queue *rq,
> @@ -469,21 +519,37 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	rcu_read_lock();
>  	xdp_prog = rcu_dereference(rq->xdp_prog);
>  	if (xdp_prog) {
> +		struct page *xdp_page;
>  		u32 act;
>  
>  		if (num_buf > 1) {
>  			bpf_warn_invalid_xdp_buffer();
> -			goto err_xdp;
> +
> +			/* linearize data for XDP */
> +			xdp_page = xdp_linearize_page(rq, num_buf,
> +						      page, offset, &len);
> +			if (!xdp_page)
> +				goto err_xdp;
> +			offset = len;
> +		} else {
> +			xdp_page = page;
>  		}
>  
> -		act = do_xdp_prog(vi, xdp_prog, page, offset, len);
> +		act = do_xdp_prog(vi, xdp_prog, xdp_page, offset, len);
>  		switch (act) {
>  		case XDP_PASS:
> +			if (unlikely(xdp_page != page))
> +				__free_pages(xdp_page, 0);
>  			break;
>  		case XDP_TX:
> +			if (unlikely(xdp_page != page))
> +				goto err_xdp;
> +			rcu_read_unlock();

Only if there is a reason for v4 - this unlock could go to the previous
patch.
John Fastabend Nov. 30, 2016, 4:50 p.m. UTC | #5
On 16-11-30 06:30 AM, Jakub Kicinski wrote:
> [add MST]
> 

Thanks, sorry MST. I did a cut'n'paste of an old list of CCs and missed
that you were not on the list.

[...]

>> +	memcpy(page_address(page) + page_off, page_address(p) + offset, *len);
>> +	while (--num_buf) {
>> +		unsigned int buflen;
>> +		unsigned long ctx;
>> +		void *buf;
>> +		int off;
>> +
>> +		ctx = (unsigned long)virtqueue_get_buf(rq->vq, &buflen);
>> +		if (unlikely(!ctx))
>> +			goto err_buf;
>> +
>> +		buf = mergeable_ctx_to_buf_address(ctx);
>> +		p = virt_to_head_page(buf);
>> +		off = buf - page_address(p);
>> +
>> +		memcpy(page_address(page) + page_off,
>> +		       page_address(p) + off, buflen);
>> +		page_off += buflen;
> 
> Could malicious user potentially submit a frame bigger than MTU?

Well, presumably if the MTU is greater than PAGE_SIZE the xdp program
would not have been loaded. And the malicious user in this case would
have to be qemu, at which point everything is already lost anyway if
qemu is trying to attack its own VM.

But this is a good point, because it looks like there is nothing in
virtio or qemu that drops frames larger than the virtio-configured
MTU setting. Maybe Michael can confirm this, or I'll poke at it some
more. I think qemu should drop these frames in general.

So I think adding a guard here is sensible; I'll go ahead and do that.
Also, the MTU guard at set_xdp time needs to account for the header length.

Thanks, nice catch.
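
A guard along these lines inside the copy loop of xdp_linearize_page
would bound the linearized frame to a single page; a sketch of the fix
being discussed here, not code from this v3 posting:

    /* Guard against a misbehaving backend handing us more data than
     * fits in the page we are linearizing into.
     */
    if (unlikely(page_off + buflen > PAGE_SIZE))
            goto err_buf;

    memcpy(page_address(page) + page_off,
           page_address(p) + off, buflen);
    page_off += buflen;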

> 
>> +	}
>> +
>> +	*len = page_off;
>> +	return page;
>> +err_buf:
>> +	__free_pages(page, 0);
>> +	return NULL;
>> +}
>> +
>>  static struct sk_buff *receive_mergeable(struct net_device *dev,
>>  					 struct virtnet_info *vi,
>>  					 struct receive_queue *rq,
>> @@ -469,21 +519,37 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>  	rcu_read_lock();
>>  	xdp_prog = rcu_dereference(rq->xdp_prog);
>>  	if (xdp_prog) {
>> +		struct page *xdp_page;
>>  		u32 act;
>>  
>>  		if (num_buf > 1) {
>>  			bpf_warn_invalid_xdp_buffer();
>> -			goto err_xdp;
>> +
>> +			/* linearize data for XDP */
>> +			xdp_page = xdp_linearize_page(rq, num_buf,
>> +						      page, offset, &len);
>> +			if (!xdp_page)
>> +				goto err_xdp;
>> +			offset = len;
>> +		} else {
>> +			xdp_page = page;
>>  		}
>>  
>> -		act = do_xdp_prog(vi, xdp_prog, page, offset, len);
>> +		act = do_xdp_prog(vi, xdp_prog, xdp_page, offset, len);
>>  		switch (act) {
>>  		case XDP_PASS:
>> +			if (unlikely(xdp_page != page))
>> +				__free_pages(xdp_page, 0);
>>  			break;
>>  		case XDP_TX:
>> +			if (unlikely(xdp_page != page))
>> +				goto err_xdp;
>> +			rcu_read_unlock();
> 
> Only if there is a reason for v4 - this unlock could go to the previous
> patch.
> 

Sure, will do this.
Michael S. Tsirkin Nov. 30, 2016, 6:49 p.m. UTC | #6
On Wed, Nov 30, 2016 at 08:50:41AM -0800, John Fastabend wrote:
> On 16-11-30 06:30 AM, Jakub Kicinski wrote:
> > [add MST]
> > 
> 
> Thanks, sorry MST. I did a cut'n'paste of an old list of CCs and missed
> that you were not on the list.
> 
> [...]
> 
> >> +	memcpy(page_address(page) + page_off, page_address(p) + offset, *len);
> >> +	while (--num_buf) {
> >> +		unsigned int buflen;
> >> +		unsigned long ctx;
> >> +		void *buf;
> >> +		int off;
> >> +
> >> +		ctx = (unsigned long)virtqueue_get_buf(rq->vq, &buflen);
> >> +		if (unlikely(!ctx))
> >> +			goto err_buf;
> >> +
> >> +		buf = mergeable_ctx_to_buf_address(ctx);
> >> +		p = virt_to_head_page(buf);
> >> +		off = buf - page_address(p);
> >> +
> >> +		memcpy(page_address(page) + page_off,
> >> +		       page_address(p) + off, buflen);
> >> +		page_off += buflen;
> > 
> > Could malicious user potentially submit a frame bigger than MTU?
> 
> Well, presumably if the MTU is greater than PAGE_SIZE the xdp program
> would not have been loaded. And the malicious user in this case would
> have to be qemu, at which point everything is already lost anyway if
> qemu is trying to attack its own VM.
> 
> But this is a good point, because it looks like there is nothing in
> virtio or qemu that drops frames larger than the virtio-configured
> MTU setting. Maybe Michael can confirm this, or I'll poke at it some
> more. I think qemu should drop these frames in general.
> 
> So I think adding a guard here is sensible; I'll go ahead and do that.
> Also, the MTU guard at set_xdp time needs to account for the header length.

I agree. Further, offloads are disabled dynamically and we could
get a packet that was processed with LRO.

> Thanks, nice catch.
> 
> > 
> >> +	}
> >> +
> >> +	*len = page_off;
> >> +	return page;
> >> +err_buf:
> >> +	__free_pages(page, 0);
> >> +	return NULL;
> >> +}
> >> +
> >>  static struct sk_buff *receive_mergeable(struct net_device *dev,
> >>  					 struct virtnet_info *vi,
> >>  					 struct receive_queue *rq,
> >> @@ -469,21 +519,37 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >>  	rcu_read_lock();
> >>  	xdp_prog = rcu_dereference(rq->xdp_prog);
> >>  	if (xdp_prog) {
> >> +		struct page *xdp_page;
> >>  		u32 act;
> >>  
> >>  		if (num_buf > 1) {
> >>  			bpf_warn_invalid_xdp_buffer();
> >> -			goto err_xdp;
> >> +
> >> +			/* linearize data for XDP */
> >> +			xdp_page = xdp_linearize_page(rq, num_buf,
> >> +						      page, offset, &len);
> >> +			if (!xdp_page)
> >> +				goto err_xdp;
> >> +			offset = len;
> >> +		} else {
> >> +			xdp_page = page;
> >>  		}
> >>  
> >> -		act = do_xdp_prog(vi, xdp_prog, page, offset, len);
> >> +		act = do_xdp_prog(vi, xdp_prog, xdp_page, offset, len);
> >>  		switch (act) {
> >>  		case XDP_PASS:
> >> +			if (unlikely(xdp_page != page))
> >> +				__free_pages(xdp_page, 0);
> >>  			break;
> >>  		case XDP_TX:
> >> +			if (unlikely(xdp_page != page))
> >> +				goto err_xdp;
> >> +			rcu_read_unlock();
> > 
> > Only if there is a reason for v4 - this unlock could go to the previous
> > patch.
> > 
> 
> Sure, will do this.

Patch

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9604e55..b0ce4ef 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -449,6 +449,56 @@  static struct sk_buff *receive_big(struct net_device *dev,
 	return NULL;
 }
 
+/* The conditions to enable XDP should preclude the underlying device from
+ * sending packets across multiple buffers (num_buf > 1). However per spec
+ * it does not appear to be illegal to do so but rather just against convention.
+ * So in order to avoid making a system unresponsive the packets are pushed
+ * into a page and the XDP program is run. This will be extremely slow and we
+ * push a warning to the user to fix this as soon as possible. Fixing this may
+ * require resolving the underlying hardware to determine why multiple buffers
+ * are being received or simply loading the XDP program in the ingress stack
+ * after the skb is built because there is no advantage to running it here
+ * anymore.
+ */
+static struct page *xdp_linearize_page(struct receive_queue *rq,
+				       u16 num_buf,
+				       struct page *p,
+				       int offset,
+				       unsigned int *len)
+{
+	struct page *page = alloc_page(GFP_ATOMIC);
+	unsigned int page_off = 0;
+
+	if (!page)
+		return NULL;
+
+	memcpy(page_address(page) + page_off, page_address(p) + offset, *len);
+	while (--num_buf) {
+		unsigned int buflen;
+		unsigned long ctx;
+		void *buf;
+		int off;
+
+		ctx = (unsigned long)virtqueue_get_buf(rq->vq, &buflen);
+		if (unlikely(!ctx))
+			goto err_buf;
+
+		buf = mergeable_ctx_to_buf_address(ctx);
+		p = virt_to_head_page(buf);
+		off = buf - page_address(p);
+
+		memcpy(page_address(page) + page_off,
+		       page_address(p) + off, buflen);
+		page_off += buflen;
+	}
+
+	*len = page_off;
+	return page;
+err_buf:
+	__free_pages(page, 0);
+	return NULL;
+}
+
 static struct sk_buff *receive_mergeable(struct net_device *dev,
 					 struct virtnet_info *vi,
 					 struct receive_queue *rq,
@@ -469,21 +519,37 @@  static struct sk_buff *receive_mergeable(struct net_device *dev,
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (xdp_prog) {
+		struct page *xdp_page;
 		u32 act;
 
 		if (num_buf > 1) {
 			bpf_warn_invalid_xdp_buffer();
-			goto err_xdp;
+
+			/* linearize data for XDP */
+			xdp_page = xdp_linearize_page(rq, num_buf,
+						      page, offset, &len);
+			if (!xdp_page)
+				goto err_xdp;
+			offset = len;
+		} else {
+			xdp_page = page;
 		}
 
-		act = do_xdp_prog(vi, xdp_prog, page, offset, len);
+		act = do_xdp_prog(vi, xdp_prog, xdp_page, offset, len);
 		switch (act) {
 		case XDP_PASS:
+			if (unlikely(xdp_page != page))
+				__free_pages(xdp_page, 0);
 			break;
 		case XDP_TX:
+			if (unlikely(xdp_page != page))
+				goto err_xdp;
+			rcu_read_unlock();
 			goto xdp_xmit;
 		case XDP_DROP:
 		default:
+			if (unlikely(xdp_page != page))
+				__free_pages(xdp_page, 0);
 			goto err_xdp;
 		}
 	}