| Message ID | 20161129201133.26851.31803.stgit@john-Precision-Tower-5810 |
|---|---|
| State | Deferred, archived |
| Delegated to | David Miller |
On Tue, Nov 29, 2016 at 12:11:33PM -0800, John Fastabend wrote:
> virtio_net XDP support expects receive buffers to be contiguous.
> If this is not the case we enable a slowpath to allow connectivity
> to continue but at a significant performance overhead associated with
> linearizing data. To make it painfully obvious to users that XDP is
> running in a degraded mode we throw an xdp buffer error.
>
> To linearize packets we allocate a page and copy the segments of
> the data, including the header, into it. After this the page can be
> handled by XDP code flow as normal.
>
> Then depending on the return code the page is either freed or sent
> to the XDP xmit path. There is no attempt to optimize this path.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
...
> +/* The conditions to enable XDP should preclude the underlying device from
> + * sending packets across multiple buffers (num_buf > 1). However per spec
> + * it does not appear to be illegal to do so but rather just against convention.
> + * So in order to avoid making a system unresponsive the packets are pushed
> + * into a page and the XDP program is run. This will be extremely slow and we
> + * push a warning to the user to fix this as soon as possible. Fixing this may
> + * require resolving the underlying hardware to determine why multiple buffers
> + * are being received or simply loading the XDP program in the ingress stack
> + * after the skb is built because there is no advantage to running it here
> + * anymore.
> + */
...
> 	if (num_buf > 1) {
> 		bpf_warn_invalid_xdp_buffer();
> -		goto err_xdp;
> +
> +		/* linearize data for XDP */
> +		xdp_page = xdp_linearize_page(rq, num_buf,
> +					      page, offset, &len);
> +		if (!xdp_page)
> +			goto err_xdp;

In the case where we're 'lucky', the performance will silently be bad. Can we do warn_once here, so that at least something in dmesg points out that performance is not as expected?
Am I reading it correctly that you had to do a special kernel hack to trigger this situation and in all normal cases it's not the case?
On 16-11-29 04:37 PM, Alexei Starovoitov wrote:
> On Tue, Nov 29, 2016 at 12:11:33PM -0800, John Fastabend wrote:
>> virtio_net XDP support expects receive buffers to be contiguous.
[...]
>> 	if (num_buf > 1) {
>> 		bpf_warn_invalid_xdp_buffer();
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here is the warn-once call. I made it a defined bpf warning so that we can easily identify it if needed, and it can be used by other drivers as well.
>> -		goto err_xdp;
>> +
>> +		/* linearize data for XDP */
>> +		xdp_page = xdp_linearize_page(rq, num_buf,
>> +					      page, offset, &len);
>> +		if (!xdp_page)
>> +			goto err_xdp;
>
> in case when we're 'lucky' the performance will silently be bad.

Were you specifically worried about the alloc failing here with no indication? I was thinking a statistics counter could be added as a follow-on series to catch this and other such cases in non-XDP paths if needed.

> Can we do warn_once here? so at least something in dmesg points out
> that performance is not as expected. Am I reading it correctly that
> you had to do a special kernel hack to trigger this situation and
> in all normal cases it's not the case?

Correct, the only way to produce this with upstream qemu and Linux is to write a kernel hack to build these types of buffers. AFAIK I caught all the cases where it could happen otherwise in the setup xdp prog call, and required the user to configure the device driver so they cannot happen, e.g. disable LRO, set MTU < PAGE_SIZE, etc.

If I missed anything, and I don't see it now, I would call it a bug and fix it. Again, AFAIK everything should be covered. As Michael pointed out earlier, although unexpected, it's not strictly against the virtio spec to build such packets, so we should handle it at least in a degraded mode even though it is not expected in any qemu cases.

.John
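For readers following the warn-once discussion: the semantics John's bpf_warn_invalid_xdp_buffer() relies on can be modeled in plain user-space C. This is only a sketch with illustrative names; the in-kernel helper would be a WARN_ONCE()-style wrapper, not this code.

```c
#include <stdio.h>

/* Fire the warning at most once, no matter how many multi-buffer
 * frames arrive; this mirrors the kernel's WARN_ONCE() semantics. */
static int xdp_buffer_warned;

static int warn_invalid_xdp_buffer(void)
{
	if (xdp_buffer_warned)
		return 0;	/* already warned once, stay silent */
	xdp_buffer_warned = 1;
	fprintf(stderr, "XDP: invalid buffer, expect degraded performance\n");
	return 1;	/* warning emitted this time */
}
```

The point of the once-only behavior is that the degraded path may be hit per packet, so an unconditional warning would flood the log.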
On Tue, Nov 29, 2016 at 06:50:18PM -0800, John Fastabend wrote:
> On 16-11-29 04:37 PM, Alexei Starovoitov wrote:
[...]
> >> 	if (num_buf > 1) {
> >> 		bpf_warn_invalid_xdp_buffer();
> 		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Here is the warn once call. I made it a defined bpf warning so that we
> can easily identify if needed and it can be used by other drivers as
> well.

ahh.
I thought it had a '-' in front of it :) and that you were deleting it in this patch.

> >> -		goto err_xdp;
> >> +
> >> +		/* linearize data for XDP */
> >> +		xdp_page = xdp_linearize_page(rq, num_buf,
> >> +					      page, offset, &len);
> >> +		if (!xdp_page)
> >> +			goto err_xdp;
[...]
> Correct the only way to produce this with upstream qemu and Linux is to
> write a kernel hack to build these types of buffers.
[...]
> If I missed anything, I don't see it now, I would call it a bug and fix
> it.

Ok, makes sense. The above wasn't clear in the commit log. Thanks
[add MST]

On Tue, 29 Nov 2016 12:11:33 -0800, John Fastabend wrote:
> virtio_net XDP support expects receive buffers to be contiguous.
> If this is not the case we enable a slowpath to allow connectivity
> to continue but at a significant performance overhead associated with
> linearizing data.
[...]
> +		buf = mergeable_ctx_to_buf_address(ctx);
> +		p = virt_to_head_page(buf);
> +		off = buf - page_address(p);
> +
> +		memcpy(page_address(page) + page_off,
> +		       page_address(p) + off, buflen);
> +		page_off += buflen;

Could a malicious user potentially submit a frame bigger than the MTU?

[...]
> 	case XDP_TX:
> +		if (unlikely(xdp_page != page))
> +			goto err_xdp;
> +		rcu_read_unlock();

Only if there is a reason for v4 - this unlock could go to the previous patch.
On 16-11-30 06:30 AM, Jakub Kicinski wrote:
> [add MST]

Thanks, sorry MST. I did a cut'n'paste of an old list of CCs and missed that you were not on the list.

[...]
>> +		memcpy(page_address(page) + page_off,
>> +		       page_address(p) + off, buflen);
>> +		page_off += buflen;
>
> Could malicious user potentially submit a frame bigger than MTU?

Well, presumably if the MTU were greater than PAGE_SIZE the xdp program would not have been loaded. And the malicious user in this case would have to be qemu, and everything is already lost if qemu is trying to attack its VM.

But this is a good point, because it looks like there is nothing in virtio or qemu that drops frames larger than the virtio configured MTU. Maybe Michael can confirm this, or I'll poke at it more. I think qemu should drop these frames in general.

So I think adding a guard here is sensible; I'll go ahead and do that. Also, the MTU guard at set_xdp time needs to account for header length.

Thanks, nice catch.
[...]
>> 	case XDP_TX:
>> +		if (unlikely(xdp_page != page))
>> +			goto err_xdp;
>> +		rcu_read_unlock();
>
> Only if there is a reason for v4 - this unlock could go to the previous
> patch.

Sure, will do this.
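The guard John agrees to add can be sketched as a small user-space model of the copy loop in xdp_linearize_page(). The helper name and return convention here are illustrative; in the real driver the check would sit next to the memcpy() and bail out before writing past the destination page.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* stand-in for the kernel's PAGE_SIZE */

/* Model of the linearize loop: accumulate segment lengths into a single
 * destination page, refusing any segment that would overflow it. Nothing
 * in the virtio spec caps the summed buffer lengths at one page, so a
 * misbehaving peer could otherwise overrun the allocation. */
static int linearize_would_fit(const unsigned int *buflens, size_t num_buf)
{
	unsigned int page_off = 0;
	size_t i;

	for (i = 0; i < num_buf; i++) {
		/* overflow-safe form of page_off + buflens[i] > PAGE_SIZE */
		if (buflens[i] > MODEL_PAGE_SIZE - page_off)
			return 0;	/* would overflow: drop the frame */
		/* memcpy() of the segment into the page would happen here */
		page_off += buflens[i];
	}
	return 1;	/* all segments fit in one page */
}
```

Writing the bound as `buflens[i] > MODEL_PAGE_SIZE - page_off` avoids the unsigned wraparound that `page_off + buflens[i]` could hit with a hostile length.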
On Wed, Nov 30, 2016 at 08:50:41AM -0800, John Fastabend wrote:
> On 16-11-30 06:30 AM, Jakub Kicinski wrote:
[...]
>> Could malicious user potentially submit a frame bigger than MTU?
>
> Well presumably if the MTU is greater than PAGE_SIZE the xdp program
> would not have been loaded.
[...]
> So I think adding a guard here is sensible I'll go ahead and do that.
> Also the MTU guard at set_xdp time needs to account for header length.

I agree. Further, offloads are disabled dynamically, and we could get a packet that was processed with LRO.

> Thanks nice catch.
[...]
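The set_xdp-time check that John and Michael agree on can be modeled stand-alone. The names and the exact header budget below are illustrative assumptions; the real driver would derive the overhead from its actual Ethernet and virtio-net header configuration rather than hard-coding it.

```c
/* Illustrative overheads: Ethernet header plus a VLAN tag, and the
 * virtio-net header. These constants are assumptions for the sketch. */
#define MODEL_ETH_OVERHEAD	(14u + 4u)
#define MODEL_VNET_HDR_LEN	12u
#define MODEL_PAGE_SIZE		4096u

/* Reject XDP attach when the largest receivable frame, including
 * headers, cannot fit in a single page; otherwise oversized frames
 * would either take the slow linearize path or overflow it. */
static int xdp_mtu_ok(unsigned int mtu)
{
	return mtu + MODEL_ETH_OVERHEAD + MODEL_VNET_HDR_LEN <= MODEL_PAGE_SIZE;
}
```

The key point from the thread is that the comparison must include header length, not just the raw MTU, since the received buffer carries both.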
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9604e55..b0ce4ef 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -449,6 +449,56 @@ static struct sk_buff *receive_big(struct net_device *dev,
 	return NULL;
 }
 
+/* The conditions to enable XDP should preclude the underlying device from
+ * sending packets across multiple buffers (num_buf > 1). However per spec
+ * it does not appear to be illegal to do so but rather just against convention.
+ * So in order to avoid making a system unresponsive the packets are pushed
+ * into a page and the XDP program is run. This will be extremely slow and we
+ * push a warning to the user to fix this as soon as possible. Fixing this may
+ * require resolving the underlying hardware to determine why multiple buffers
+ * are being received or simply loading the XDP program in the ingress stack
+ * after the skb is built because there is no advantage to running it here
+ * anymore.
+ */
+static struct page *xdp_linearize_page(struct receive_queue *rq,
+				       u16 num_buf,
+				       struct page *p,
+				       int offset,
+				       unsigned int *len)
+{
+	struct page *page = alloc_page(GFP_ATOMIC);
+	unsigned int page_off = 0;
+
+	if (!page)
+		return NULL;
+
+	memcpy(page_address(page) + page_off, page_address(p) + offset, *len);
+	while (--num_buf) {
+		unsigned int buflen;
+		unsigned long ctx;
+		void *buf;
+		int off;
+
+		ctx = (unsigned long)virtqueue_get_buf(rq->vq, &buflen);
+		if (unlikely(!ctx))
+			goto err_buf;
+
+		buf = mergeable_ctx_to_buf_address(ctx);
+		p = virt_to_head_page(buf);
+		off = buf - page_address(p);
+
+		memcpy(page_address(page) + page_off,
+		       page_address(p) + off, buflen);
+		page_off += buflen;
+	}
+
+	*len = page_off;
+	return page;
+err_buf:
+	__free_pages(page, 0);
+	return NULL;
+}
+
 static struct sk_buff *receive_mergeable(struct net_device *dev,
 					 struct virtnet_info *vi,
 					 struct receive_queue *rq,
@@ -469,21 +519,37 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (xdp_prog) {
+		struct page *xdp_page;
 		u32 act;
 
 		if (num_buf > 1) {
 			bpf_warn_invalid_xdp_buffer();
-			goto err_xdp;
+
+			/* linearize data for XDP */
+			xdp_page = xdp_linearize_page(rq, num_buf,
+						      page, offset, &len);
+			if (!xdp_page)
+				goto err_xdp;
+			offset = len;
+		} else {
+			xdp_page = page;
 		}
 
-		act = do_xdp_prog(vi, xdp_prog, page, offset, len);
+		act = do_xdp_prog(vi, xdp_prog, xdp_page, offset, len);
 		switch (act) {
 		case XDP_PASS:
+			if (unlikely(xdp_page != page))
+				__free_pages(xdp_page, 0);
 			break;
 		case XDP_TX:
+			if (unlikely(xdp_page != page))
+				goto err_xdp;
+			rcu_read_unlock();
 			goto xdp_xmit;
 		case XDP_DROP:
 		default:
+			if (unlikely(xdp_page != page))
+				__free_pages(xdp_page, 0);
 			goto err_xdp;
 		}
 	}
virtio_net XDP support expects receive buffers to be contiguous. If this is not the case we enable a slowpath to allow connectivity to continue, but at a significant performance overhead associated with linearizing data. To make it painfully obvious to users that XDP is running in a degraded mode, we throw an xdp buffer error.

To linearize packets we allocate a page and copy the segments of the data, including the header, into it. After this the page can be handled by the XDP code flow as normal.

Then, depending on the return code, the page is either freed or sent to the XDP xmit path. There is no attempt to optimize this path.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c | 70 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 68 insertions(+), 2 deletions(-)