diff mbox series

[v4] aquantia: Remove the build_skb path

Message ID CY4PR1001MB2311844FE8390F00A3363DEEE8E00@CY4PR1001MB2311.namprd10.prod.outlook.com
State Superseded
Headers show
Series [v4] aquantia: Remove the build_skb path | expand

Commit Message

Ramsay, Lincoln Nov. 19, 2020, 11:52 p.m. UTC
When performing IPv6 forwarding, there is an expectation that SKBs
will have some headroom. When forwarding a packet from the aquantia
driver, this does not always happen, triggering a kernel warning.

aq_ring.c has this code (edited slightly for brevity):

if (buff->is_eop && buff->len <= AQ_CFG_RX_FRAME_MAX - AQ_SKB_ALIGN) {
    skb = build_skb(aq_buf_vaddr(&buff->rxdata), AQ_CFG_RX_FRAME_MAX);
} else {
    skb = napi_alloc_skb(napi, AQ_CFG_RX_HDR_SIZE);

There is a significant difference between the SKB produced by these
2 code paths. When napi_alloc_skb creates an SKB, there is a certain
amount of headroom reserved. However, this is not done in the
build_skb codepath.

As the hardware buffer that build_skb is built around does not
handle the presence of the SKB header, this code path is being
removed and the napi_alloc_skb path will always be used. This code
path does have to copy the packet header into the SKB, but it adds
the packet data as a frag.

Signed-off-by: Lincoln Ramsay <lincoln.ramsay@opengear.com>
---

> For build_skb path to work the buffer scheme would need to be changed
> to reserve headroom, so yes, I think that the proposed patch is the
> most convenient solution.

I don't know about benefits/feasibility, but I did wonder if (in the event that the "fast path" is possible), the dma_mapping could use an offset? The page would include the skb header but the dma mapping would not. If that was done though, only 1 RX frame would fit into the page (at least on my system, where the RX frame seems to be 2k and the page is 4k). Also, there's a possibility to set the "order" variable, so that multiple pages are created at once and I'm not sure if this would work in that case.

> This only copies the initial part and then the rest is added as a frag.

Oh yeah. That's not as bad as I had thought then :)

I wonder though... if the "fast path" is possible, could the whole packet (including header) be added as a frag, avoiding the header copy? Or is that not how SKBs work?


 .../net/ethernet/aquantia/atlantic/aq_ring.c  | 127 ++++++++----------
 1 file changed, 53 insertions(+), 74 deletions(-)

Comments

Florian Westphal Nov. 20, 2020, 12:17 a.m. UTC | #1
Ramsay, Lincoln <Lincoln.Ramsay@digi.com> wrote:

[ patch looks good to me, I have no further comments ]

> > For build_skb path to work the buffer scheme would need to be changed
> > to reserve headroom, so yes, I think that the proposed patch is the
> > most convenient solution.
> 
> I don't know about benefits/feasibility, but I did wonder if (in the event that the "fast path" is possible), the dma_mapping could use an offset? The page would include the skb header but the dma mapping would not. If that was done though, only 1 RX frame would fit into the page (at least on my system, where the RX frame seems to be 2k and the page is 4k). Also, there's a possibility to set the "order" variable, so that multiple pages are created at once and I'm not sure if this would work in that case.

Yes, this is what some drivers do, they allocate a page, pass
pageaddr + headroom_offset everywhere, except build_skb() which gets the
pageaddr followed by skb_reserve(skb, headroom_offset).

> > This only copies the initial part and then the rest is added as a frag.
> 
> Oh yeah. That's not as bad as I had thought then :)
> 
> I wonder though... if the "fast path" is possible, could the whole packet (including header) be added as a frag, avoiding the header copy? Or is that not how SKBs work?

No, you can either have skb->head point to the page (build_skb), or
skb->head needs to be kmalloc'd (napi_alloc_skb for example).

Both can have page frags. In the second case, at least L2 header
needs to be in skb->head (and ip stack would pull in L3 header as well
later on anyway).
Ramsay, Lincoln Nov. 20, 2020, 12:23 a.m. UTC | #2
> > I don't know about benefits/feasibility, but I did wonder if (in the event that the "fast path" is possible), the dma_mapping could use an offset? The page would include the skb header but the dma mapping would not. If that was done though, only 1 RX frame would fit into the page (at least on my system, where the RX frame seems to be 2k and the page is 4k). Also, there's a possibility to set the "order" variable, so that multiple pages are created at once and I'm not sure if this would work in that case.
> 
> Yes, this is what some drivers do, they allocate a page, pass
> pageaddr + headroom_offset everywhere, except build_skb() which gets the
> pageaddr followed by skb_reserve(skb, headroom_offset).

If everyone's happy with the patch as-is, I might just leave it and let people more knowledgeable decide if it's worth adding this optimization back in with this offset logic ;)

Lincoln
Jakub Kicinski Nov. 21, 2020, 9:22 p.m. UTC | #3
On Thu, 19 Nov 2020 23:52:55 +0000 Ramsay, Lincoln wrote:
> When performing IPv6 forwarding, there is an expectation that SKBs
> will have some headroom. When forwarding a packet from the aquantia
> driver, this does not always happen, triggering a kernel warning.
> 
> aq_ring.c has this code (edited slightly for brevity):
> 
> if (buff->is_eop && buff->len <= AQ_CFG_RX_FRAME_MAX - AQ_SKB_ALIGN) {
>     skb = build_skb(aq_buf_vaddr(&buff->rxdata), AQ_CFG_RX_FRAME_MAX);
> } else {
>     skb = napi_alloc_skb(napi, AQ_CFG_RX_HDR_SIZE);
> 
> There is a significant difference between the SKB produced by these
> 2 code paths. When napi_alloc_skb creates an SKB, there is a certain
> amount of headroom reserved. However, this is not done in the
> build_skb codepath.
> 
> As the hardware buffer that build_skb is built around does not
> handle the presence of the SKB header, this code path is being
> removed and the napi_alloc_skb path will always be used. This code
> path does have to copy the packet header into the SKB, but it adds
> the packet data as a frag.
> 
> Signed-off-by: Lincoln Ramsay <lincoln.ramsay@opengear.com>

I was going to apply as a fix to net and stable but too many small nits
here to pass. First of all please add a From: line at the beginning of
the mail which matches the signoff (or use git-send-email, it'll get it
right).

> diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
> index 4f913658eea4..425e8e5afec7 100644
> --- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
> +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
> @@ -413,85 +413,64 @@ int aq_ring_rx_clean(struct aq_ring_s *self,
>  					      buff->rxdata.pg_off,
>  					      buff->len, DMA_FROM_DEVICE);
>  
> -		/* for single fragment packets use build_skb() */
> -		if (buff->is_eop &&
> -		    buff->len <= AQ_CFG_RX_FRAME_MAX - AQ_SKB_ALIGN) {
> -			skb = build_skb(aq_buf_vaddr(&buff->rxdata),
> +		skb = napi_alloc_skb(napi, AQ_CFG_RX_HDR_SIZE);
> +		if (unlikely(!skb)) {
> +			u64_stats_update_begin(&self->stats.rx.syncp);
> +			self->stats.rx.skb_alloc_fails++;
> +			u64_stats_update_end(&self->stats.rx.syncp);
> +			err = -ENOMEM;
> +			goto err_exit;
> +		}
> +		if (is_ptp_ring)
> +			buff->len -=
> +				aq_ptp_extract_ts(self->aq_nic, skb,
> +					aq_buf_vaddr(&buff->rxdata),
> +					buff->len);

Align continuations of the lines under '(' like: 

  ./scripts/checkpatch.pl --max-line-length=80 --strict 

is asking you to.

> +		hdr_len = buff->len;
> +		if (hdr_len > AQ_CFG_RX_HDR_SIZE)
> +			hdr_len = eth_get_headlen(skb->dev,
> +							aq_buf_vaddr(&buff->rxdata),
> +							AQ_CFG_RX_HDR_SIZE);

ditto

> +		memcpy(__skb_put(skb, hdr_len), aq_buf_vaddr(&buff->rxdata),
> +			ALIGN(hdr_len, sizeof(long)));

And here, etc.

> +		if (buff->len - hdr_len > 0) {
> +			skb_add_rx_frag(skb, 0, buff->rxdata.page,
> +					buff->rxdata.pg_off + hdr_len,
> +					buff->len - hdr_len,

> +		if (!buff->is_eop) {
> +			buff_ = buff;
> +			i = 1U;
> +			do {
> +				next_ = buff_->next,

The end of this line should be a semicolon.

> +				buff_ = &self->buff_ring[next_];

Thanks!
Jakub Kicinski Nov. 21, 2020, 9:23 p.m. UTC | #4
On Sat, 21 Nov 2020 13:22:04 -0800 Jakub Kicinski wrote:
> On Thu, 19 Nov 2020 23:52:55 +0000 Ramsay, Lincoln wrote:
> > When performing IPv6 forwarding, there is an expectation that SKBs
> > will have some headroom. When forwarding a packet from the aquantia
> > driver, this does not always happen, triggering a kernel warning.
> > 
> > aq_ring.c has this code (edited slightly for brevity):
> > 
> > if (buff->is_eop && buff->len <= AQ_CFG_RX_FRAME_MAX - AQ_SKB_ALIGN) {
> >     skb = build_skb(aq_buf_vaddr(&buff->rxdata), AQ_CFG_RX_FRAME_MAX);
> > } else {
> >     skb = napi_alloc_skb(napi, AQ_CFG_RX_HDR_SIZE);
> > 
> > There is a significant difference between the SKB produced by these
> > 2 code paths. When napi_alloc_skb creates an SKB, there is a certain
> > amount of headroom reserved. However, this is not done in the
> > build_skb codepath.
> > 
> > As the hardware buffer that build_skb is built around does not
> > handle the presence of the SKB header, this code path is being
> > removed and the napi_alloc_skb path will always be used. This code
> > path does have to copy the packet header into the SKB, but it adds
> > the packet data as a frag.
> > 
> > Signed-off-by: Lincoln Ramsay <lincoln.ramsay@opengear.com>  
> 
> I was going to apply as a fix to net and stable but too many small nits
> here to pass. First of all please add a From: line at the beginning of
> the mail which matches the signoff (or use git-send-email, it'll get it
> right).

Ah, one more thing, this is the correct fixes tag, right?

Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")

Please add it right before the signoff line.
Ramsay, Lincoln Nov. 22, 2020, 9:55 p.m. UTC | #5
> Align continuations of the lines under '(' like:

Oh... I didn't run the patch checker over this revised patch. In this case, I am only changing the leading indent. Am I still expected to satisfy the patch checker?

The current patch is very clear about what is happening if you do a diff -w but if I start changing other things to satisfy the checker, that goes away.

I can do that... just double checking that it's wanted...

Lincoln
Ramsay, Lincoln Nov. 22, 2020, 10:36 p.m. UTC | #6
> (Next time please include in the subject the tree that you're targetting
> the patch)

I guess you mean like [PATCH master v5] ? Should I be targeting something other than the master branch on the main git repo? (https://github.com/torvalds/linux.git)

> please add a From: line at the beginning of the mail which matches
> the signoff (or use git-send-email, it'll get it right).

Sure.

> Ah, one more thing, this is the correct fixes tag, right?
> Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")
> Please add it right before the signoff line.

I didn't quite understand this header... but yeah, I guess that's the commit that adds the fast path I am removing.

> > Align continuations of the lines under '(' like:
> 
> I am only changing the leading indent. Am I still expected to satisfy the patch checker?
> 
> The current patch is very clear about what is happening if you do a diff -w but if I start
> changing other things to satisfy the checker, that goes away.

Some of the patch checker complaints are only leading whitespace (obviously not a problem for diff -w), but 2 of them involve actual changes (changing , to ; and moving the first argument from the line below to the line above).

Lincoln
Jakub Kicinski Nov. 23, 2020, 4:42 p.m. UTC | #7
On Sun, 22 Nov 2020 22:36:22 +0000 Ramsay, Lincoln wrote:
> > (Next time please include in the subject the tree that you're targetting
> > the patch)  
> 
> I guess you mean like [PATCH master v5] ? Should I be targeting
> something other than the master branch on the main git repo?
> (https://github.com/torvalds/linux.git)

In this case the patch will be merged into the networking tree, and
then travel downstream to Linus. So you want to target this tree:

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/

IOW [PATCH net v5].

> > please add a From: line at the beginning of the mail which matches
> > the signoff (or use git-send-email, it'll get it right).  
> 
> Sure.
> 
> > Ah, one more thing, this is the correct fixes tag, right?
> > Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")
> > Please add it right before the signoff line.  
> 
> I didn't quite understand this header... but yeah, I guess that's the
> commit that adds the fast path I am removing.

Yup, it points to the oldest revision of the code where the bug is
present. In your case oldest revision where:

    When performing IPv6 forwarding, there is an expectation that SKBs
    will have some headroom. When forwarding a packet from the aquantia
    driver, this does not always happen, triggering a kernel warning.

> > > Align continuations of the lines under '(' like:  
> > 
> > I am only changing the leading indent. Am I still expected to satisfy the patch checker?
> > 
> > The current patch is very clear about what is happening if you do a diff -w but if I start
> > changing other things to satisfy the checker, that goes away.  
> 
> Some of the patch checker complaints are only leading whitespace
> (obviously not a problem for diff -w), but 2 of them involve actual
> changes (changing , to ; and moving the first argument from the line
> below to the line above).

I don't think it'll make a huge difference for the review-ability of
this change to heed checkpatch's warnings here.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
index 4f913658eea4..425e8e5afec7 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
@@ -413,85 +413,64 @@  int aq_ring_rx_clean(struct aq_ring_s *self,
 					      buff->rxdata.pg_off,
 					      buff->len, DMA_FROM_DEVICE);
 
-		/* for single fragment packets use build_skb() */
-		if (buff->is_eop &&
-		    buff->len <= AQ_CFG_RX_FRAME_MAX - AQ_SKB_ALIGN) {
-			skb = build_skb(aq_buf_vaddr(&buff->rxdata),
+		skb = napi_alloc_skb(napi, AQ_CFG_RX_HDR_SIZE);
+		if (unlikely(!skb)) {
+			u64_stats_update_begin(&self->stats.rx.syncp);
+			self->stats.rx.skb_alloc_fails++;
+			u64_stats_update_end(&self->stats.rx.syncp);
+			err = -ENOMEM;
+			goto err_exit;
+		}
+		if (is_ptp_ring)
+			buff->len -=
+				aq_ptp_extract_ts(self->aq_nic, skb,
+					aq_buf_vaddr(&buff->rxdata),
+					buff->len);
+
+		hdr_len = buff->len;
+		if (hdr_len > AQ_CFG_RX_HDR_SIZE)
+			hdr_len = eth_get_headlen(skb->dev,
+							aq_buf_vaddr(&buff->rxdata),
+							AQ_CFG_RX_HDR_SIZE);
+
+		memcpy(__skb_put(skb, hdr_len), aq_buf_vaddr(&buff->rxdata),
+			ALIGN(hdr_len, sizeof(long)));
+
+		if (buff->len - hdr_len > 0) {
+			skb_add_rx_frag(skb, 0, buff->rxdata.page,
+					buff->rxdata.pg_off + hdr_len,
+					buff->len - hdr_len,
 					AQ_CFG_RX_FRAME_MAX);
-			if (unlikely(!skb)) {
-				u64_stats_update_begin(&self->stats.rx.syncp);
-				self->stats.rx.skb_alloc_fails++;
-				u64_stats_update_end(&self->stats.rx.syncp);
-				err = -ENOMEM;
-				goto err_exit;
-			}
-			if (is_ptp_ring)
-				buff->len -=
-					aq_ptp_extract_ts(self->aq_nic, skb,
-						aq_buf_vaddr(&buff->rxdata),
-						buff->len);
-			skb_put(skb, buff->len);
 			page_ref_inc(buff->rxdata.page);
-		} else {
-			skb = napi_alloc_skb(napi, AQ_CFG_RX_HDR_SIZE);
-			if (unlikely(!skb)) {
-				u64_stats_update_begin(&self->stats.rx.syncp);
-				self->stats.rx.skb_alloc_fails++;
-				u64_stats_update_end(&self->stats.rx.syncp);
-				err = -ENOMEM;
-				goto err_exit;
-			}
-			if (is_ptp_ring)
-				buff->len -=
-					aq_ptp_extract_ts(self->aq_nic, skb,
-						aq_buf_vaddr(&buff->rxdata),
-						buff->len);
-
-			hdr_len = buff->len;
-			if (hdr_len > AQ_CFG_RX_HDR_SIZE)
-				hdr_len = eth_get_headlen(skb->dev,
-							  aq_buf_vaddr(&buff->rxdata),
-							  AQ_CFG_RX_HDR_SIZE);
-
-			memcpy(__skb_put(skb, hdr_len), aq_buf_vaddr(&buff->rxdata),
-			       ALIGN(hdr_len, sizeof(long)));
-
-			if (buff->len - hdr_len > 0) {
-				skb_add_rx_frag(skb, 0, buff->rxdata.page,
-						buff->rxdata.pg_off + hdr_len,
-						buff->len - hdr_len,
-						AQ_CFG_RX_FRAME_MAX);
-				page_ref_inc(buff->rxdata.page);
-			}
+		}
 
-			if (!buff->is_eop) {
-				buff_ = buff;
-				i = 1U;
-				do {
-					next_ = buff_->next,
-					buff_ = &self->buff_ring[next_];
+		if (!buff->is_eop) {
+			buff_ = buff;
+			i = 1U;
+			do {
+				next_ = buff_->next,
+				buff_ = &self->buff_ring[next_];
 
-					dma_sync_single_range_for_cpu(
-							aq_nic_get_dev(self->aq_nic),
-							buff_->rxdata.daddr,
-							buff_->rxdata.pg_off,
-							buff_->len,
-							DMA_FROM_DEVICE);
-					skb_add_rx_frag(skb, i++,
-							buff_->rxdata.page,
-							buff_->rxdata.pg_off,
-							buff_->len,
-							AQ_CFG_RX_FRAME_MAX);
-					page_ref_inc(buff_->rxdata.page);
-					buff_->is_cleaned = 1;
-
-					buff->is_ip_cso &= buff_->is_ip_cso;
-					buff->is_udp_cso &= buff_->is_udp_cso;
-					buff->is_tcp_cso &= buff_->is_tcp_cso;
-					buff->is_cso_err |= buff_->is_cso_err;
+				dma_sync_single_range_for_cpu(
+						aq_nic_get_dev(self->aq_nic),
+						buff_->rxdata.daddr,
+						buff_->rxdata.pg_off,
+						buff_->len,
+						DMA_FROM_DEVICE);
+				skb_add_rx_frag(skb, i++,
+						buff_->rxdata.page,
+						buff_->rxdata.pg_off,
+						buff_->len,
+						AQ_CFG_RX_FRAME_MAX);
+				page_ref_inc(buff_->rxdata.page);
+				buff_->is_cleaned = 1;
 
-				} while (!buff_->is_eop);
-			}
+				buff->is_ip_cso &= buff_->is_ip_cso;
+				buff->is_udp_cso &= buff_->is_udp_cso;
+				buff->is_tcp_cso &= buff_->is_tcp_cso;
+				buff->is_cso_err |= buff_->is_cso_err;
+
+			} while (!buff_->is_eop);
 		}
 
 		if (buff->is_vlan)