diff mbox

[PATCHv2,net] sunvnet: set queue mapping when doing packet copies

Message ID 54CA8D33.6020006@oracle.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

David L Stevens Jan. 29, 2015, 7:42 p.m. UTC
This patch fixes a bug where vnet_skb_shape() didn't set the already-selected
queue mapping when a packet copy was required. This results in using the
wrong queue index for stops/starts, hung tx queues and watchdog timeouts
under heavy load. It also transfers the destructor, truesize, and the owning
socket for flow control (code adapted from skb_segment).

Signed-off-by: David L Stevens <david.stevens@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>

--
Changes since v1:
 move truesize, destructor and socket per Eric Dumazet <eric.dumazet@gmail.com>

---
 drivers/net/ethernet/sun/sunvnet.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

Comments

Eric Dumazet Jan. 29, 2015, 8:39 p.m. UTC | #1
On Thu, 2015-01-29 at 14:42 -0500, David L Stevens wrote:
> This patch fixes a bug where vnet_skb_shape() didn't set the already-selected
> queue mapping when a packet copy was required. This results in using the
> wrong queue index for stops/starts, hung tx queues and watchdog timeouts
> under heavy load. It also transfers the destructor, truesize, and the owning
> socket for flow control (code adapted from skb_segment).
> 
> Signed-off-by: David L Stevens <david.stevens@oracle.com>
> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> 
> --
> Changes since v1:
>  move truesize, destructor and socket per Eric Dumazet <eric.dumazet@gmail.com>
> 
> ---
>  drivers/net/ethernet/sun/sunvnet.c |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
> index 2b719cc..0a1d3eb 100644
> --- a/drivers/net/ethernet/sun/sunvnet.c
> +++ b/drivers/net/ethernet/sun/sunvnet.c
> @@ -1123,6 +1123,16 @@ static inline struct sk_buff *vnet_skb_shape(struct sk_buff *skb, int ncookies)
>  			skb_shinfo(nskb)->gso_size = skb_shinfo(skb)->gso_size;
>  			skb_shinfo(nskb)->gso_type = skb_shinfo(skb)->gso_type;
>  		}
> +		nskb->queue_mapping = skb->queue_mapping;
> +		/* Following permits correct back-pressure, for protocols
> +		 * using skb_set_owner_w().
> +		 * Idea is to transfer ownership from skb to nskb.
> +		 */
> +		if (skb->destructor == sock_wfree) {

Sorry, but you should remove this test.

(TCP uses another destructor, not sock_wfree())

All sent packets will support these swap() operations,
regardless of destructor.


> +			swap(nskb->truesize, skb->truesize);
> +			swap(nskb->destructor, skb->destructor);
> +			swap(nskb->sk, skb->sk);
> +		}
>  		dev_kfree_skb(skb);
>  		skb = nskb;
>  	}


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David L Stevens Jan. 29, 2015, 9:36 p.m. UTC | #2
On 01/29/2015 03:39 PM, Eric Dumazet wrote:

> Sorry, but you should remove this test.
> 
> (TCP uses another destructor, not sock_wfree())
> 
> All sent packets will support these swap() operations,
> regardless of destructor.

In general the destructor should match the allocator, right? So it bothers me
here that we'd be replacing it in an skb allocated with alloc_and_align_skb(),
with an an arbitrary destructor from an skb NOT allocated with alloc_and_align_skb().
If it has a different destructor that does special handling related to the allocator,
it is the original skb, not the new one, that needs the old destructor. This TCP
accounting has less to do with the buffer destructor than with the freeing of the
contents of the buffer, but that isn't necessarily true for all destructors.

Checking for a known, specific destructor is less troubling, so I don't want
to remove the test entirely.

Since the concern here is specifically TCP flow control, do you think it's sufficient
to substitute tcp_wfree for the sock_wfree here?

							+-DLS


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 29, 2015, 10:15 p.m. UTC | #3
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 29 Jan 2015 12:39:56 -0800

> On Thu, 2015-01-29 at 14:42 -0500, David L Stevens wrote:
>> @@ -1123,6 +1123,16 @@ static inline struct sk_buff *vnet_skb_shape(struct sk_buff *skb, int ncookies)
>>  			skb_shinfo(nskb)->gso_size = skb_shinfo(skb)->gso_size;
>>  			skb_shinfo(nskb)->gso_type = skb_shinfo(skb)->gso_type;
>>  		}
>> +		nskb->queue_mapping = skb->queue_mapping;
>> +		/* Following permits correct back-pressure, for protocols
>> +		 * using skb_set_owner_w().
>> +		 * Idea is to transfer ownership from skb to nskb.
>> +		 */
>> +		if (skb->destructor == sock_wfree) {
> 
> Sorry, but you should remove this test.
> 
> (TCP uses another destructor, not sock_wfree())
> 
> All sent packets will support these swap() operations,
> regardless of destructor.

Then we need to fix skb_segment() too.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Jan. 29, 2015, 10:19 p.m. UTC | #4
On Thu, 2015-01-29 at 16:36 -0500, David L Stevens wrote:

> In general the destructor should match the allocator, right? So it bothers me
> here that we'd be replacing it in an skb allocated with alloc_and_align_skb(),
> with an an arbitrary destructor from an skb NOT allocated with alloc_and_align_skb().
> If it has a different destructor that does special handling related to the allocator,
> it is the original skb, not the new one, that needs the old destructor. This TCP
> accounting has less to do with the buffer destructor than with the freeing of the
> contents of the buffer, but that isn't necessarily true for all destructors.
> 
> Checking for a known, specific destructor is less troubling, so I don't want
> to remove the test entirely.
> 
> Since the concern here is specifically TCP flow control, do you think it's sufficient
> to substitute tcp_wfree for the sock_wfree here?

The concern is also for UDP.

Right now a single UDP flow can flood your network, even if application
or admin cared to set a low SO_SNDBUF.

I guess you can extend the test to sock_wfree and tcp_wfree, but :

You have to EXPORT_SYMBOL(tcp_wfree)

Add an #ifdef CONFIG_INET   (take a look at skb_orphan_partial())

Thanks







--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Jan. 29, 2015, 10:22 p.m. UTC | #5
On Thu, 2015-01-29 at 14:19 -0800, Eric Dumazet wrote:

> I guess you can extend the test to sock_wfree and tcp_wfree, but :
> 
> You have to EXPORT_SYMBOL(tcp_wfree)
> 
> Add an #ifdef CONFIG_INET   (take a look at skb_orphan_partial())

Or even better, add a new helper in the same spirit than
skb_orphan_partial() : You would EXPORT_SYMBOL() it and leave
tcp_wfree() as non exported.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Jan. 29, 2015, 10:30 p.m. UTC | #6
On Thu, 2015-01-29 at 14:15 -0800, David Miller wrote:

> Then we need to fix skb_segment() too.

My concern was that skb_segment() could be used in rx paths, with quite
different destructor semantics.

While from ndo_start_xmit(), I believe sane destructors should not
depend on skb being invariant, sk 'refcount' must be using
skb->truesize.





--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
index 2b719cc..0a1d3eb 100644
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -1123,6 +1123,16 @@  static inline struct sk_buff *vnet_skb_shape(struct sk_buff *skb, int ncookies)
 			skb_shinfo(nskb)->gso_size = skb_shinfo(skb)->gso_size;
 			skb_shinfo(nskb)->gso_type = skb_shinfo(skb)->gso_type;
 		}
+		nskb->queue_mapping = skb->queue_mapping;
+		/* Following permits correct back-pressure, for protocols
+		 * using skb_set_owner_w().
+		 * Idea is to transfer ownership from skb to nskb.
+		 */
+		if (skb->destructor == sock_wfree) {
+			swap(nskb->truesize, skb->truesize);
+			swap(nskb->destructor, skb->destructor);
+			swap(nskb->sk, skb->sk);
+		}
 		dev_kfree_skb(skb);
 		skb = nskb;
 	}