diff mbox

[net-next,08/12] r8152: support TSO

Message ID 1393934464-23675-9-git-send-email-hayeswang@realtek.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Hayes Wang March 4, 2014, 12:01 p.m. UTC
Support scatter gather and TSO.

Adjust the tx checksum function and set the max gso size to fix the
size of the tx aggregation buffer.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 133 +++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 103 insertions(+), 30 deletions(-)

Comments

David Laight March 4, 2014, 12:11 p.m. UTC | #1
From: Hayes Wang
> Support scatter gather and TSO.
> 
> Adjust the tx checksum function and set the max gso size to fix the
> size of the tx aggregation buffer.

There is little point supporting TSO unless the usb host controller
supports arbitrary aligned scatter-gather.
All you do is require that a large skb be allocated (that may well
fail), and add it another data copy.

The xhci controller is almost capable of arbitrary scatter-gather,
but it is currently disabled because the data must be aligned at
the end of the transfer ring (the hardware designers have made it
almost impossible to write efficient software).

Note that the various xhci controllers behave subtly differently.

	David



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hayes Wang March 4, 2014, 1:11 p.m. UTC | #2
David Laight [mailto:David.Laight@ACULAB.COM] 
> Sent: Tuesday, March 04, 2014 8:12 PM
> To: 'Hayes Wang'; netdev@vger.kernel.org
> Cc: nic_swsd@realtek.com; linux-kernel@vger.kernel.org; 
> linux-usb@vger.kernel.org
> Subject: RE: [PATCH net-next 08/12] r8152: support TSO
> 
> From: Hayes Wang
> > Support scatter gather and TSO.
> > 
> > Adjust the tx checksum function and set the max gso size to fix the
> > size of the tx aggregation buffer.
> 
> There is little point supporting TSO unless the usb host controller
> supports arbitrary aligned scatter-gather.
> All you do is require that a large skb be allocated (that may well
> fail), and add it another data copy.

I think I have done it. For also working for EHCI usb host controller,
I allocate 16 KB continuous buffer and copy the fragmented packets to
it and bulk out the buffer.
 
Best Regards,
Hayes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight March 4, 2014, 2:35 p.m. UTC | #3
From: hayeswang 
>  David Laight [mailto:David.Laight@ACULAB.COM]
> > Sent: Tuesday, March 04, 2014 8:12 PM
> > To: 'Hayes Wang'; netdev@vger.kernel.org
> > Cc: nic_swsd@realtek.com; linux-kernel@vger.kernel.org;
> > linux-usb@vger.kernel.org
> > Subject: RE: [PATCH net-next 08/12] r8152: support TSO
> >
> > From: Hayes Wang
> > > Support scatter gather and TSO.
> > >
> > > Adjust the tx checksum function and set the max gso size to fix the
> > > size of the tx aggregation buffer.
> >
> > There is little point supporting TSO unless the usb host controller
> > supports arbitrary aligned scatter-gather.
> > All you do is require that a large skb be allocated (that may well
> > fail), and add it another data copy.
> 
> I think I have done it. For also working for EHCI usb host controller,
> I allocate 16 KB continuous buffer and copy the fragmented packets to
> it and bulk out the buffer.

Does that mean you are splitting the 64k 'ethernet packet' from TCP
is software? I've looked at the ax88179 where the hardware can do it.

Is there really a gain doing segmentation here if you have to do the
extra data copy?

It might be worth packing multiple short packets into a single USB bulk
data packet, but that might require a CBU (crystal ball unit).

I did measure a maximum transmit ethernet frame rate of (IIRC) 250000
frames/sec for the ax88179 - probably limited by the USB3 frame rate.
Exceeding that would probably require putting multiple tx packets into
a single URB. OTOH that limit probably doesn't matter

What might be more useful is to set the rx side up to receive into
page-aligned 2k (or 4k) buffers and then separate out the ethernet
frames into the required skb - probably as page list fragments
(I'm not entirely sure how such skb can be created).

I don't know what the r8152 does, but the usbnet code encourages it to
allocate long skb, pass them to the usb stack to fill, and then clone
them if they contain multiple frames.

This isn't really good use of memory or cpu cycles.
The ax88179 driver also lies badly about the skb's truesize.

	David



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet March 4, 2014, 3:02 p.m. UTC | #4
On Tue, 2014-03-04 at 14:35 +0000, David Laight wrote:

> Does that mean you are splitting the 64k 'ethernet packet' from TCP
> is software? I've looked at the ax88179 where the hardware can do it.
> 
> Is there really a gain doing segmentation here if you have to do the
> extra data copy?

There is no 'extra' copy.

There is _already_ a copy, so what's the deal of doing this copy in a SG
enabled way ?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet March 4, 2014, 4:11 p.m. UTC | #5
On Tue, 2014-03-04 at 20:01 +0800, Hayes Wang wrote:

> +static u32 r8152_xmit_frags(struct r8152 *tp, struct sk_buff *skb, u8 *data)
> +{
> +	struct skb_shared_info *info = skb_shinfo(skb);
> +	unsigned int cur_frag;
> +	u32 total = skb_headlen(skb);
> +
> +	memcpy(data, skb->data, total);
> +	data += total;
> +
> +	for (cur_frag = 0; cur_frag < info->nr_frags; cur_frag++) {
> +		const skb_frag_t *frag = info->frags + cur_frag;
> +		void *addr;
> +		u32 len;
> +
> +		len = skb_frag_size(frag);
> +		addr = skb_frag_address(frag);
> +		memcpy(data, addr, len);
> +		data += len;
> +		total += len;
>  	}
> +
> +	return total;
>  }
>  

I would rather use skb_copy_bits(), because it correctly handles
kmap() case. (If a frag resides in high memory)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet March 4, 2014, 4:52 p.m. UTC | #6
On Tue, 2014-03-04 at 20:01 +0800, Hayes Wang wrote:
> Support scatter gather and TSO.

>  
> -	netdev->features |= NETIF_F_RXCSUM | NETIF_F_IP_CSUM;
> -	netdev->hw_features = NETIF_F_RXCSUM | NETIF_F_IP_CSUM;
> +	netdev->features |= NETIF_F_RXCSUM | NETIF_F_IP_CSUM | NETIF_F_SG |
> +			    NETIF_F_TSO;
> +	netdev->hw_features = NETIF_F_RXCSUM | NETIF_F_IP_CSUM | NETIF_F_SG |
> +			      NETIF_F_TSO;
>  

Minor nit :

If you use skb_copy_bits(), then you also can add NETIF_F_FRAGLIST here.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 5fdf0af..774a19a 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -23,7 +23,7 @@ 
 #include <linux/ipv6.h>
 
 /* Version Information */
-#define DRIVER_VERSION "v1.05.0 (2014/02/18)"
+#define DRIVER_VERSION "v1.06.0 (2014/03/03)"
 #define DRIVER_AUTHOR "Realtek linux nic maintainers <nic_swsd@realtek.com>"
 #define DRIVER_DESC "Realtek RTL8152/RTL8153 Based USB Ethernet Adapters"
 #define MODULENAME "r8152"
@@ -490,13 +490,18 @@  struct tx_desc {
 	__le32 opts1;
 #define TX_FS			(1 << 31) /* First segment of a packet */
 #define TX_LS			(1 << 30) /* Final segment of a packet */
-#define TX_LEN_MASK		0x3ffff
+#define GTSENDV4		(1 << 28)
+#define GTTCPHO_SHIFT		18
+#define TX_LEN_MAX		0x3ffffU
 
 	__le32 opts2;
 #define UDP_CS			(1 << 31) /* Calculate UDP/IP checksum */
 #define TCP_CS			(1 << 30) /* Calculate TCP/IP checksum */
 #define IPV4_CS			(1 << 29) /* Calculate IPv4 checksum */
 #define IPV6_CS			(1 << 28) /* Calculate IPv6 checksum */
+#define MSS_SHIFT		17
+#define MSS_MAX			0x7ffU
+#define TCPHO_SHIFT		17
 };
 
 struct r8152;
@@ -563,12 +568,21 @@  enum rtl_version {
 	RTL_VER_MAX
 };
 
+enum tx_csum_stat {
+	TX_CSUM_SUCCESS = 0,
+	TX_CSUM_TSO,
+	TX_CSUM_NONE
+};
+
 /* Maximum number of multicast addresses to filter (vs. Rx-all-multicast).
  * The RTL chips use a 64 element hash table based on the Ethernet CRC.
  */
 static const int multicast_filter_limit = 32;
 static unsigned int rx_buf_sz = 16384;
 
+#define RTL_LIMITED_TSO_SIZE	(rx_buf_sz - sizeof(struct tx_desc) - \
+				 VLAN_ETH_HLEN - VLAN_HLEN)
+
 static
 int get_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)
 {
@@ -1295,24 +1309,46 @@  static struct tx_agg *r8152_get_tx_agg(struct r8152 *tp)
 	return agg;
 }
 
-static void
-r8152_tx_csum(struct r8152 *tp, struct tx_desc *desc, struct sk_buff *skb)
+static inline __be16 get_protocol(struct sk_buff *skb)
 {
-	memset(desc, 0, sizeof(*desc));
+	__be16 protocol;
 
-	desc->opts1 = cpu_to_le32((skb->len & TX_LEN_MASK) | TX_FS | TX_LS);
+	if (skb->protocol == htons(ETH_P_8021Q))
+		protocol = vlan_eth_hdr(skb)->h_vlan_encapsulated_proto;
+	else
+		protocol = skb->protocol;
 
-	if (skb->ip_summed == CHECKSUM_PARTIAL) {
-		__be16 protocol;
-		u8 ip_protocol;
-		u32 opts2 = 0;
+	return protocol;
+}
 
-		if (skb->protocol == htons(ETH_P_8021Q))
-			protocol = vlan_eth_hdr(skb)->h_vlan_encapsulated_proto;
-		else
-			protocol = skb->protocol;
+static int r8152_tx_csum(struct r8152 *tp, struct tx_desc *desc,
+			 struct sk_buff *skb, u32 len, u32 transport_offset)
+{
+	u32 mss = skb_shinfo(skb)->gso_size;
+	u32 opts1, opts2 = 0;
+	int ret = TX_CSUM_SUCCESS;
+
+	WARN_ON_ONCE(len > TX_LEN_MAX);
+
+	opts1 = len | TX_FS | TX_LS;
+
+	if (mss) {
+		switch (get_protocol(skb)) {
+		case htons(ETH_P_IP):
+			opts1 |= GTSENDV4;
+			break;
 
-		switch (protocol) {
+		default:
+			WARN_ON_ONCE(1);
+			break;
+		}
+
+		opts1 |= transport_offset << GTTCPHO_SHIFT;
+		opts2 |= min(mss, MSS_MAX) << MSS_SHIFT;
+	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		u8 ip_protocol;
+
+		switch (get_protocol(skb)) {
 		case htons(ETH_P_IP):
 			opts2 |= IPV4_CS;
 			ip_protocol = ip_hdr(skb)->protocol;
@@ -1328,17 +1364,44 @@  r8152_tx_csum(struct r8152 *tp, struct tx_desc *desc, struct sk_buff *skb)
 			break;
 		}
 
-		if (ip_protocol == IPPROTO_TCP) {
+		if (ip_protocol == IPPROTO_TCP)
 			opts2 |= TCP_CS;
-			opts2 |= (skb_transport_offset(skb) & 0x7fff) << 17;
-		} else if (ip_protocol == IPPROTO_UDP) {
+		else if (ip_protocol == IPPROTO_UDP)
 			opts2 |= UDP_CS;
-		} else {
+		else
 			WARN_ON_ONCE(1);
-		}
 
-		desc->opts2 = cpu_to_le32(opts2);
+		opts2 |= transport_offset << TCPHO_SHIFT;
+	}
+
+	desc->opts2 = cpu_to_le32(opts2);
+	desc->opts1 = cpu_to_le32(opts1);
+
+	return ret;
+}
+
+static u32 r8152_xmit_frags(struct r8152 *tp, struct sk_buff *skb, u8 *data)
+{
+	struct skb_shared_info *info = skb_shinfo(skb);
+	unsigned int cur_frag;
+	u32 total = skb_headlen(skb);
+
+	memcpy(data, skb->data, total);
+	data += total;
+
+	for (cur_frag = 0; cur_frag < info->nr_frags; cur_frag++) {
+		const skb_frag_t *frag = info->frags + cur_frag;
+		void *addr;
+		u32 len;
+
+		len = skb_frag_size(frag);
+		addr = skb_frag_address(frag);
+		memcpy(data, addr, len);
+		data += len;
+		total += len;
 	}
+
+	return total;
 }
 
 static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg)
@@ -1360,29 +1423,36 @@  static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg)
 		struct tx_desc *tx_desc;
 		struct sk_buff *skb;
 		unsigned int len;
+		u32 offset;
 
 		skb = __skb_dequeue(&skb_head);
 		if (!skb)
 			break;
 
-		remain -= sizeof(*tx_desc);
-		len = skb->len;
-		if (remain < len) {
+		len = skb->len + sizeof(*tx_desc);
+
+		if (len > remain) {
 			__skb_queue_head(&skb_head, skb);
 			break;
 		}
 
 		tx_data = tx_agg_align(tx_data);
 		tx_desc = (struct tx_desc *)tx_data;
+
+		offset = (u32)skb_transport_offset(skb);
+
+		r8152_tx_csum(tp, tx_desc, skb, skb->len, offset);
+
 		tx_data += sizeof(*tx_desc);
 
-		r8152_tx_csum(tp, tx_desc, skb);
-		memcpy(tx_data, skb->data, len);
-		agg->skb_num++;
+		len = r8152_xmit_frags(tp, skb, tx_data);
+
+		tx_data += len;
 		agg->skb_len += len;
+		agg->skb_num++;
+
 		dev_kfree_skb_any(skb);
 
-		tx_data += len;
 		remain = rx_buf_sz - (int)(tx_agg_align(tx_data) - agg->head);
 	}
 
@@ -3149,10 +3219,13 @@  static int rtl8152_probe(struct usb_interface *intf,
 	netdev->netdev_ops = &rtl8152_netdev_ops;
 	netdev->watchdog_timeo = RTL8152_TX_TIMEOUT;
 
-	netdev->features |= NETIF_F_RXCSUM | NETIF_F_IP_CSUM;
-	netdev->hw_features = NETIF_F_RXCSUM | NETIF_F_IP_CSUM;
+	netdev->features |= NETIF_F_RXCSUM | NETIF_F_IP_CSUM | NETIF_F_SG |
+			    NETIF_F_TSO;
+	netdev->hw_features = NETIF_F_RXCSUM | NETIF_F_IP_CSUM | NETIF_F_SG |
+			      NETIF_F_TSO;
 
 	SET_ETHTOOL_OPS(netdev, &ops);
+	netif_set_gso_max_size(netdev, RTL_LIMITED_TSO_SIZE);
 
 	tp->mii.dev = netdev;
 	tp->mii.mdio_read = read_mii_word;