Message ID | 1393934464-23675-9-git-send-email-hayeswang@realtek.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
From: Hayes Wang > Support scatter gather and TSO. > > Adjust the tx checksum function and set the max gso size to fix the > size of the tx aggregation buffer. There is little point supporting TSO unless the usb host controller supports arbitrary aligned scatter-gather. All you do is require that a large skb be allocated (that may well fail), and add it another data copy. The xhci controller is almost capable of arbitrary scatter-gather, but it is currently disabled because the data must be aligned at the end of the transfer ring (the hardware designers have made it almost impossible to write efficient software). Note that the various xhci controllers behave subtly differently. David -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Laight [mailto:David.Laight@ACULAB.COM] > Sent: Tuesday, March 04, 2014 8:12 PM > To: 'Hayes Wang'; netdev@vger.kernel.org > Cc: nic_swsd@realtek.com; linux-kernel@vger.kernel.org; > linux-usb@vger.kernel.org > Subject: RE: [PATCH net-next 08/12] r8152: support TSO > > From: Hayes Wang > > Support scatter gather and TSO. > > > > Adjust the tx checksum function and set the max gso size to fix the > > size of the tx aggregation buffer. > > There is little point supporting TSO unless the usb host controller > supports arbitrary aligned scatter-gather. > All you do is require that a large skb be allocated (that may well > fail), and add it another data copy. I think I have done it. For also working for EHCI usb host controller, I allocate 16 KB continuous buffer and copy the fragmented packets to it and bulk out the buffer. Best Regards, Hayes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: hayeswang > David Laight [mailto:David.Laight@ACULAB.COM] > > Sent: Tuesday, March 04, 2014 8:12 PM > > To: 'Hayes Wang'; netdev@vger.kernel.org > > Cc: nic_swsd@realtek.com; linux-kernel@vger.kernel.org; > > linux-usb@vger.kernel.org > > Subject: RE: [PATCH net-next 08/12] r8152: support TSO > > > > From: Hayes Wang > > > Support scatter gather and TSO. > > > > > > Adjust the tx checksum function and set the max gso size to fix the > > > size of the tx aggregation buffer. > > > > There is little point supporting TSO unless the usb host controller > > supports arbitrary aligned scatter-gather. > > All you do is require that a large skb be allocated (that may well > > fail), and add it another data copy. > > I think I have done it. For also working for EHCI usb host controller, > I allocate 16 KB continuous buffer and copy the fragmented packets to > it and bulk out the buffer. Does that mean you are splitting the 64k 'ethernet packet' from TCP is software? I've looked at the ax88179 where the hardware can do it. Is there really a gain doing segmentation here if you have to do the extra data copy? It might be worth packing multiple short packets into a single USB bulk data packet, but that might require a CBU (crystal ball unit). I did measure a maximum transmit ethernet frame rate of (IIRC) 250000 frames/sec for the ax88179 - probably limited by the USB3 frame rate. Exceeding that would probably require putting multiple tx packets into a single URB. OTOH that limit probably doesn't matter What might be more useful is to set the rx side up to receive into page-aligned 2k (or 4k) buffers and then separate out the ethernet frames into the required skb - probably as page list fragments (I'm not entirely sure how such skb can be created). I don't know what the r8152 does, but the usbnet code encourages it to allocate long skb, pass them to the usb stack to fill, and then clone them if they contain multiple frames. This isn't really good use of memory or cpu cycles. The ax88179 driver also lies badly about the skb's truesize. David -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2014-03-04 at 14:35 +0000, David Laight wrote: > Does that mean you are splitting the 64k 'ethernet packet' from TCP > is software? I've looked at the ax88179 where the hardware can do it. > > Is there really a gain doing segmentation here if you have to do the > extra data copy? There is no 'extra' copy. There is _already_ a copy, so what's the deal of doing this copy in a SG enabled way ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2014-03-04 at 20:01 +0800, Hayes Wang wrote: > +static u32 r8152_xmit_frags(struct r8152 *tp, struct sk_buff *skb, u8 *data) > +{ > + struct skb_shared_info *info = skb_shinfo(skb); > + unsigned int cur_frag; > + u32 total = skb_headlen(skb); > + > + memcpy(data, skb->data, total); > + data += total; > + > + for (cur_frag = 0; cur_frag < info->nr_frags; cur_frag++) { > + const skb_frag_t *frag = info->frags + cur_frag; > + void *addr; > + u32 len; > + > + len = skb_frag_size(frag); > + addr = skb_frag_address(frag); > + memcpy(data, addr, len); > + data += len; > + total += len; > } > + > + return total; > } > I would rather use skb_copy_bits(), because it correctly handles kmap() case. (If a frag resides in high memory) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2014-03-04 at 20:01 +0800, Hayes Wang wrote: > Support scatter gather and TSO. > > - netdev->features |= NETIF_F_RXCSUM | NETIF_F_IP_CSUM; > - netdev->hw_features = NETIF_F_RXCSUM | NETIF_F_IP_CSUM; > + netdev->features |= NETIF_F_RXCSUM | NETIF_F_IP_CSUM | NETIF_F_SG | > + NETIF_F_TSO; > + netdev->hw_features = NETIF_F_RXCSUM | NETIF_F_IP_CSUM | NETIF_F_SG | > + NETIF_F_TSO; > Minor nit : If you use skb_copy_bits(), then you also can add NETIF_F_FRAGLIST here. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c index 5fdf0af..774a19a 100644 --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -23,7 +23,7 @@ #include <linux/ipv6.h> /* Version Information */ -#define DRIVER_VERSION "v1.05.0 (2014/02/18)" +#define DRIVER_VERSION "v1.06.0 (2014/03/03)" #define DRIVER_AUTHOR "Realtek linux nic maintainers <nic_swsd@realtek.com>" #define DRIVER_DESC "Realtek RTL8152/RTL8153 Based USB Ethernet Adapters" #define MODULENAME "r8152" @@ -490,13 +490,18 @@ struct tx_desc { __le32 opts1; #define TX_FS (1 << 31) /* First segment of a packet */ #define TX_LS (1 << 30) /* Final segment of a packet */ -#define TX_LEN_MASK 0x3ffff +#define GTSENDV4 (1 << 28) +#define GTTCPHO_SHIFT 18 +#define TX_LEN_MAX 0x3ffffU __le32 opts2; #define UDP_CS (1 << 31) /* Calculate UDP/IP checksum */ #define TCP_CS (1 << 30) /* Calculate TCP/IP checksum */ #define IPV4_CS (1 << 29) /* Calculate IPv4 checksum */ #define IPV6_CS (1 << 28) /* Calculate IPv6 checksum */ +#define MSS_SHIFT 17 +#define MSS_MAX 0x7ffU +#define TCPHO_SHIFT 17 }; struct r8152; @@ -563,12 +568,21 @@ enum rtl_version { RTL_VER_MAX }; +enum tx_csum_stat { + TX_CSUM_SUCCESS = 0, + TX_CSUM_TSO, + TX_CSUM_NONE +}; + /* Maximum number of multicast addresses to filter (vs. Rx-all-multicast). * The RTL chips use a 64 element hash table based on the Ethernet CRC. */ static const int multicast_filter_limit = 32; static unsigned int rx_buf_sz = 16384; +#define RTL_LIMITED_TSO_SIZE (rx_buf_sz - sizeof(struct tx_desc) - \ + VLAN_ETH_HLEN - VLAN_HLEN) + static int get_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data) { @@ -1295,24 +1309,46 @@ static struct tx_agg *r8152_get_tx_agg(struct r8152 *tp) return agg; } -static void -r8152_tx_csum(struct r8152 *tp, struct tx_desc *desc, struct sk_buff *skb) +static inline __be16 get_protocol(struct sk_buff *skb) { - memset(desc, 0, sizeof(*desc)); + __be16 protocol; - desc->opts1 = cpu_to_le32((skb->len & TX_LEN_MASK) | TX_FS | TX_LS); + if (skb->protocol == htons(ETH_P_8021Q)) + protocol = vlan_eth_hdr(skb)->h_vlan_encapsulated_proto; + else + protocol = skb->protocol; - if (skb->ip_summed == CHECKSUM_PARTIAL) { - __be16 protocol; - u8 ip_protocol; - u32 opts2 = 0; + return protocol; +} - if (skb->protocol == htons(ETH_P_8021Q)) - protocol = vlan_eth_hdr(skb)->h_vlan_encapsulated_proto; - else - protocol = skb->protocol; +static int r8152_tx_csum(struct r8152 *tp, struct tx_desc *desc, + struct sk_buff *skb, u32 len, u32 transport_offset) +{ + u32 mss = skb_shinfo(skb)->gso_size; + u32 opts1, opts2 = 0; + int ret = TX_CSUM_SUCCESS; + + WARN_ON_ONCE(len > TX_LEN_MAX); + + opts1 = len | TX_FS | TX_LS; + + if (mss) { + switch (get_protocol(skb)) { + case htons(ETH_P_IP): + opts1 |= GTSENDV4; + break; - switch (protocol) { + default: + WARN_ON_ONCE(1); + break; + } + + opts1 |= transport_offset << GTTCPHO_SHIFT; + opts2 |= min(mss, MSS_MAX) << MSS_SHIFT; + } else if (skb->ip_summed == CHECKSUM_PARTIAL) { + u8 ip_protocol; + + switch (get_protocol(skb)) { case htons(ETH_P_IP): opts2 |= IPV4_CS; ip_protocol = ip_hdr(skb)->protocol; @@ -1328,17 +1364,44 @@ r8152_tx_csum(struct r8152 *tp, struct tx_desc *desc, struct sk_buff *skb) break; } - if (ip_protocol == IPPROTO_TCP) { + if (ip_protocol == IPPROTO_TCP) opts2 |= TCP_CS; - opts2 |= (skb_transport_offset(skb) & 0x7fff) << 17; - } else if (ip_protocol == IPPROTO_UDP) { + else if (ip_protocol == IPPROTO_UDP) opts2 |= UDP_CS; - } else { + else WARN_ON_ONCE(1); - } - desc->opts2 = cpu_to_le32(opts2); + opts2 |= transport_offset << TCPHO_SHIFT; + } + + desc->opts2 = cpu_to_le32(opts2); + desc->opts1 = cpu_to_le32(opts1); + + return ret; +} + +static u32 r8152_xmit_frags(struct r8152 *tp, struct sk_buff *skb, u8 *data) +{ + struct skb_shared_info *info = skb_shinfo(skb); + unsigned int cur_frag; + u32 total = skb_headlen(skb); + + memcpy(data, skb->data, total); + data += total; + + for (cur_frag = 0; cur_frag < info->nr_frags; cur_frag++) { + const skb_frag_t *frag = info->frags + cur_frag; + void *addr; + u32 len; + + len = skb_frag_size(frag); + addr = skb_frag_address(frag); + memcpy(data, addr, len); + data += len; + total += len; } + + return total; } static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg) @@ -1360,29 +1423,36 @@ static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg) struct tx_desc *tx_desc; struct sk_buff *skb; unsigned int len; + u32 offset; skb = __skb_dequeue(&skb_head); if (!skb) break; - remain -= sizeof(*tx_desc); - len = skb->len; - if (remain < len) { + len = skb->len + sizeof(*tx_desc); + + if (len > remain) { __skb_queue_head(&skb_head, skb); break; } tx_data = tx_agg_align(tx_data); tx_desc = (struct tx_desc *)tx_data; + + offset = (u32)skb_transport_offset(skb); + + r8152_tx_csum(tp, tx_desc, skb, skb->len, offset); + tx_data += sizeof(*tx_desc); - r8152_tx_csum(tp, tx_desc, skb); - memcpy(tx_data, skb->data, len); - agg->skb_num++; + len = r8152_xmit_frags(tp, skb, tx_data); + + tx_data += len; agg->skb_len += len; + agg->skb_num++; + dev_kfree_skb_any(skb); - tx_data += len; remain = rx_buf_sz - (int)(tx_agg_align(tx_data) - agg->head); } @@ -3149,10 +3219,13 @@ static int rtl8152_probe(struct usb_interface *intf, netdev->netdev_ops = &rtl8152_netdev_ops; netdev->watchdog_timeo = RTL8152_TX_TIMEOUT; - netdev->features |= NETIF_F_RXCSUM | NETIF_F_IP_CSUM; - netdev->hw_features = NETIF_F_RXCSUM | NETIF_F_IP_CSUM; + netdev->features |= NETIF_F_RXCSUM | NETIF_F_IP_CSUM | NETIF_F_SG | + NETIF_F_TSO; + netdev->hw_features = NETIF_F_RXCSUM | NETIF_F_IP_CSUM | NETIF_F_SG | + NETIF_F_TSO; SET_ETHTOOL_OPS(netdev, &ops); + netif_set_gso_max_size(netdev, RTL_LIMITED_TSO_SIZE); tp->mii.dev = netdev; tp->mii.mdio_read = read_mii_word;
Support scatter gather and TSO. Adjust the tx checksum function and set the max gso size to fix the size of the tx aggregation buffer. Signed-off-by: Hayes Wang <hayeswang@realtek.com> --- drivers/net/usb/r8152.c | 133 +++++++++++++++++++++++++++++++++++++----------- 1 file changed, 103 insertions(+), 30 deletions(-)