Patchwork [v4,06/10] e1000e: Support for byte queue limits

login
register
mail settings
Submitter Tom Herbert
Date Nov. 29, 2011, 2:33 a.m.
Message ID <alpine.DEB.2.00.1111281819350.24413@pokey.mtv.corp.google.com>
Download mbox | patch
Permalink /patch/128203/
State Accepted
Delegated to: David Miller
Headers show

Comments

Tom Herbert - Nov. 29, 2011, 2:33 a.m.
Changes to e1000e to use byte queue limits.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)
Jesse Brandeburg - Nov. 29, 2011, 9:01 p.m.
On Mon, 28 Nov 2011 18:33:16 -0800
Tom Herbert <therbert@google.com> wrote:

> Changes to e1000e to use byte queue limits.

First:  thanks Tom for looking into e1000e with this work.

> @@ -1096,6 +1097,10 @@ static bool e1000_clean_tx_irq(struct e1000_adapter *adapter)
>  			if (cleaned) {
>  				total_tx_packets += buffer_info->segs;
>  				total_tx_bytes += buffer_info->bytecount;
> +				if (buffer_info->skb) {
> +					bytes_compl += buffer_info->skb->len;

whats wrong with using total_tx_bytes or buffer_info->bytecount?  it
contains the "bytes on the wire" value which will be slightly larger
than skb->len, but avoids warming the skb->len cacheline unnecessarily.

the rest of the patch to e1000e looks okay.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Herbert - Dec. 1, 2011, 1:04 a.m.
> whats wrong with using total_tx_bytes or buffer_info->bytecount?  it
> contains the "bytes on the wire" value which will be slightly larger
> than skb->len, but avoids warming the skb->len cacheline unnecessarily.
>

This makes sense.  I made the changes, but the limits computed are
much higher than with using sbk->len.  I suspect there may be a bug in
GSO path.

For instance, in the driver at:

  bytecount = ((segs - 1) * skb_headlen(skb)) + skb->len;

I see cases like:

  segs=34, skb_header_len(skb)=70, skb->len=49298 so bytecount=51608

Which seems reasonable... but, I also see things like:

  segs=45, skb_header_len(skb)=1522, skb->len=65226, so bytecount= 132194
                                                      ^^^^
Which doesn't seem right at all.  I am thinking there may be a bug
setting skb->data_len improperly.

Tom
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Dec. 1, 2011, 10:12 a.m.
Le mercredi 30 novembre 2011 à 17:04 -0800, Tom Herbert a écrit :
> > whats wrong with using total_tx_bytes or buffer_info->bytecount?  it
> > contains the "bytes on the wire" value which will be slightly larger
> > than skb->len, but avoids warming the skb->len cacheline unnecessarily.
> >
> 
> This makes sense.  I made the changes, but the limits computed are
> much higher than with using sbk->len.  I suspect there may be a bug in
> GSO path.
> 
> For instance, in the driver at:
> 
>   bytecount = ((segs - 1) * skb_headlen(skb)) + skb->len;
> 
> I see cases like:
> 
>   segs=34, skb_header_len(skb)=70, skb->len=49298 so bytecount=51608
> 
> Which seems reasonable... but, I also see things like:
> 
>   segs=45, skb_header_len(skb)=1522, skb->len=65226, so bytecount= 132194
>                                                       ^^^^
> Which doesn't seem right at all.  I am thinking there may be a bug
> setting skb->data_len improperly.
> 

OK, as stated on your other thread, its obvious this driver (and
probably other intel drivers) made assumptions that are now obsolete,
since skb head can contain some data payload, not only (MAC+IP+TCP)
headers.

If Intel guys cannot afford approximate the bytecount by skb->len, I
suggest to use same trick found in 
drivers/net/ethernet/intel/igb/igb_main.c

static int igb_tso(struct igb_ring *tx_ring, 
...
        /* compute header lengths */
        l4len = tcp_hdrlen(skb);
        *hdr_len = skb_transport_offset(skb) + l4len;

        /* update gso size and bytecount with header size */
        first->gso_segs = skb_shinfo(skb)->gso_segs;
        first->bytecount += (first->gso_segs - 1) * *hdr_len;




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Herbert - Dec. 1, 2011, 4:32 p.m.
> OK, as stated on your other thread, its obvious this driver (and
> probably other intel drivers) made assumptions that are now obsolete,
> since skb head can contain some data payload, not only (MAC+IP+TCP)
> headers.
>
Looks like similar is in several Intel drivers.  Only other driver I
found that is using the false assumption is tg3 IPv6 gso path.

> If Intel guys cannot afford approximate the bytecount by skb->len, I
> suggest to use same trick found in
> drivers/net/ethernet/intel/igb/igb_main.c
>
> static int igb_tso(struct igb_ring *tx_ring,
> ...
>        /* compute header lengths */
>        l4len = tcp_hdrlen(skb);
>        *hdr_len = skb_transport_offset(skb) + l4len;
>
I noticed a similar technique used by several drivers.  it's fine if
it already knew the packet was TCP.  Just assuming that a GSO packet
is TCP might be a new bug, so in the e1000e driver they would need to
check the type (gso_type == SKB_GSO_TCPV4 or SKB_GSO_TCPV6).

I'm not sure that tcp_hdrlen is even being used correctly in the other
drivers.  For instance in bnx2x, tcp_hdrlen is called after checking
skb_is_gso_v6(skb) and skb_is_gso(skb).  If the former is true then in
fact it is a TCP packet (gso_type is SKB_GSO_TCPV6) , but the if the
latter is true it only proves that is is GSO packet (gso_type not
checked, only gso_size>0)--  the packet might be a fragment UDP packet
for instance, so tcphdr_len is not valid on those skbs.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Dec. 1, 2011, 4:40 p.m.
Le jeudi 01 décembre 2011 à 08:32 -0800, Tom Herbert a écrit :
> > OK, as stated on your other thread, its obvious this driver (and
> > probably other intel drivers) made assumptions that are now obsolete,
> > since skb head can contain some data payload, not only (MAC+IP+TCP)
> > headers.
> >
> Looks like similar is in several Intel drivers.  Only other driver I
> found that is using the false assumption is tg3 IPv6 gso path.
> 
> > If Intel guys cannot afford approximate the bytecount by skb->len, I
> > suggest to use same trick found in
> > drivers/net/ethernet/intel/igb/igb_main.c
> >
> > static int igb_tso(struct igb_ring *tx_ring,
> > ...
> >        /* compute header lengths */
> >        l4len = tcp_hdrlen(skb);
> >        *hdr_len = skb_transport_offset(skb) + l4len;
> >
> I noticed a similar technique used by several drivers.  it's fine if
> it already knew the packet was TCP.  Just assuming that a GSO packet
> is TCP might be a new bug, so in the e1000e driver they would need to
> check the type (gso_type == SKB_GSO_TCPV4 or SKB_GSO_TCPV6).
> 
> I'm not sure that tcp_hdrlen is even being used correctly in the other
> drivers.  For instance in bnx2x, tcp_hdrlen is called after checking
> skb_is_gso_v6(skb) and skb_is_gso(skb).  If the former is true then in
> fact it is a TCP packet (gso_type is SKB_GSO_TCPV6) , but the if the
> latter is true it only proves that is is GSO packet (gso_type not
> checked, only gso_size>0)--  the packet might be a fragment UDP packet
> for instance, so tcphdr_len is not valid on those skbs.

To my knowledge, only one NIC (neterion/s2io) supports NETIF_F_UFO.

So all others only receive tcp gso frames.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Dec. 1, 2011, 5:48 p.m.
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 01 Dec 2011 11:12:14 +0100

> OK, as stated on your other thread, its obvious this driver (and
> probably other intel drivers) made assumptions that are now obsolete,
> since skb head can contain some data payload, not only (MAC+IP+TCP)
> headers.

They were always wrong assumptions even before your patch Eric.

Any Netfilter or similar module can pull some data into the linear
area before the driver sees the packet on transmit.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index a5bd7a3..c6e9763 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1079,6 +1079,7 @@  static bool e1000_clean_tx_irq(struct e1000_adapter *adapter)
 	unsigned int i, eop;
 	unsigned int count = 0;
 	unsigned int total_tx_bytes = 0, total_tx_packets = 0;
+	unsigned int bytes_compl = 0, pkts_compl = 0;
 
 	i = tx_ring->next_to_clean;
 	eop = tx_ring->buffer_info[i].next_to_watch;
@@ -1096,6 +1097,10 @@  static bool e1000_clean_tx_irq(struct e1000_adapter *adapter)
 			if (cleaned) {
 				total_tx_packets += buffer_info->segs;
 				total_tx_bytes += buffer_info->bytecount;
+				if (buffer_info->skb) {
+					bytes_compl += buffer_info->skb->len;
+					pkts_compl++;
+				}
 			}
 
 			e1000_put_txbuf(adapter, buffer_info);
@@ -1114,6 +1119,8 @@  static bool e1000_clean_tx_irq(struct e1000_adapter *adapter)
 
 	tx_ring->next_to_clean = i;
 
+	netdev_completed_queue(netdev, pkts_compl, bytes_compl);
+
 #define TX_WAKE_THRESHOLD 32
 	if (count && netif_carrier_ok(netdev) &&
 	    e1000_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD) {
@@ -2240,6 +2247,7 @@  static void e1000_clean_tx_ring(struct e1000_adapter *adapter)
 		e1000_put_txbuf(adapter, buffer_info);
 	}
 
+	netdev_reset_queue(adapter->netdev);
 	size = sizeof(struct e1000_buffer) * tx_ring->count;
 	memset(tx_ring->buffer_info, 0, size);
 
@@ -5027,6 +5035,7 @@  static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 	/* if count is 0 then mapping error has occurred */
 	count = e1000_tx_map(adapter, skb, first, max_per_txd, nr_frags, mss);
 	if (count) {
+		netdev_sent_queue(netdev, skb->len);
 		e1000_tx_queue(adapter, tx_flags, count);
 		/* Make sure there is space in the ring for the next send. */
 		e1000_maybe_stop_tx(netdev, MAX_SKB_FRAGS + 2);