[-next] net: tcp: add mib counters to track zero window transitions

Message ID: 1392674268-26005-1-git-send-email-fw@strlen.de
State: Superseded, archived
Delegated to: David Miller

Commit Message

Florian Westphal Feb. 17, 2014, 9:57 p.m. UTC
Three counters are added:
- one to track when we went from non-zero to zero window
- one to track the reverse
- one counter incremented when we want to announce zero window.

The latter is added because it can show cases where we want to close the
window but can't because we would shrink window.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Eric, is this what you had in mind?

 I re-ran my 'slow-sender-with-reader-that-does-not-drain-socket'
 scenario and, as expected, only TCPWANTZEROWINDOW increases.

 Thanks,
 Florian

 include/uapi/linux/snmp.h |  3 +++
 net/ipv4/proc.c           |  3 +++
 net/ipv4/tcp_output.c     | 13 ++++++++++++-
 3 files changed, 18 insertions(+), 1 deletion(-)
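
The three counters described in the commit message can be sketched in user
space. The following is a minimal model, not the kernel code: struct
win_counters and select_window() are hypothetical names, window scaling and
alignment are left out, and the three inputs stand in for tp->rcv_wnd,
tcp_receive_window() and __tcp_select_window() from the patch.

```c
#include <stdint.h>

/* Hypothetical user-space model of the counters added by the patch. */
struct win_counters {
	unsigned long to_zero;   /* models TCPToZeroWindowAdv */
	unsigned long from_zero; /* models TCPFromZeroWindowAdv */
	unsigned long want_zero; /* models TCPWantZeroWindow */
};

/*
 * old_win: window advertised last time (tp->rcv_wnd in the patch)
 * cur_win: what is left of that offer (tcp_receive_window())
 * new_win: freshly computed window (__tcp_select_window())
 * Returns the window that would be advertised (unscaled in this model).
 */
static uint32_t select_window(struct win_counters *c, uint32_t old_win,
			      uint32_t cur_win, uint32_t new_win)
{
	if (new_win < cur_win) {
		/* Never shrink the offered window; note a wish for zero. */
		if (new_win == 0)
			c->want_zero++;
		new_win = cur_win;
	}

	if (new_win == 0) {
		if (old_win)
			c->to_zero++;   /* non-zero -> zero transition */
	} else if (old_win == 0) {
		c->from_zero++;         /* zero -> non-zero transition */
	}

	return new_win;
}
```

Feeding this model a computed window of 0 while 1000 bytes are still on offer
bumps only want_zero, which is consistent with the slow-sender scenario
mentioned above where only TCPWANTZEROWINDOW increases.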

Comments

David Miller Feb. 19, 2014, 6:17 p.m. UTC | #1
From: Florian Westphal <fw@strlen.de>
Date: Mon, 17 Feb 2014 22:57:48 +0100

> Three counters are added:
> - one to track when we went from non-zero to zero window
> - one to track the reverse
> - one counter incremented when we want to announce zero window.
> 
> The latter is added because it can show cases where we want to close the
> window but can't because we would shrink window.
> 
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  Eric, is this what you had in mind?
> 
>  I re-ran my 'slow-sender-with-reader-that-does-not-drain-socket'
>  scenario and, as expected, only TCPWANTZEROWINDOW increases.

Eric, ping?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Feb. 19, 2014, 6:49 p.m. UTC | #2
On Wed, 2014-02-19 at 13:17 -0500, David Miller wrote:
> From: Florian Westphal <fw@strlen.de>
> Date: Mon, 17 Feb 2014 22:57:48 +0100
> 
> > Three counters are added:
> > - one to track when we went from non-zero to zero window
> > - one to track the reverse
> > - one counter incremented when we want to announce zero window.
> > 
> > The latter is added because it can show cases where we want to close the
> > window but can't because we would shrink window.
> > 
> > Suggested-by: Eric Dumazet <edumazet@google.com>
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> > ---
> >  Eric, is this what you had in mind?
> > 
> >  I re-ran my 'slow-sender-with-reader-that-does-not-drain-socket'
> >  scenario and, as expected, only TCPWANTZEROWINDOW increases.
> 
> Eric, ping?

Thanks, I missed this patch. Let me think about it, thanks !


Hannes Frederic Sowa Feb. 19, 2014, 6:59 p.m. UTC | #3
On Wed, Feb 19, 2014 at 10:49:47AM -0800, Eric Dumazet wrote:
> On Wed, 2014-02-19 at 13:17 -0500, David Miller wrote:
> > From: Florian Westphal <fw@strlen.de>
> > Date: Mon, 17 Feb 2014 22:57:48 +0100
> > 
> > > Three counters are added:
> > > - one to track when we went from non-zero to zero window
> > > - one to track the reverse
> > > - one counter incremented when we want to announce zero window.
> > > 
> > > The latter is added because it can show cases where we want to close the
> > > window but can't because we would shrink window.
> > > 
> > > Suggested-by: Eric Dumazet <edumazet@google.com>
> > > Signed-off-by: Florian Westphal <fw@strlen.de>
> > > ---
> > >  Eric, is this what you had in mind?
> > > 
> > >  I re-ran my 'slow-sender-with-reader-that-does-not-drain-socket'
> > >  scenario and, as expected, only TCPWANTZEROWINDOW increases.
> > 
> > Eric, ping?
> 
> Thanks, I missed this patch. Let me think about it, thanks !

We need NET_INC_STATS instead of NET_INC_STATS_BH, no?

Greetings,

  Hannes

Hannes Frederic Sowa Feb. 19, 2014, 7:06 p.m. UTC | #4
On Wed, Feb 19, 2014 at 07:59:49PM +0100, Hannes Frederic Sowa wrote:
> On Wed, Feb 19, 2014 at 10:49:47AM -0800, Eric Dumazet wrote:
> > On Wed, 2014-02-19 at 13:17 -0500, David Miller wrote:
> > > From: Florian Westphal <fw@strlen.de>
> > > Date: Mon, 17 Feb 2014 22:57:48 +0100
> > > 
> > > > Three counters are added:
> > > > - one to track when we went from non-zero to zero window
> > > > - one to track the reverse
> > > > - one counter incremented when we want to announce zero window.
> > > > 
> > > > The latter is added because it can show cases where we want to close the
> > > > window but can't because we would shrink window.
> > > > 
> > > > Suggested-by: Eric Dumazet <edumazet@google.com>
> > > > Signed-off-by: Florian Westphal <fw@strlen.de>
> > > > ---
> > > >  Eric, is this what you had in mind?
> > > > 
> > > >  I re-ran my 'slow-sender-with-reader-that-does-not-drain-socket'
> > > >  scenario and, as expected, only TCPWANTZEROWINDOW increases.
> > > 
> > > Eric, ping?
> > 
> > Thanks, I missed this patch. Let me think about it, thanks !
> 
> We need NET_INC_STATS instead of NET_INC_STATS_BH, no?

OK, not strictly needed, but just to keep the style, as tcp_select_window can
be called from both bh and process context.

Greetings,

  Hannes

Florian Westphal Feb. 19, 2014, 7:18 p.m. UTC | #5
Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> On Wed, Feb 19, 2014 at 10:49:47AM -0800, Eric Dumazet wrote:
> > On Wed, 2014-02-19 at 13:17 -0500, David Miller wrote:
> > > From: Florian Westphal <fw@strlen.de>
> > > Date: Mon, 17 Feb 2014 22:57:48 +0100
> > > 
> > > > Three counters are added:
> > > > - one to track when we went from non-zero to zero window
> > > > - one to track the reverse
> > > > - one counter incremented when we want to announce zero window.
> > > > 
> > > > The latter is added because it can show cases where we want to close the
> > > > window but can't because we would shrink window.
> > > > 
> > > > Suggested-by: Eric Dumazet <edumazet@google.com>
> > > > Signed-off-by: Florian Westphal <fw@strlen.de>
> > > > ---
> > > >  Eric, is this what you had in mind?
> > > > 
> > > >  I re-ran my 'slow-sender-with-reader-that-does-not-drain-socket'
> > > >  scenario and, as expected, only TCPWANTZEROWINDOW increases.
> > > 
> > > Eric, ping?
> > 
> > Thanks, I missed this patch. Let me think about it, thanks !
> 
> We need NET_INC_STATS instead of NET_INC_STATS_BH, no?

sure, I can send v2 but I'll wait for (n)ack from Eric first
before considering it.

tcp_transmit_skb (caller of tcp_select_window) also uses _BH
which is why it ended up in tcp_select_window.
Eric Dumazet Feb. 19, 2014, 7:27 p.m. UTC | #6
On Wed, 2014-02-19 at 20:18 +0100, Florian Westphal wrote:

> sure, I can send v2 but I'll wait for (n)ack from Eric first
> before considering it.
> 
> tcp_transmit_skb (caller of tcp_select_window) also uses _BH
> which is why it ended up in tcp_select_window.

Yes, it seems the NET_INC_STATS_BH() in tcp_transmit_skb() should really
be a NET_INC_STATS(), even if it currently doesn't matter, even on 32bit
arches.



Hannes Frederic Sowa Feb. 19, 2014, 7:30 p.m. UTC | #7
On Wed, Feb 19, 2014 at 08:18:31PM +0100, Florian Westphal wrote:
> Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> > On Wed, Feb 19, 2014 at 10:49:47AM -0800, Eric Dumazet wrote:
> > > On Wed, 2014-02-19 at 13:17 -0500, David Miller wrote:
> > > > From: Florian Westphal <fw@strlen.de>
> > > > Date: Mon, 17 Feb 2014 22:57:48 +0100
> > > > 
> > > > > Three counters are added:
> > > > > - one to track when we went from non-zero to zero window
> > > > > - one to track the reverse
> > > > > - one counter incremented when we want to announce zero window.
> > > > > 
> > > > > The latter is added because it can show cases where we want to close the
> > > > > window but can't because we would shrink window.
> > > > > 
> > > > > Suggested-by: Eric Dumazet <edumazet@google.com>
> > > > > Signed-off-by: Florian Westphal <fw@strlen.de>
> > > > > ---
> > > > >  Eric, is this what you had in mind?
> > > > > 
> > > > >  I re-ran my 'slow-sender-with-reader-that-does-not-drain-socket'
> > > > >  scenario and, as expected, only TCPWANTZEROWINDOW increases.
> > > > 
> > > > Eric, ping?
> > > 
> > > Thanks, I missed this patch. Let me think about it, thanks !
> > 
> > We need NET_INC_STATS instead of NET_INC_STATS_BH, no?
> 
> sure, I can send v2 but I'll wait for (n)ack from Eric first
> before considering it.
> 
> tcp_transmit_skb (caller of tcp_select_window) also uses _BH
> which is why it ended up in tcp_select_window.

NET_STATS only uses 32-bit counters and thus need not be protected with a
seqlock on 32-bit platforms. As such, it does not matter, but e.g. the IP
counters are prone to deadlocks if used with the wrong postfix, because they
are 64-bit counters and thus protected by a seqlock.

A pity that the _STATS_BH postfixes have the opposite meaning of the bottom
_bh postfixes.

Basically, _STATS is always safe, and if we are sure we can omit the bh disable
call because we only call the function from bh, we can use _STATS_BH calls.

Greetings,

  Hannes

Hannes Frederic Sowa Feb. 19, 2014, 7:32 p.m. UTC | #8
Sorry...

On Wed, Feb 19, 2014 at 08:30:38PM +0100, Hannes Frederic Sowa wrote:
> NET_STATS only uses 32-bit counters and thus need not be protected with a
> seqlock on 32-bit platforms. As such, it does not matter, but e.g. the IP
> counters are prone to deadlocks if used with the wrong postfix, because they
> are 64-bit counters and thus protected by a seqlock.
> 
> A pity that the _STATS_BH postfixes have the opposite meaning of the bottom

	s/bottom$/locking/

> _bh postfixes.
> 
> Basically, _STATS is always safe, and if we are sure we can omit the bh disable
> call because we only call the function from bh, we can use _STATS_BH calls.

Should read mails again before sending. ;)

Eric Dumazet Feb. 19, 2014, 7:46 p.m. UTC | #9
On Wed, 2014-02-19 at 20:30 +0100, Hannes Frederic Sowa wrote:

> NET_STATS only uses 32-bit counters and thus need not be protected with a
> seqlock on 32-bit platforms. As such, it does not matter, but e.g. the IP
> counters are prone to deadlocks if used with the wrong postfix, because they
> are 64-bit counters and thus protected by a seqlock.
> 
> A pity that the _STATS_BH postfixes have the opposite meaning of the bottom
> _bh postfixes.
> 
> Basically, _STATS is always safe, and if we are sure we can omit the bh disable
> call because we only call the function from bh, we can use _STATS_BH calls.

Using __this_cpu_inc() is not safe in preemptable contexts.

Sure, x86 doesn't care, but other arches might.

Fortunately LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES can be incremented
only for a retransmit, and they should always happen from BH.

Better avoid the confusion for this ultra rare case, I'll send a patch.
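
Why an increment that is not preempt-safe can lose an update may be easier to
see in a small user-space model. This is only a sketch under assumptions: it
treats the increment as a separate load and store (as an architecture without
a single-instruction increment might do), and struct rmw, rmw_load(),
rmw_store() and two_increments() are hypothetical names, not kernel APIs.

```c
/* Hypothetical model of one in-flight read-modify-write on a counter. */
struct rmw {
	unsigned long loaded; /* value read before the add */
};

static void rmw_load(struct rmw *op, const unsigned long *ctr)
{
	op->loaded = *ctr;
}

static void rmw_store(const struct rmw *op, unsigned long *ctr)
{
	*ctr = op->loaded + 1;
}

/*
 * Perform two increments on the same counter slot and return the result.
 * With interleave != 0, the second increment runs between the load and
 * the store of the first one, modelling preemption in the middle.
 */
static unsigned long two_increments(int interleave)
{
	unsigned long counter = 0;
	struct rmw a, b;

	rmw_load(&a, &counter);
	if (interleave) {
		/* "Task A" is preempted here; "task B" increments fully. */
		rmw_load(&b, &counter);
		rmw_store(&b, &counter);
	}
	rmw_store(&a, &counter); /* overwrites B's update if interleaved */

	if (!interleave) {
		rmw_load(&b, &counter);
		rmw_store(&b, &counter);
	}
	return counter;
}
```

In this model two_increments(0) yields 2, while two_increments(1) yields 1:
one update is lost. That is the kind of outcome the preempt-safe variant is
meant to rule out when the caller may run in process context.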


David Miller Feb. 25, 2014, 12:14 a.m. UTC | #10
From: Florian Westphal <fw@strlen.de>
Date: Mon, 17 Feb 2014 22:57:48 +0100

> Three counters are added:
> - one to track when we went from non-zero to zero window
> - one to track the reverse
> - one counter incremented when we want to announce zero window.
> 
> The latter is added because it can show cases where we want to close the
> window but can't because we would shrink window.
> 
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  Eric, is this what you had in mind?
> 
>  I re-ran my 'slow-sender-with-reader-that-does-not-drain-socket'
>  scenario and, as expected, only TCPWANTZEROWINDOW increases.

What is happening with this patch?  The discussion just died out.
Florian Westphal Feb. 25, 2014, 8:23 a.m. UTC | #11
David Miller <davem@davemloft.net> wrote:
> From: Florian Westphal <fw@strlen.de>
> Date: Mon, 17 Feb 2014 22:57:48 +0100
> 
> > Three counters are added:
> > - one to track when we went from non-zero to zero window
> > - one to track the reverse
> > - one counter incremented when we want to announce zero window.
> > 
> > The latter is added because it can show cases where we want to close the
> > window but can't because we would shrink window.
> > 
> > Suggested-by: Eric Dumazet <edumazet@google.com>
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> > ---
> >  Eric, is this what you had in mind?
> > 
> >  I re-ran my 'slow-sender-with-reader-that-does-not-drain-socket'
> >  scenario and, as expected, only TCPWANTZEROWINDOW increases.
> 
> What is happening with this patch?  The discussion just died out.

I guess Eric is busy with TCP usec timer resolution changes.
You can mark the patch as deferred if you want; I can re-send it for
3.16 if needed.
Eric Dumazet Feb. 25, 2014, 12:24 p.m. UTC | #12
On Tue, 2014-02-25 at 09:23 +0100, Florian Westphal wrote:
> David Miller <davem@davemloft.net> wrote:
 
> > What is happening with this patch?  The discussion just died out.
> 
> I guess Eric is busy with TCP usec timer resolution changes.
> You can mark the patch as deferred if you want; I can re-send it for
> 3.16 if needed.

Well, I was waiting for you to resend it using Hannes' feedback ;)

That is using NET_INC_STATS() instead of NET_INC_STATS_BH()

I am sending the cleanup for LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES
immediately.



Florian Westphal Feb. 25, 2014, 12:34 p.m. UTC | #13
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-02-25 at 09:23 +0100, Florian Westphal wrote:
> > David Miller <davem@davemloft.net> wrote:
>  
> > > What is happening with this patch?  The discussion just died out.
> > 
> > I guess Eric is busy with TCP usec timer resolution changes.
> > You can mark the patch as deferred if you want; I can re-send it for
> > 3.16 if needed.
> 
> Well, I was waiting for you to resend it using Hannes' feedback ;)

Heh.  I was waiting for you to tell me whether you reject the general
intention of the patch.  Classic deadlock :)

> I am sending the cleanup for LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES
> immediately.

Thanks Eric.  I'll send v2 of the patch in a few minutes.
Patch

diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index bbaba22..6404eed 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -259,6 +259,9 @@  enum
 	LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES, /* TCPSpuriousRtxHostQueues */
 	LINUX_MIB_BUSYPOLLRXPACKETS,		/* BusyPollRxPackets */
 	LINUX_MIB_TCPAUTOCORKING,		/* TCPAutoCorking */
+	LINUX_MIB_TCPFROMZEROWINDOWADV,		/* TCPFromZeroWindowAdv */
+	LINUX_MIB_TCPTOZEROWINDOWADV,		/* TCPToZeroWindowAdv */
+	LINUX_MIB_TCPWANTZEROWINDOW,		/* TCPWantZeroWindow */
 	__LINUX_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index a6c8a80..542d414 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -280,6 +280,9 @@  static const struct snmp_mib snmp4_net_list[] = {
 	SNMP_MIB_ITEM("TCPSpuriousRtxHostQueues", LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES),
 	SNMP_MIB_ITEM("BusyPollRxPackets", LINUX_MIB_BUSYPOLLRXPACKETS),
 	SNMP_MIB_ITEM("TCPAutoCorking", LINUX_MIB_TCPAUTOCORKING),
+	SNMP_MIB_ITEM("TCPFromZeroWindowAdv", LINUX_MIB_TCPFROMZEROWINDOWADV),
+	SNMP_MIB_ITEM("TCPToZeroWindowAdv", LINUX_MIB_TCPTOZEROWINDOWADV),
+	SNMP_MIB_ITEM("TCPWantZeroWindow", LINUX_MIB_TCPWANTZEROWINDOW),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 48414fc..e8d6f14 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -269,6 +269,7 @@  EXPORT_SYMBOL(tcp_select_initial_window);
 static u16 tcp_select_window(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	u32 old_win = tp->rcv_wnd;
 	u32 cur_win = tcp_receive_window(tp);
 	u32 new_win = __tcp_select_window(sk);
 
@@ -281,6 +282,9 @@  static u16 tcp_select_window(struct sock *sk)
 		 *
 		 * Relax Will Robinson.
 		 */
+		if (new_win == 0)
+			NET_INC_STATS_BH(sock_net(sk),
+					 LINUX_MIB_TCPWANTZEROWINDOW);
 		new_win = ALIGN(cur_win, 1 << tp->rx_opt.rcv_wscale);
 	}
 	tp->rcv_wnd = new_win;
@@ -298,8 +302,15 @@  static u16 tcp_select_window(struct sock *sk)
 	new_win >>= tp->rx_opt.rcv_wscale;
 
 	/* If we advertise zero window, disable fast path. */
-	if (new_win == 0)
+	if (new_win == 0) {
 		tp->pred_flags = 0;
+		if (old_win)
+			NET_INC_STATS_BH(sock_net(sk),
+					 LINUX_MIB_TCPTOZEROWINDOWADV);
+	} else if (old_win == 0) {
+		NET_INC_STATS_BH(sock_net(sk),
+				 LINUX_MIB_TCPFROMZEROWINDOWADV);
+	}
 
 	return new_win;
 }