Fix race in tcp_poll

Message ID: 20100920185719.GA13355@gmail.com
State: Accepted, archived
Delegated to: David Miller
Commit Message

Tom Marshall Sept. 20, 2010, 7:18 p.m. UTC
If a RST comes in immediately after checking sk->sk_err, tcp_poll will
return POLLIN but not POLLOUT.  Fix this by checking sk->sk_err at the end
of tcp_poll.  Additionally, ensure the correct order of operations on SMP
machines with memory barriers.

Signed-off-by: Tom Marshall <tdm.code@gmail.com>
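
The ordering problem described above can be sketched in a reduced, single-threaded form. This is only an illustration, not kernel code: struct demo_sock, the DEMO_POLL* bits, demo_reset() and the "mid" hook are all hypothetical names invented here, and the hook merely simulates a RST landing partway through a poll call.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the real poll bits; values are arbitrary. */
#define DEMO_POLLIN  0x1u
#define DEMO_POLLHUP 0x2u
#define DEMO_POLLERR 0x4u

/* Hypothetical, heavily reduced stand-in for struct sock. */
struct demo_sock {
	int err;      /* plays the role of sk->sk_err */
	int shutdown; /* plays the role of state changed by a reset */
};

/* Hook type used to simulate a RST arriving in the middle of a poll call. */
typedef void (*demo_hook)(struct demo_sock *sk);

/* Simulated reset: record the error, then mark the socket as shut down. */
void demo_reset(struct demo_sock *sk)
{
	sk->err = ECONNRESET;
	sk->shutdown = 1;
}

/* Old ordering: the error is sampled first, before the reset lands. */
unsigned int demo_poll_err_first(struct demo_sock *sk, demo_hook mid)
{
	unsigned int mask = 0;

	if (sk->err)
		mask = DEMO_POLLERR;
	if (mid)
		mid(sk); /* the simulated RST arrives here */
	if (sk->shutdown)
		mask |= DEMO_POLLIN | DEMO_POLLHUP;
	return mask;
}

/* New ordering: the error is sampled last, after all other state. */
unsigned int demo_poll_err_last(struct demo_sock *sk, demo_hook mid)
{
	unsigned int mask = 0;

	if (mid)
		mid(sk); /* the simulated RST arrives at the same point */
	if (sk->shutdown)
		mask |= DEMO_POLLIN | DEMO_POLLHUP;
	if (sk->err)
		mask |= DEMO_POLLERR;
	return mask;
}
```

Called on a zeroed demo_sock with demo_reset as the hook, the first variant reports the shutdown bits without DEMO_POLLERR, while the second also reports DEMO_POLLERR. The sketch leaves out concurrency entirely; the SMP ordering is what the memory barriers in the patch address.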


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet Sept. 20, 2010, 9:54 p.m. UTC | #1
On Monday, 20 September 2010 at 12:18 -0700, Tom Marshall wrote:
> If a RST comes in immediately after checking sk->sk_err, tcp_poll will
> return POLLIN but not POLLOUT.  Fix this by checking sk->sk_err at the end
> of tcp_poll.  Additionally, ensure the correct order of operations on SMP
> machines with memory barriers.
> 
> Signed-off-by: Tom Marshall <tdm.code@gmail.com>
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 3fb1428..95d75d4 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -386,8 +386,6 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
>  	 */
>  
>  	mask = 0;
> -	if (sk->sk_err)
> -		mask = POLLERR;
>  
>  	/*
>  	 * POLLHUP is certainly not done right. But poll() doesn't
> @@ -457,6 +455,11 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
>  		if (tp->urg_data & TCP_URG_VALID)
>  			mask |= POLLPRI;
>  	}
> +	/* This barrier is coupled with smp_wmb() in tcp_reset() */
> +	smp_rmb();
> +	if (sk->sk_err)
> +		mask |= POLLERR;
> +
>  	return mask;
>  }
>  EXPORT_SYMBOL(tcp_poll);
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index e663b78..149e79a 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4048,6 +4048,8 @@ static void tcp_reset(struct sock *sk)
>  	default:
>  		sk->sk_err = ECONNRESET;
>  	}
> +	/* This barrier is coupled with smp_rmb() in tcp_poll() */
> +	smp_wmb();
>  
>  	if (!sock_flag(sk, SOCK_DEAD))
>  		sk->sk_error_report(sk);
> 

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Maybe this is the last time we can avoid taking socket lock in
tcp_poll() ;)



David Miller Sept. 20, 2010, 10:42 p.m. UTC | #2
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 20 Sep 2010 23:54:06 +0200

> On Monday, 20 September 2010 at 12:18 -0700, Tom Marshall wrote:
>> If a RST comes in immediately after checking sk->sk_err, tcp_poll will
>> return POLLIN but not POLLOUT.  Fix this by checking sk->sk_err at the end
>> of tcp_poll.  Additionally, ensure the correct order of operations on SMP
>> machines with memory barriers.
>> 
>> Signed-off-by: Tom Marshall <tdm.code@gmail.com>
...
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Maybe this is the last time we can avoid taking socket lock in
> tcp_poll() ;)

:-)  Applied, thanks everyone.

Patch

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3fb1428..95d75d4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -386,8 +386,6 @@  unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	 */
 
 	mask = 0;
-	if (sk->sk_err)
-		mask = POLLERR;
 
 	/*
 	 * POLLHUP is certainly not done right. But poll() doesn't
@@ -457,6 +455,11 @@  unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 		if (tp->urg_data & TCP_URG_VALID)
 			mask |= POLLPRI;
 	}
+	/* This barrier is coupled with smp_wmb() in tcp_reset() */
+	smp_rmb();
+	if (sk->sk_err)
+		mask |= POLLERR;
+
 	return mask;
 }
 EXPORT_SYMBOL(tcp_poll);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e663b78..149e79a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4048,6 +4048,8 @@  static void tcp_reset(struct sock *sk)
 	default:
 		sk->sk_err = ECONNRESET;
 	}
+	/* This barrier is coupled with smp_rmb() in tcp_poll() */
+	smp_wmb();
 
 	if (!sock_flag(sk, SOCK_DEAD))
 		sk->sk_error_report(sk);
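
The barrier pairing in the patch can be approximated in userspace with C11 fences. This is a rough analogue under stated assumptions, not the kernel primitives: a release fence stands in for smp_wmb(), an acquire fence stands in for smp_rmb(), and struct mb_sock, the MB_POLL* bits, mb_reset(), mb_poll() and mb_run_once() are hypothetical names made up for the sketch. The writer publishes the error before the shutdown flag; the reader checks the flag, then the error.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical poll bits; values are arbitrary. */
#define MB_POLLHUP 0x1u
#define MB_POLLERR 0x2u

/* Hypothetical, reduced stand-in for the socket state. */
struct mb_sock {
	atomic_int err;      /* plays the role of sk->sk_err */
	atomic_int shutdown; /* state written after the error */
};

/* Writer side, loosely modelled on the order of stores in tcp_reset(). */
void *mb_reset(void *arg)
{
	struct mb_sock *sk = arg;

	atomic_store_explicit(&sk->err, 104, memory_order_relaxed);
	atomic_thread_fence(memory_order_release); /* stands in for smp_wmb() */
	atomic_store_explicit(&sk->shutdown, 1, memory_order_relaxed);
	return NULL;
}

/* Reader side: other state first, then the fence, then the error. */
unsigned int mb_poll(struct mb_sock *sk)
{
	unsigned int mask = 0;

	if (atomic_load_explicit(&sk->shutdown, memory_order_relaxed))
		mask |= MB_POLLHUP;
	atomic_thread_fence(memory_order_acquire); /* stands in for smp_rmb() */
	if (atomic_load_explicit(&sk->err, memory_order_relaxed))
		mask |= MB_POLLERR;
	return mask;
}

/* Run one writer against a spinning reader; return the first non-zero mask. */
unsigned int mb_run_once(void)
{
	struct mb_sock sk;
	pthread_t writer;
	unsigned int mask = 0;

	atomic_init(&sk.err, 0);
	atomic_init(&sk.shutdown, 0);
	if (pthread_create(&writer, NULL, mb_reset, &sk) != 0)
		return 0; /* thread creation failed; nothing to observe */
	while (mask == 0)
		mask = mb_poll(&sk);
	pthread_join(writer, NULL);
	return mask;
}
```

With the two fences paired this way, a reader that observes the shutdown flag is also guaranteed to observe the error, so a mask containing MB_POLLHUP always contains MB_POLLERR. A mask with only MB_POLLERR remains possible when the reader runs between the two stores. Build with pthread support (for example -pthread) and call mb_run_once() from a main() to try it.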