diff mbox series

sock: Make sock->sk_tstamp thread-safe

Message ID 20181221202733.19627-1-deepa.kernel@gmail.com
State Changes Requested, archived
Delegated to: David Miller
Headers show
Series sock: Make sock->sk_tstamp thread-safe | expand

Commit Message

Deepa Dinamani Dec. 21, 2018, 8:27 p.m. UTC
Al Viro mentioned that there is probably a race condition
lurking in accesses of sk_tstamp on 32-bit machines.

sock->sk_tstamp is of type ktime_t which is always an s64.
On a 32 bit architecture, we might run into situations of
unsafe access as the access to the field becomes non atomic.

Use seqlocks for synchronization.
This allows us to avoid using spinlocks for readers as
readers do not need mutual exclusion.

Another approach to solve this is to require sk_lock for all
modifications of the timestamps. The current approach allows
for timestamps to have their own lock: sk_tstamp_lock.
This allows for the patch to not compete with already
existing critical sections, and side effects are limited
to the paths in the patch.

The addition of the new field maintains the data locality
optimizations from
commit 9115e8cd2a0c ("net: reorganize struct sock for better data
locality")

Note that all the instances of the sk_tstamp accesses
are either through the ioctl or the syscall recvmsg.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
---
 include/net/sock.h | 16 +++++++++++++---
 net/compat.c       | 30 +++++++++++++++++++++++++-----
 net/core/sock.c    | 34 +++++++++++++++++++++++++++++-----
 3 files changed, 67 insertions(+), 13 deletions(-)

Comments

Arnd Bergmann Dec. 22, 2018, 1:29 p.m. UTC | #1
On 12/21/18, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
> diff --git a/include/net/sock.h b/include/net/sock.h
> index fe58aec00d09..2cb641606533 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -2311,8 +2313,11 @@ sock_recv_timestamp(struct msghdr *msg, struct sock
> *sk, struct sk_buff *skb)
>  	    (hwtstamps->hwtstamp &&
>  	     (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
>  		__sock_recv_timestamp(msg, sk, skb);
> -	else
> +	else {
> +		write_seqlock(&sk->sk_stamp_seq);
>  		sk->sk_stamp = kt;
> +		write_sequnlock(&sk->sk_stamp_seq);
> +	}
>
>  	if (sock_flag(sk, SOCK_WIFI_STATUS) && skb->wifi_acked_valid)
>  		__sock_recv_wifi_status(msg, sk, skb);
> @@ -2332,10 +2337,15 @@ static inline void sock_recv_ts_and_drops(struct
> msghdr *msg, struct sock *sk,
>
>  	if (sk->sk_flags & FLAGS_TS_OR_DROPS || sk->sk_tsflags & TSFLAGS_ANY)
>  		__sock_recv_ts_and_drops(msg, sk, skb);
> -	else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP)))
> +	else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP))) {
> +		write_seqlock(&sk->sk_stamp_seq);
>  		sk->sk_stamp = skb->tstamp;
> -	else if (unlikely(sk->sk_stamp == SK_DEFAULT_STAMP))
> +		write_sequnlock(&sk->sk_stamp_seq);
> +	} else if (unlikely(sk->sk_stamp == SK_DEFAULT_STAMP)) {
> +		write_seqlock(&sk->sk_stamp_seq);
>  		sk->sk_stamp = 0;
> +		write_sequnlock(&sk->sk_stamp_seq);
> +	}
>  }
>

Hi Deepa,

Thanks a lot for the follow-up to our earlier discussion here!

Are we actually worried about concurrent writers here? I thought the
only problem was a race between writer and reader, which would mean
that we could solve it using only a seqcount_t which is cheaper to
update than a seqlock_t.

       Arnd
Deepa Dinamani Dec. 22, 2018, 4:34 p.m. UTC | #2
> Are we actually worried about concurrent writers here? I thought the
> only problem was a race between writer and reader, which would mean
> that we could solve it using only a seqcount_t which is cheaper to
> update than a seqlock_t.

I considered using just the seqcount_t. But, I think we do care about
concurrent writers here.
A couple of scenarios I can think of:

1. When you have 2 concurrent recvmsg() calls on a socket, and they
both try to update sk_tstamp.
2. If a socket has don't have one of the SO_TIMESTAMP/NS options set
and you have a first recvmsg and a concurrent ioctl call on the
socket.

These are corner cases and if we don't care aout these then we can use
just the sequence counters.

I have missed out tstamp update in the sunrcpc code. If everyone is ok
with this approach, I will add it in when I post an update

-Deepa
David Miller Dec. 22, 2018, 11:03 p.m. UTC | #3
From: Deepa Dinamani <deepa.kernel@gmail.com>
Date: Fri, 21 Dec 2018 12:27:33 -0800

> Al Viro mentioned that there is probably a race condition
> lurking in accesses of sk_tstamp on 32-bit machines.
> 
> sock->sk_tstamp is of type ktime_t which is always an s64.
> On a 32 bit architecture, we might run into situations of
> unsafe access as the access to the field becomes non atomic.
> 
> Use seqlocks for synchronization.
> This allows us to avoid using spinlocks for readers as
> readers do not need mutual exclusion.
> 
> Another approach to solve this is to require sk_lock for all
> modifications of the timestamps. The current approach allows
> for timestamps to have their own lock: sk_tstamp_lock.
> This allows for the patch to not compete with already
> existing critical sections, and side effects are limited
> to the paths in the patch.
> 
> The addition of the new field maintains the data locality
> optimizations from
> commit 9115e8cd2a0c ("net: reorganize struct sock for better data
> locality")
> 
> Note that all the instances of the sk_tstamp accesses
> are either through the ioctl or the syscall recvmsg.
> 
> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>

Since, regardless of whether this is the final approach we will
take, it seems that sunrpc needs to be added to this patch.

So I'm definitely waiting for a new version.

Thanks.
Eric Dumazet Dec. 23, 2018, 7:31 a.m. UTC | #4
On 12/21/2018 12:27 PM, Deepa Dinamani wrote:
> Al Viro mentioned that there is probably a race condition
> lurking in accesses of sk_tstamp on 32-bit machines.
> 
> sock->sk_tstamp is of type ktime_t which is always an s64.
> On a 32 bit architecture, we might run into situations of
> unsafe access as the access to the field becomes non atomic.
> 
> Use seqlocks for synchronization.
> This allows us to avoid using spinlocks for readers as
> readers do not need mutual exclusion.
>

Hi Deepa

Please come up with something that has zero added costs for 64bit kernels.

Most of us do not really care about 32bit kernels anymore, so we do not want to slow
down 64bits kernels for such things.

Look at include/linux/u64_stats_sync.h for initial thoughts.

Thanks.
Deepa Dinamani Dec. 23, 2018, 6:50 p.m. UTC | #5
On Sat, Dec 22, 2018 at 11:31 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 12/21/2018 12:27 PM, Deepa Dinamani wrote:
> > Al Viro mentioned that there is probably a race condition
> > lurking in accesses of sk_tstamp on 32-bit machines.
> >
> > sock->sk_tstamp is of type ktime_t which is always an s64.
> > On a 32 bit architecture, we might run into situations of
> > unsafe access as the access to the field becomes non atomic.
> >
> > Use seqlocks for synchronization.
> > This allows us to avoid using spinlocks for readers as
> > readers do not need mutual exclusion.
> >
>
> Hi Deepa
>
> Please come up with something that has zero added costs for 64bit kernels.
>
> Most of us do not really care about 32bit kernels anymore, so we do not want to slow
> down 64bits kernels for such things.
>
> Look at include/linux/u64_stats_sync.h for initial thoughts.

This is similar to what I did here. But, I can add a few ifdef's to
make this code a noop on 64 bit arches.
I will include this in my next update.

I'm assuming there is no contention on whether writers need exclusive
access and hence requiring a lock here.
Let me know otherwise.

Thanks,
Deepa
diff mbox series

Patch

diff --git a/include/net/sock.h b/include/net/sock.h
index fe58aec00d09..2cb641606533 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -298,6 +298,7 @@  struct sock_common {
   *	@sk_filter: socket filtering instructions
   *	@sk_timer: sock cleanup timer
   *	@sk_stamp: time stamp of last packet received
+  *	@sk_stamp_seq: lock for accessing sk_stamp
   *	@sk_tsflags: SO_TIMESTAMPING socket options
   *	@sk_tskey: counter to disambiguate concurrent tstamp requests
   *	@sk_zckey: counter to order MSG_ZEROCOPY notifications
@@ -474,6 +475,7 @@  struct sock {
 	const struct cred	*sk_peer_cred;
 	long			sk_rcvtimeo;
 	ktime_t			sk_stamp;
+	seqlock_t		sk_stamp_seq;
 	u16			sk_tsflags;
 	u8			sk_shutdown;
 	u32			sk_tskey;
@@ -2311,8 +2313,11 @@  sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
 	    (hwtstamps->hwtstamp &&
 	     (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
 		__sock_recv_timestamp(msg, sk, skb);
-	else
+	else {
+		write_seqlock(&sk->sk_stamp_seq);
 		sk->sk_stamp = kt;
+		write_sequnlock(&sk->sk_stamp_seq);
+	}
 
 	if (sock_flag(sk, SOCK_WIFI_STATUS) && skb->wifi_acked_valid)
 		__sock_recv_wifi_status(msg, sk, skb);
@@ -2332,10 +2337,15 @@  static inline void sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk,
 
 	if (sk->sk_flags & FLAGS_TS_OR_DROPS || sk->sk_tsflags & TSFLAGS_ANY)
 		__sock_recv_ts_and_drops(msg, sk, skb);
-	else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP)))
+	else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP))) {
+		write_seqlock(&sk->sk_stamp_seq);
 		sk->sk_stamp = skb->tstamp;
-	else if (unlikely(sk->sk_stamp == SK_DEFAULT_STAMP))
+		write_sequnlock(&sk->sk_stamp_seq);
+	} else if (unlikely(sk->sk_stamp == SK_DEFAULT_STAMP)) {
+		write_seqlock(&sk->sk_stamp_seq);
 		sk->sk_stamp = 0;
+		write_sequnlock(&sk->sk_stamp_seq);
+	}
 }
 
 void __sock_tx_timestamp(__u16 tsflags, __u8 *tx_flags);
diff --git a/net/compat.c b/net/compat.c
index 6c9fceeefac0..3022f4941687 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -459,6 +459,7 @@  int compat_sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
 {
 	struct compat_timeval __user *ctv;
 	int err;
+	unsigned int seq;
 	struct timeval tv;
 
 	if (COMPAT_USE_64BIT_TIME)
@@ -467,11 +468,20 @@  int compat_sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
 	ctv = (struct compat_timeval __user *) userstamp;
 	err = -ENOENT;
 	sock_enable_timestamp(sk, SOCK_TIMESTAMP);
-	tv = ktime_to_timeval(sk->sk_stamp);
+
+	do {
+		seq = read_seqbegin(&sk->sk_stamp_seq);
+		tv = ktime_to_timeval(sk->sk_stamp);
+	} while (read_seqretry(&sk->sk_stamp_seq, seq));
+
 	if (tv.tv_sec == -1)
 		return err;
 	if (tv.tv_sec == 0) {
-		sk->sk_stamp = ktime_get_real();
+		ktime_t kt = ktime_get_real();
+
+		write_seqlock(&sk->sk_stamp_seq);
+		sk->sk_stamp = kt;
+		write_sequnlock(&sk->sk_stamp_seq);
 		tv = ktime_to_timeval(sk->sk_stamp);
 	}
 	err = 0;
@@ -486,6 +496,7 @@  int compat_sock_get_timestampns(struct sock *sk, struct timespec __user *usersta
 {
 	struct compat_timespec __user *ctv;
 	int err;
+	unsigned int seq;
 	struct timespec ts;
 
 	if (COMPAT_USE_64BIT_TIME)
@@ -494,12 +505,21 @@  int compat_sock_get_timestampns(struct sock *sk, struct timespec __user *usersta
 	ctv = (struct compat_timespec __user *) userstamp;
 	err = -ENOENT;
 	sock_enable_timestamp(sk, SOCK_TIMESTAMP);
-	ts = ktime_to_timespec(sk->sk_stamp);
+
+	do {
+		seq = read_seqbegin(&sk->sk_stamp_seq);
+		ts = ktime_to_timespec(sk->sk_stamp);
+	} while (read_seqretry(&sk->sk_stamp_seq, seq));
+
 	if (ts.tv_sec == -1)
 		return err;
 	if (ts.tv_sec == 0) {
-		sk->sk_stamp = ktime_get_real();
-		ts = ktime_to_timespec(sk->sk_stamp);
+		ktime_t kt = ktime_get_real();
+
+		write_seqlock(&sk->sk_stamp_seq);
+		sk->sk_stamp = kt;
+		write_sequnlock(&sk->sk_stamp_seq);
+		ts = ktime_to_timespec(kt);
 	}
 	err = 0;
 	if (put_user(ts.tv_sec, &ctv->tv_sec) ||
diff --git a/net/core/sock.c b/net/core/sock.c
index f0e0e12b2e0d..a9a99af24a95 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2783,6 +2783,7 @@  void sock_init_data(struct socket *sock, struct sock *sk)
 	sk->sk_sndtimeo		=	MAX_SCHEDULE_TIMEOUT;
 
 	sk->sk_stamp = SK_DEFAULT_STAMP;
+	seqlock_init(&sk->sk_stamp_seq);
 	atomic_set(&sk->sk_zckey, 0);
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
@@ -2880,15 +2881,27 @@  EXPORT_SYMBOL(lock_sock_fast);
 int sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
 {
 	struct timeval tv;
+	unsigned int seq;
 
 	sock_enable_timestamp(sk, SOCK_TIMESTAMP);
-	tv = ktime_to_timeval(sk->sk_stamp);
+
+	do {
+		seq = read_seqbegin(&sk->sk_stamp_seq);
+		tv = ktime_to_timeval(sk->sk_stamp);
+	} while (read_seqretry(&sk->sk_stamp_seq, seq));
+
 	if (tv.tv_sec == -1)
 		return -ENOENT;
 	if (tv.tv_sec == 0) {
-		sk->sk_stamp = ktime_get_real();
-		tv = ktime_to_timeval(sk->sk_stamp);
+		ktime_t kt = ktime_get_real();
+
+		write_seqlock(&sk->sk_stamp_seq);
+		sk->sk_stamp = kt;
+		write_sequnlock(&sk->sk_stamp_seq);
+
+		tv = ktime_to_timeval(kt);
 	}
+
 	return copy_to_user(userstamp, &tv, sizeof(tv)) ? -EFAULT : 0;
 }
 EXPORT_SYMBOL(sock_get_timestamp);
@@ -2896,13 +2909,24 @@  EXPORT_SYMBOL(sock_get_timestamp);
 int sock_get_timestampns(struct sock *sk, struct timespec __user *userstamp)
 {
 	struct timespec ts;
+	unsigned int seq;
 
 	sock_enable_timestamp(sk, SOCK_TIMESTAMP);
-	ts = ktime_to_timespec(sk->sk_stamp);
+
+	do {
+		seq = read_seqbegin(&sk->sk_stamp_seq);
+		ts = ktime_to_timespec(sk->sk_stamp);
+	} while (read_seqretry(&sk->sk_stamp_seq, seq));
+
 	if (ts.tv_sec == -1)
 		return -ENOENT;
 	if (ts.tv_sec == 0) {
-		sk->sk_stamp = ktime_get_real();
+		ktime_t kt = ktime_get_real();
+
+		write_seqlock(&sk->sk_stamp_seq);
+		sk->sk_stamp = kt;
+		write_sequnlock(&sk->sk_stamp_seq);
+
 		ts = ktime_to_timespec(sk->sk_stamp);
 	}
 	return copy_to_user(userstamp, &ts, sizeof(ts)) ? -EFAULT : 0;