From patchwork Tue Aug 30 09:18:10 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miklos Szeredi X-Patchwork-Id: 664062 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3sNjcW5lcwz9s9N for ; Tue, 30 Aug 2016 19:18:35 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756771AbcH3JSP (ORCPT ); Tue, 30 Aug 2016 05:18:15 -0400 Received: from mail-lf0-f47.google.com ([209.85.215.47]:34205 "EHLO mail-lf0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751629AbcH3JSN (ORCPT ); Tue, 30 Aug 2016 05:18:13 -0400 Received: by mail-lf0-f47.google.com with SMTP id l89so9043005lfi.1 for ; Tue, 30 Aug 2016 02:18:12 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=vrRXgcXttIAzrlgPnNSo9v8SsCrSMpemeaBFz9aMp8w=; b=YmLFE4tu7v/s850QlRIbR2sz5lKvEuFW1GLXk9y4Ln1fHLSryy92ocrDj6KskIKaME pr+UPWxdBimsoDGAFElhMqLseX29QQS/LHim0dv575Xh9584xRAzqj2i7zx4dq+YZ4lm VzTXy02NbQ+3y5Mewth46z5ZlDRhnnhIEhy41ILCtJF/n0VWmjwWuSl5Y4fKVZ4tyUWw h0UHZfnxUDLwKkz3+eIeJ102OkzTjqMBX8vapH1DKbzeL7IlCmXnMHAQpB7L22CHBAq5 rGPBYHotcbjyVcvZx1IFmJom+Bn7q9fSdRCsjNrRuRqD54GqDHMaOVUYqXFdwGo0K13u 34nQ== X-Gm-Message-State: AE9vXwOGMbgkbSMjS0WQ+ulujd/VW5ORRqcQH9ov8d+QVdMS6644KFCTa4TJ5986CBV2wDPCoI1UVd7vHFP+SInC X-Received: by 10.25.22.97 with SMTP id m94mr742674lfi.70.1472548691779; Tue, 30 Aug 2016 02:18:11 -0700 (PDT) MIME-Version: 1.0 Received: by 10.25.64.7 with HTTP; Tue, 30 Aug 2016 02:18:10 -0700 (PDT) In-Reply-To: References: <57BDAE06.1040400@kyup.com> <9dde3145-9128-ffef-1b84-e3bd429dd4e8@stressinduktion.org> From: Miklos Szeredi Date: Tue, 30 Aug 2016 11:18:10 +0200 Message-ID: Subject: Re: kernel BUG at net/unix/garbage.c:149!" To: Hannes Frederic Sowa Cc: Nikolay Borisov , "Linux-Kernel@Vger. Kernel. Org" , netdev@vger.kernel.org Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Tue, Aug 30, 2016 at 12:37 AM, Miklos Szeredi wrote: > On Sat, Aug 27, 2016 at 11:55 AM, Miklos Szeredi wrote: > crash> list -H gc_inflight_list unix_sock.link -s unix_sock.inflight | > grep counter | cut -d= -f2 | awk '{s+=$1} END {print s}' > 130 > crash> p unix_tot_inflight > unix_tot_inflight = $2 = 135 > > We've lost track of a total of five inflight sockets, so it's not a > one-off thing. Really weird... Now off to sleep, maybe I'll dream of > the solution. Okay, found one bug: gc assumes that in-flight sockets that don't have an external ref can't gain one while unix_gc_lock is held. That is true because unix_notinflight() will be called before detaching fds, which takes unix_gc_lock. Only MSG_PEEK was somehow overlooked. That one also clones the fds, also keeping them in the skb. But through MSG_PEEK an external reference can definitely be gained without ever touching unix_gc_lock. Not sure whether the reported bug can be explained by this. Can you confirm the MSG_PEEK was used in the setup? Does someone want to write a stress test for SCM_RIGHTS + MSG_PEEK? Anyway, attaching a fix that works by acquiring unix_gc_lock in case of MSG_PEEK also. It is trivially correct, but I haven't tested it. Thanks, Miklos From: Miklos Szeredi Subject: af_unix: fix garbage collect vs. MSG_PEEK Gc assumes that in-flight sockets that don't have an external ref can't gain one while unix_gc_lock is held. That is true because unix_notinflight() will be called before detaching fds, which takes unix_gc_lock. Only MSG_PEEK was somehow overlooked. That one also clones the fds, also keeping them in the skb. But through MSG_PEEK an external reference can definitely be gained without ever touching unix_gc_lock. Signed-off-by: Miklos Szeredi Cc: --- include/net/af_unix.h | 1 + net/unix/af_unix.c | 15 +++++++++++++-- net/unix/garbage.c | 6 ++++++ 3 files changed, 20 insertions(+), 2 deletions(-) --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -10,6 +10,7 @@ void unix_inflight(struct user_struct *u void unix_notinflight(struct user_struct *user, struct file *fp); void unix_gc(void); void wait_for_unix_gc(void); +void unix_gc_barrier(void); struct sock *unix_get_socket(struct file *filp); struct sock *unix_peer_get(struct sock *); --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1563,6 +1563,17 @@ static int unix_attach_fds(struct scm_co return max_level; } +static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb) +{ + scm->fp = scm_fp_dup(UNIXCB(skb).fp); + /* + * During garbage collection it is assumed that in-flight sockets don't + * get a new external reference. So we need to wait until current run + * finishes. + */ + unix_gc_barrier(); +} + static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool send_fds) { int err = 0; @@ -2195,7 +2206,7 @@ static int unix_dgram_recvmsg(struct soc sk_peek_offset_fwd(sk, size); if (UNIXCB(skb).fp) - scm.fp = scm_fp_dup(UNIXCB(skb).fp); + unix_peek_fds(&scm, skb); } err = (flags & MSG_TRUNC) ? skb->len - skip : size; @@ -2435,7 +2446,7 @@ static int unix_stream_read_generic(stru /* It is questionable, see note in unix_dgram_recvmsg. */ if (UNIXCB(skb).fp) - scm.fp = scm_fp_dup(UNIXCB(skb).fp); + unix_peek_fds(&scm, skb); sk_peek_offset_fwd(sk, chunk); --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -266,6 +266,12 @@ void wait_for_unix_gc(void) wait_event(unix_gc_wait, gc_in_progress == false); } +void unix_gc_barrier(void) +{ + spin_lock(&unix_gc_lock); + spin_unlock(&unix_gc_lock); +} + /* The external entry point: unix_gc() */ void unix_gc(void) {