From patchwork Wed Aug  7 12:46:17 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrea Righi <andrea.righi@canonical.com>
X-Patchwork-Id: 1143481
From: Andrea Righi <andrea.righi@canonical.com>
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 3/3] UBUNTU: SAUCE: bcache: fix deadlock in bcache_allocator
Date: Wed,  7 Aug 2019 14:46:17 +0200
Message-Id: <20190807124617.30224-4-andrea.righi@canonical.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190807124617.30224-1-andrea.righi@canonical.com>
References: <20190807124617.30224-1-andrea.righi@canonical.com>

bch_allocator_thread(), which runs as the bcache_allocator kernel thread,
can call the following:

 bch_allocator_thread()
  -> bch_prio_write()
     -> bch_bucket_alloc()
        -> wait on &ca->set->bucket_wait

However, the wake-up event on bucket_wait is supposed to come from
bch_allocator_thread() itself, so the thread sleeps waiting for an event
that only it can trigger, resulting in a deadlock:

[ 1158.490744] INFO: task bcache_allocato:15861 blocked for more than 10 seconds.
[ 1158.495929]       Not tainted 5.3.0-050300rc3-generic #201908042232
[ 1158.500653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1158.504413] bcache_allocato D    0 15861      2 0x80004000
[ 1158.504419] Call Trace:
[ 1158.504429]  __schedule+0x2a8/0x670
[ 1158.504432]  schedule+0x2d/0x90
[ 1158.504448]  bch_bucket_alloc+0xe5/0x370 [bcache]
[ 1158.504453]  ? wait_woken+0x80/0x80
[ 1158.504466]  bch_prio_write+0x1dc/0x390 [bcache]
[ 1158.504476]  bch_allocator_thread+0x233/0x490 [bcache]
[ 1158.504491]  kthread+0x121/0x140
[ 1158.504503]  ? invalidate_buckets+0x890/0x890 [bcache]
[ 1158.504506]  ? kthread_park+0xb0/0xb0
[ 1158.504510]  ret_from_fork+0x35/0x40
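
To make the self-wait concrete, here is a toy user-space model; it is
not bcache code, and every name in it (struct alloc_model,
model_bucket_alloc(), enum model_ctx, the MODEL_* codes) is
hypothetical. It assumes, as described above, that only the allocator
thread refills the free list and wakes bucket_wait, and it returns a
sentinel instead of actually sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_FIFO_SIZE 8

/* Hypothetical result codes for this model only (not bcache's API). */
#define MODEL_NO_BUCKET     (-1L) /* wait == false and the free list is empty */
#define MODEL_WOULD_SLEEP   (-2L) /* another context sleeps until the allocator refills */
#define MODEL_SELF_DEADLOCK (-3L) /* the sleeper is also the only waker */

/* Which execution context issues the allocation (hypothetical). */
enum model_ctx { CTX_ALLOCATOR, CTX_OTHER };

/* Toy stand-in for a cache's free bucket list. */
struct alloc_model {
	long free[MODEL_FIFO_SIZE];
	size_t nr_free;
};

/*
 * Toy analogue of bch_bucket_alloc(ca, reserve, wait). In this model only
 * the allocator thread refills free[] and wakes bucket_wait, so a blocking
 * call made from the allocator thread itself can never be woken up.
 */
static long model_bucket_alloc(struct alloc_model *m, bool wait,
			       enum model_ctx caller)
{
	assert(m->nr_free <= MODEL_FIFO_SIZE);

	if (m->nr_free > 0)
		return m->free[--m->nr_free];	/* fast path: pop a free bucket */
	if (!wait)
		return MODEL_NO_BUCKET;		/* non-blocking: fail immediately */
	if (caller == CTX_ALLOCATOR)
		return MODEL_SELF_DEADLOCK;	/* kernel: stuck in schedule() forever */
	return MODEL_WOULD_SLEEP;		/* kernel: woken once the allocator refills */
}
```

The unpatched bch_prio_write() passes wait == true from the allocator
thread, which in this model is the MODEL_SELF_DEADLOCK case once the free
list is empty; passing wait == false turns it into MODEL_NO_BUCKET, an
error the caller can handle.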

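Fix by making the call to bch_prio_write() from bch_allocator_thread()
non-blocking, so that the allocator thread never waits on itself. In
non-blocking mode bch_prio_write() first checks that RESERVE_PRIO and
RESERVE_NONE together hold enough free buckets and returns -ENOMEM
otherwise. The other caller, run_cache_set(), keeps the blocking
behavior.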

Moreover, when the non-blocking bch_prio_write() fails to allocate
buckets, set ca->invalidate_needs_gc and wake up the garbage collector
thread.
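
The pre-check and the garbage collector wake-up can be sketched as a
single-threaded user-space model. It is a sketch under stated
assumptions, not the kernel code: struct prio_model, model_prio_write()
and model_allocator_step() are hypothetical, plain counters stand in for
the fifos, and a counter stands in for wake_up_gc().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the fields the patch touches. */
struct prio_model {
	size_t free_prio;		/* ~ fifo_used(&ca->free[RESERVE_PRIO]) */
	size_t free_none;		/* ~ fifo_used(&ca->free[RESERVE_NONE]) */
	size_t prio_buckets;		/* ~ prio_buckets(ca) */
	int invalidate_needs_gc;	/* ~ ca->invalidate_needs_gc */
	int gc_wakeups;			/* counts calls to the wake_up_gc() stand-in */
};

/* Toy analogue of the patched bch_prio_write(ca, wait). */
static int model_prio_write(struct prio_model *m, bool wait)
{
	if (!wait) {
		/* Fail early instead of allocating some buckets and rolling back. */
		size_t avail = m->free_prio + m->free_none;
		if (m->prio_buckets > avail)
			return -ENOMEM;
	}

	/*
	 * The model cannot sleep, so it requires enough buckets here. With
	 * wait == false the pre-check guarantees it; with wait == true the
	 * kernel would instead block inside bch_bucket_alloc().
	 */
	assert(m->free_prio + m->free_none >= m->prio_buckets);

	for (size_t i = 0; i < m->prio_buckets; i++) {
		/* Simplification: take from RESERVE_NONE first, then RESERVE_PRIO. */
		if (m->free_none > 0)
			m->free_none--;
		else
			m->free_prio--;
	}
	return 0;
}

/* Toy analogue of the patched call site in bch_allocator_thread(). */
static void model_allocator_step(struct prio_model *m)
{
	if (model_prio_write(m, false) < 0) {
		m->invalidate_needs_gc = 1;
		m->gc_wakeups++;	/* stands in for wake_up_gc(ca->set) */
	}
}
```

The model consumes nothing when the pre-check fails, which is the point
of failing early. It is single-threaded; in the kernel, bch_prio_write()
releases bucket_lock inside its allocation loop, which this sketch does
not attempt to model.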

BugLink: https://bugs.launchpad.net/bugs/1784665
BugLink: https://bugs.launchpad.net/bugs/1796292
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
---
 drivers/md/bcache/alloc.c  |  5 ++++-
 drivers/md/bcache/bcache.h |  2 +-
 drivers/md/bcache/super.c  | 27 +++++++++++++++++++++------
 3 files changed, 26 insertions(+), 8 deletions(-)
diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
index 5002838ea476..0a2cdaac682e 100644
--- a/drivers/md/bcache/alloc.c
+++ b/drivers/md/bcache/alloc.c
@@ -376,7 +376,10 @@ static int bch_allocator_thread(void *arg)
 			if (!fifo_full(&ca->free_inc))
 				goto retry_invalidate;
 
-			bch_prio_write(ca);
+			if (bch_prio_write(ca, false) < 0) {
+				ca->invalidate_needs_gc = 1;
+				wake_up_gc(ca->set);
+			}
 		}
 	}
 out:
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index e5d2158f4f32..9f64ae22915b 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -979,7 +979,7 @@ bool bch_cached_dev_error(struct cached_dev *dc);
 __printf(2, 3)
 bool bch_cache_set_error(struct cache_set *c, const char *fmt, ...);
 
-void bch_prio_write(struct cache *ca);
+int bch_prio_write(struct cache *ca, bool wait);
 void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
 
 extern struct workqueue_struct *bcache_wq;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 5f7b3ce09c6f..9176f5962aa6 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -525,12 +525,29 @@ static void prio_io(struct cache *ca, uint64_t bucket, int op,
 	closure_sync(cl);
 }
 
-void bch_prio_write(struct cache *ca)
+int bch_prio_write(struct cache *ca, bool wait)
 {
 	int i;
 	struct bucket *b;
 	struct closure cl;
 
+	pr_debug("free_prio=%zu, free_none=%zu, free_inc=%zu",
+		 fifo_used(&ca->free[RESERVE_PRIO]),
+		 fifo_used(&ca->free[RESERVE_NONE]),
+		 fifo_used(&ca->free_inc));
+
+	/*
+	 * Pre-check if there are enough free buckets. In the non-blocking
+	 * scenario it's better to fail early rather than starting to allocate
+	 * buckets and do a cleanup later in case of failure.
+	 */
+	if (!wait) {
+		size_t avail = fifo_used(&ca->free[RESERVE_PRIO]) +
+			       fifo_used(&ca->free[RESERVE_NONE]);
+		if (prio_buckets(ca) > avail)
+			return -ENOMEM;
+	}
+
 	closure_init_stack(&cl);
 
 	lockdep_assert_held(&ca->set->bucket_lock);
@@ -540,9 +557,6 @@ void bch_prio_write(struct cache *ca)
 	atomic_long_add(ca->sb.bucket_size * prio_buckets(ca),
 			&ca->meta_sectors_written);
 
-	//pr_debug("free %zu, free_inc %zu, unused %zu", fifo_used(&ca->free),
-	//	 fifo_used(&ca->free_inc), fifo_used(&ca->unused));
-
 	for (i = prio_buckets(ca) - 1; i >= 0; --i) {
 		long bucket;
 		struct prio_set *p = ca->disk_buckets;
@@ -560,7 +574,7 @@ void bch_prio_write(struct cache *ca)
 		p->magic	= pset_magic(&ca->sb);
 		p->csum		= bch_crc64(&p->magic, bucket_bytes(ca) - 8);
 
-		bucket = bch_bucket_alloc(ca, RESERVE_PRIO, true);
+		bucket = bch_bucket_alloc(ca, RESERVE_PRIO, wait);
 		BUG_ON(bucket == -1);
 
 		mutex_unlock(&ca->set->bucket_lock);
@@ -589,6 +603,7 @@ void bch_prio_write(struct cache *ca)
 
 		ca->prio_last_buckets[i] = ca->prio_buckets[i];
 	}
+	return 0;
 }
 
 static void prio_read(struct cache *ca, uint64_t bucket)
@@ -1880,7 +1895,7 @@ static void run_cache_set(struct cache_set *c)
 
 		mutex_lock(&c->bucket_lock);
 		for_each_cache(ca, c, i)
-			bch_prio_write(ca);
+			bch_prio_write(ca, true);
 		mutex_unlock(&c->bucket_lock);
 
 		err = "cannot allocate new UUID bucket";