From patchwork Tue Nov 20 17:19:40 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [3.5.y.z extended stable] Patch "rbd: reset BACKOFF if unable to
 re-queue" has been added to staging queue
From: Herton Ronaldo Krzesinski
X-Patchwork-Id: 200481
Message-Id: <1353431980-11274-1-git-send-email-herton.krzesinski@canonical.com>
To: Alex Elder
Cc: kernel-team@lists.ubuntu.com, Sage Weil
Date: Tue, 20 Nov 2012 15:19:40 -0200

This is a note to let you know that I have just added a patch titled

    rbd: reset BACKOFF if unable to re-queue

to the linux-3.5.y-queue branch of the 3.5.y.z extended stable tree
which can be found at:

 http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.5.y-queue

If you, or anyone else, feels it should not be added to this tree, please
reply to this email.

For more information about the 3.5.y.z tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Herton

------

From 2f2bf413acd52e5a9bade876b9c7c726bf06b61f Mon Sep 17 00:00:00 2001
From: Alex Elder
Date: Mon, 8 Oct 2012 20:37:30 -0700
Subject: [PATCH 73/78] rbd: reset BACKOFF if unable to re-queue

commit 588377d6199034c36d335e7df5818b731fea072c upstream.

If ceph_fault() is unable to queue work after a delay, it sets the
BACKOFF connection flag so con_work() will attempt to do so.

In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't
result in newly-queued work, it simply ignores this condition and
proceeds as if no backoff delay were desired.  There are two
problems with this--one of which is a bug.

The first problem is simply that the intended behavior is to back
off, and if we aren't able to queue the work item to run after a
delay we're not doing that.

The only reason queue_delayed_work() won't queue work is if the
provided work item is already queued.  In the messenger, this
means that con_work() is already scheduled to be run again.  So
if we simply set the BACKOFF flag again when this occurs, we know
the next con_work() call will again attempt to hold off activity
on the connection until after the delay.

The second problem--the bug--is a leak of a reference count.  If
queue_delayed_work() returns 0 in con_work(), con->ops->put() drops
the connection reference held on entry to con_work().  However,
processing is (was) allowed to continue, and at the end of the
function a second con->ops->put() is called.

This patch fixes both problems.

Signed-off-by: Alex Elder
Reviewed-by: Sage Weil
Signed-off-by: Herton Ronaldo Krzesinski
---
 net/ceph/messenger.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--
1.7.9.5

diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 8ba0eee..0de041f 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -2296,10 +2296,11 @@ restart:
 			mutex_unlock(&con->mutex);
 			return;
 		} else {
-			con->ops->put(con);
 			dout("con_work %p FAILED to back off %lu\n", con,
 			     con->delay);
+			set_bit(CON_FLAG_BACKOFF, &con->flags);
 		}
+		goto done;
 	}
 
 	if (con->state == CON_STATE_STANDBY) {
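
For anyone reviewing the reference-count argument in the commit message,
the following user-space model may help. It is only a sketch and not
kernel code: struct toy_con, toy_put(), toy_queue_delayed() and the two
toy_con_work_*() functions are hypothetical stand-ins for the connection,
con->ops->put(), queue_delayed_work() and con_work(). Locking and all
other connection processing are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the connection; not struct ceph_connection. */
struct toy_con {
	int refs;         /* reference count */
	bool backoff;     /* models CON_FLAG_BACKOFF */
	bool work_queued; /* true if the work item is already pending */
};

/* Models con->ops->put(): drop one reference. */
static void toy_put(struct toy_con *con)
{
	assert(con->refs > 0);
	con->refs--;
}

/* Models queue_delayed_work(): returns false if already queued. */
static bool toy_queue_delayed(struct toy_con *con)
{
	if (con->work_queued)
		return false;
	con->work_queued = true;
	return true;
}

/* Flow before the patch, as described in the commit message. */
static void toy_con_work_old(struct toy_con *con)
{
	if (con->backoff) {
		con->backoff = false;      /* test_and_clear_bit() */
		if (toy_queue_delayed(con))
			return;            /* reference passes to queued work */
		toy_put(con);              /* first put */
		/* falls through to normal processing */
	}
	/* ... normal processing would happen here ... */
	toy_put(con);                      /* "done": second put */
}

/* Flow after the patch: re-arm BACKOFF and go straight to "done". */
static void toy_con_work_new(struct toy_con *con)
{
	if (con->backoff) {
		con->backoff = false;      /* test_and_clear_bit() */
		if (toy_queue_delayed(con))
			return;            /* reference passes to queued work */
		con->backoff = true;       /* set_bit(CON_FLAG_BACKOFF) */
		toy_put(con);              /* goto done: the only put */
		return;
	}
	/* ... normal processing would happen here ... */
	toy_put(con);
}
```

Starting from two references with BACKOFF set and the work item already
pending, the old flow drops both references and leaves BACKOFF clear,
while the patched flow drops only the reference held on entry and leaves
BACKOFF set for the next con_work() call.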