From patchwork Fri Jun 14 18:33:50 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kamal Mostafa X-Patchwork-Id: 251500 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 0E96C2C008A for ; Sat, 15 Jun 2013 04:35:06 +1000 (EST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1UnYpv-0008MF-Hx; Fri, 14 Jun 2013 18:34:59 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1UnYoq-0007pC-G9 for kernel-team@lists.ubuntu.com; Fri, 14 Jun 2013 18:33:52 +0000 Received: from c-67-160-231-42.hsd1.ca.comcast.net ([67.160.231.42] helo=fourier) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1UnYoq-00022p-9J; Fri, 14 Jun 2013 18:33:52 +0000 Received: from kamal by fourier with local (Exim 4.80) (envelope-from ) id 1UnYoo-0000Ak-2z; Fri, 14 Jun 2013 11:33:50 -0700 From: Kamal Mostafa To: "H. Peter Anvin" Subject: [ 3.8.y.z extended stable ] Patch "md/raid1, 5, 10: Disable WRITE SAME until a recovery strategy is in" has been added to staging queue Date: Fri, 14 Jun 2013 11:33:50 -0700 Message-Id: <1371234830-626-1-git-send-email-kamal@canonical.com> X-Mailer: git-send-email 1.8.1.2 X-Extended-Stable: 3.8 Cc: NeilBrown , Kamal Mostafa , "H. Peter Anvin" , kernel-team@lists.ubuntu.com X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.14 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: kernel-team-bounces@lists.ubuntu.com This is a note to let you know that I have just added a patch titled md/raid1,5,10: Disable WRITE SAME until a recovery strategy is in to the linux-3.8.y-queue branch of the 3.8.y.z extended stable tree which can be found at: http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.8.y-queue This patch is scheduled to be released in version 3.8.13.3. If you, or anyone else, feels it should not be added to this tree, please reply to this email. For more information about the 3.8.y.z tree, see https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable Thanks. -Kamal ------ From 30410bc50edf2acad3ad3fefe5f49021e0f24b79 Mon Sep 17 00:00:00 2001 From: "H. Peter Anvin" Date: Wed, 12 Jun 2013 07:37:43 -0700 Subject: md/raid1,5,10: Disable WRITE SAME until a recovery strategy is in place commit 5026d7a9b2f3eb1f9bda66c18ac6bc3036ec9020 upstream. There are cases where the kernel will believe that the WRITE SAME command is supported by a block device which does not, in fact, support WRITE SAME. This currently happens for SATA drivers behind a SAS controller, but there are probably a hundred other ways that can happen, including drive firmware bugs. After receiving an error for WRITE SAME the block layer will retry the request as a plain write of zeroes, but mdraid will consider the failure as fatal and consider the drive failed. This has the effect that all the mirrors containing a specific set of data are each offlined in very rapid succession resulting in data loss. However, just bouncing the request back up to the block layer isn't ideal either, because the whole initial request-retry sequence should be inside the write bitmap fence, which probably means that md needs to do its own conversion of WRITE SAME to write zero. Until the failure scenario has been sorted out, disable WRITE SAME for raid1, raid5, and raid10. [neilb: added raid5] This patch is appropriate for any -stable since 3.7 when write_same support was added. Signed-off-by: H. Peter Anvin Signed-off-by: NeilBrown Signed-off-by: Kamal Mostafa --- drivers/md/raid1.c | 4 ++-- drivers/md/raid10.c | 3 +-- drivers/md/raid5.c | 4 +++- 3 files changed, 6 insertions(+), 5 deletions(-) -- 1.8.1.2 diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 3dab5bb..7116798 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -2837,8 +2837,8 @@ static int run(struct mddev *mddev) return PTR_ERR(conf); if (mddev->queue) - blk_queue_max_write_same_sectors(mddev->queue, - mddev->chunk_sectors); + blk_queue_max_write_same_sectors(mddev->queue, 0); + rdev_for_each(rdev, mddev) { if (!mddev->gendisk) continue; diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index fa1e11c..2f4be3c 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -3588,8 +3588,7 @@ static int run(struct mddev *mddev) if (mddev->queue) { blk_queue_max_discard_sectors(mddev->queue, mddev->chunk_sectors); - blk_queue_max_write_same_sectors(mddev->queue, - mddev->chunk_sectors); + blk_queue_max_write_same_sectors(mddev->queue, 0); blk_queue_io_min(mddev->queue, chunk_size); if (conf->geo.raid_disks % conf->geo.near_copies) blk_queue_io_opt(mddev->queue, chunk_size * conf->geo.raid_disks); diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 94ce78e..9a6ff72 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5494,7 +5494,7 @@ static int run(struct mddev *mddev) if (mddev->major_version == 0 && mddev->minor_version > 90) rdev->recovery_offset = reshape_offset; - + if (rdev->recovery_offset < reshape_offset) { /* We need to check old and new layout */ if (!only_parity(rdev->raid_disk, @@ -5617,6 +5617,8 @@ static int run(struct mddev *mddev) */ mddev->queue->limits.discard_zeroes_data = 0; + blk_queue_max_write_same_sectors(mddev->queue, 0); + rdev_for_each(rdev, mddev) { disk_stack_limits(mddev->gendisk, rdev->bdev, rdev->data_offset << 9);