From patchwork Sat Jul 10 01:24:08 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mark Lord
X-Patchwork-Id: 58448
X-Patchwork-Delegate: davem@davemloft.net
Message-ID: <4C37CBB8.1040909@teksavvy.com>
Date: Fri, 09 Jul 2010 21:24:08 -0400
From: Mark Lord
To: Greg Freemyer
CC: IDE/ATA development list, Mark Lord
Subject: Re: If I have a single bad sector, how many failed reads should simple dd report?
References: <4C37CA99.1040104@teksavvy.com>
In-Reply-To: <4C37CA99.1040104@teksavvy.com>
X-Mailing-List: linux-ide@vger.kernel.org

On 09/07/10 09:19 PM, Mark Lord wrote:
> On 09/07/10 03:04 PM, Greg Freemyer wrote:
> ..
>>> When I re-ran it, /var/log/messages reported 10 bad logical blocks.
>>> And even worse, dd reported 20 bad blocks.
>>> I examined the data dd read and it had 80KB of zero'ed out data.
>>> So that's 160 sectors worth of data lost because of a single bad
>>> sector. At most I was expecting 4KB of zero'ed out data.
> ..
>
> That's just the standard, undesirable result of the current SCSI EH
> when used with libata for (mainly) desktop computers.
>
> I have patches (against older kernels) to fix it, but have yet to
> get both myself and James B. interested enough simultaneously to
> actually get the kernel fixed. :)
..

Here (attached and inline below) are my most recent patches for this.
Still outdated, though. These are against the SLES11 2.6.27.19 kernel:

-----------------------------snip----------------------------

Stop the SCSI EH from performing tons of retries on unrecoverable medium
errors, so that error-handling fails more quickly and we (EMC) avoid
unneeded node resets.

The ugliness of this patch matches the ugliness of SCSI EH.
Does *anyone* actually understand this code completely?

Signed-off-by: Mark Lord

-----------------------------snip----------------------------

sles11: On encountering a bad sector, report and skip over it,
then continue with the remainder of the request.
Otherwise we would fail perfectly good sectors,
making a bad situation even worse.

Signed-off-by: Mark Lord

--- old/drivers/scsi/scsi_lib.c	2009-06-04 12:26:52.000000000 -0400
+++ linux/drivers/scsi/scsi_lib.c	2009-06-04 14:40:11.000000000 -0400
@@ -952,6 +952,12 @@
 	 */
 	if (sense_valid && !sense_deferred) {
 		switch (sshdr.sense_key) {
+		case MEDIUM_ERROR:
+			/* Bad sector. Fail it, and then continue the rest of the request. */
+			if (this_count && scsi_end_request(cmd, -EIO, cmd->device->sector_size, 1) == NULL) {
+				cmd->retries = 0;	// go around again..
+				return;
+			}
 		case UNIT_ATTENTION:
 			if (cmd->device->removable) {
 				/* Detected disc change. Set a bit