From patchwork Sat Nov 13 13:15:01 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Artem Bityutskiy X-Patchwork-Id: 71052 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from canuck.infradead.org (canuck.infradead.org [134.117.69.58]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 86974B7119 for ; Sun, 14 Nov 2010 00:20:44 +1100 (EST) Received: from localhost ([127.0.0.1] helo=canuck.infradead.org) by canuck.infradead.org with esmtp (Exim 4.72 #1 (Red Hat Linux)) id 1PHFwy-000280-4t; Sat, 13 Nov 2010 13:15:24 +0000 Received: from mail-bw0-f49.google.com ([209.85.214.49]) by canuck.infradead.org with esmtp (Exim 4.72 #1 (Red Hat Linux)) id 1PHFwq-00027M-BN for linux-mtd@lists.infradead.org; Sat, 13 Nov 2010 13:15:21 +0000 Received: by bwz5 with SMTP id 5so4012542bwz.36 for ; Sat, 13 Nov 2010 05:15:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:subject:from:reply-to:to:cc :in-reply-to:references:content-type:date:message-id:mime-version :x-mailer:content-transfer-encoding; bh=/a5Njtk3MQ9UyR9v+D0Kcklj0cZfnjfaldyhPaZQS/I=; b=vcXKLwfYveE9j1bAlRbsqCq2Xu9/thqQpLhh6GX0yi/q/LhbJIZtZFbLfN96TOobvp yGL0c5WPaKnXYHeJY7mMACRwMd+Fdd5ERPj7KPB+NWw3XpqpDuG7YcDxBMFnZBVSRN3W iB8duR0ssSbv/meoKAE6x+CYz2gkuurSIEJfs= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=subject:from:reply-to:to:cc:in-reply-to:references:content-type :date:message-id:mime-version:x-mailer:content-transfer-encoding; b=c5d/2hATGaiwp49Y1dDGCkzd6qn4tBUlnTcvZ6Vijjf6zAmCz1KyAXLgO0QlHCfhnZ 2Ep/ayiyl/O2tGFhOQk92pLI7ZiUnS867Z8GBQZoocxJjZWOUkLo6LxxE9VbkptKdZV2 h/zHL1wC5RgMxoYZmJ/IhtMgo/VcilTMq7OJk= Received: by 10.204.79.129 with SMTP id p1mr4049942bkk.59.1289654112482; Sat, 13 Nov 2010 05:15:12 -0800 (PST) Received: from ?IPv6:::1? (shutemov.name [188.40.19.243]) by mx.google.com with ESMTPS id r21sm2031129bkj.22.2010.11.13.05.15.10 (version=SSLv3 cipher=RC4-MD5); Sat, 13 Nov 2010 05:15:11 -0800 (PST) Subject: Re: ubi deadlock on .36+ From: Artem Bityutskiy To: Grazvydas Ignotas In-Reply-To: <1289651831.2218.48.camel@localhost> References: <1288855749.2075.7.camel@localhost> <1289651831.2218.48.camel@localhost> Date: Sat, 13 Nov 2010 15:15:01 +0200 Message-ID: <1289654101.2218.51.camel@localhost> Mime-Version: 1.0 X-Mailer: Evolution 2.32.0 (2.32.0-2.fc14) X-CRM114-Version: 20090807-BlameThorstenAndJenny ( TRE 0.7.6 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20101113_081516_732470_2FBD5DAF X-CRM114-Status: GOOD ( 32.00 ) X-Spam-Score: 2.1 (++) X-Spam-Report: SpamAssassin version 3.3.1 on canuck.infradead.org summary: Content analysis details: (2.1 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, low trust [209.85.214.49 listed in list.dnswl.org] 0.0 FREEMAIL_FROM Sender email is freemail (dedekind1[at]gmail.com) 2.2 FREEMAIL_ENVFROM_END_DIGIT Envelope-from freemail username ends in digit (dedekind1[at]gmail.com) -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature Cc: linux-mtd@lists.infradead.org X-BeenThere: linux-mtd@lists.infradead.org X-Mailman-Version: 2.1.12 Precedence: list Reply-To: dedekind1@gmail.com List-Id: Linux MTD discussion mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: linux-mtd-bounces@lists.infradead.org Errors-To: linux-mtd-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org On Sat, 2010-11-13 at 14:37 +0200, Artem Bityutskiy wrote: > On Thu, 2010-11-04 at 15:07 +0200, Grazvydas Ignotas wrote: > > On Thu, Nov 4, 2010 at 9:29 AM, Artem Bityutskiy wrote: > > > On Wed, 2010-11-03 at 23:30 +0200, Grazvydas Ignotas wrote: > > >> Hi, > > >> > > >> there seems to be some issue with NAND on my OMAP3 board that causes > > >> CRC errors on 2.6.36 and 2.6.37-rc1. Those seem to be triggering a bug > > >> in UBI that makes it loop forever (or very long) printing this: > > >> > > >> uncorrectable error : > > >> UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes > > >> from PEB 0:512, read 512 bytes > > >> uncorrectable error : > > >> UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes > > >> from PEB 68:512, read 512 bytes > > >> UBI: run torture test for PEB 68 > > >> UBI: PEB 68 passed torture test, do not mark it a bad > > >> > > >> > > >> here is full log of one minute session, after which I killed power: > > >> http://notaz.gp2x.de/misc/pnd/logs/linux_20101103_ubi_lockup > > > > > > Hmm, could you please enable UBI debugging and provide me the logs? See > > > here for some hints: > > > http://www.linux-mtd.infradead.org/doc/ubi.html#L_how_send_bugreport > > > > done: > > http://notaz.gp2x.de/misc/pnd/logs/linux_20101103_ubi_lockup2 > > But would it be possible to enable all UBI debugging messages? While trying to figure out what is happening in your system, I realized one possible scenario which may confuse UBI. I've added a patch below. This probably won't fix your issue (but you could try), I need more time to think about what was happening. But a log with all messages (not only I/O) would help. Thanks. From 703ba5f120644fefef3cfed46c0d8ccf6a15b4ee Mon Sep 17 00:00:00 2001 From: Artem Bityutskiy Date: Sat, 13 Nov 2010 15:08:29 +0200 Subject: [PATCH] UBI: improve UBI robustness When reading data from the flash, corrupt the buffer we are about to read to before reading. The idea is to fix the following possible situation: 1. The buffer contains data from previous operation, e.g., read from another PEB previously. The data looks like expected, e.g., if we just do not read anything and return - the caller would not notice this. E.g., if we are reading a VID header, the buffer may contain a valid VID header from another PEB. 2. The driver is buggy and returns use success or -EBADMSG or -EUCLEAN, but it does not actually put any data to the buffer. This may confuse UBI or upper layers - they may think the buffer contains valid data while in fact it is just old data. This is especially possible because UBI (and UBIFS) relies on CRC, and treats data as correct even in case of ECC errors if the CRC is correct. Try to prevent this situation by changing the first byte of the buffer. Signed-off-by: Artem Bityutskiy --- drivers/mtd/ubi/io.c | 22 ++++++++++++++++++++++ 1 files changed, 22 insertions(+), 0 deletions(-) diff --git a/drivers/mtd/ubi/io.c b/drivers/mtd/ubi/io.c index c2960ac..9ab1a33 100644 --- a/drivers/mtd/ubi/io.c +++ b/drivers/mtd/ubi/io.c @@ -146,6 +146,28 @@ int ubi_io_read(const struct ubi_device *ubi, void *buf, int pnum, int offset, if (err) return err; + /* + * Deliberately corrupt the buffer to improve robustness. Indeed, if we + * do not do this, the following may happen: + * 1. The buffer contains data from previous operation, e.g., read from + * another PEB previously. The data looks like expected, e.g., if we + * just do not read anything and return - the caller would not + * notice this. E.g., if we are reading a VID header, the buffer may + * contain a valid VID header from another PEB. + * 2. The driver is buggy and returns us success or -EBADMSG or + * -EUCLEAN, but it does not actually put any data to the buffer. + * + * This may confuse UBI or upper layers - they may think the buffer + * contains valid data while in fact it is just old data. This is + * especially possible because UBI (and UBIFS) relies on CRC, and + * treats data as correct even in case of ECC errors if the CRC is + * correct. + * + * Try to prevent this situation by changing the first byte of the + * buffer. + */ + *((uint8_t *)buf) ^= 0xFF; + addr = (loff_t)pnum * ubi->peb_size + offset; retry: err = ubi->mtd->read(ubi->mtd, addr, len, &read, buf);