From patchwork Tue Mar 8 10:11:37 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Artem Bityutskiy
X-Patchwork-Id: 85963
Subject: Re: [PATCH] Handle high-order allocation failures in ubifs_jnl_write_data
From: Artem Bityutskiy
To: "Matthew L. Creech"
Date: Tue, 08 Mar 2011 12:11:37 +0200
Message-ID: <1299579097.2754.14.camel@localhost>
Cc: t.stanislaws@samsung.com, linux-mtd@lists.infradead.org
Reply-To: dedekind1@gmail.com
List-Id: Linux MTD discussion mailing list
Sender: linux-mtd-bounces@lists.infradead.org
Errors-To:
 linux-mtd-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org

On Fri, 2011-03-04 at 17:55 -0500, Matthew L. Creech wrote:
> Running kernel 2.6.37, my PPC-based device occasionally gets an
> order-2 allocation failure in UBIFS, which causes the root FS to
> become unwritable:

Matthew,

I've massaged your patch a bit. The changes I've made are as follows.

1. Some more comments and some renaming.
2. Kill the union.
4. Tweak the patch commit message.
5. Allocate the write reserve buffer dynamically.
6. Allocate the write reserve buffer only when mounting in R/W mode, to
   save some RAM when we are in R/O mode.
7. Free the buffer when re-mounting to R/O mode and allocate it again
   when re-mounting to R/W mode.

The patch is below. Please let me know if you are OK with this patch.
I've also pushed it to ubifs-2.6.git / master.

From 7b0ebd08a562b1d78e459880fcc4282d08bcfd8b Mon Sep 17 00:00:00 2001
From: Matthew L. Creech
Date: Fri, 4 Mar 2011 17:55:02 -0500
Subject: [PATCH] UBIFS: handle allocation failures in UBIFS write path

Running kernel 2.6.37, my PPC-based device occasionally gets an
order-2 allocation failure in UBIFS, which causes the root FS to
become unwritable:

kswapd0: page allocation failure. order:2, mode:0x4050
Call Trace:
[c787dc30] [c00085b8] show_stack+0x7c/0x194 (unreliable)
[c787dc70] [c0061aec] __alloc_pages_nodemask+0x4f0/0x57c
[c787dd00] [c0061b98] __get_free_pages+0x20/0x50
[c787dd10] [c00e4f88] ubifs_jnl_write_data+0x54/0x200
[c787dd50] [c00e82d4] do_writepage+0x94/0x198
[c787dd90] [c00675e4] shrink_page_list+0x40c/0x77c
[c787de40] [c0067de0] shrink_inactive_list+0x1e0/0x370
[c787de90] [c0068224] shrink_zone+0x2b4/0x2b8
[c787df00] [c0068854] kswapd+0x408/0x5d4
[c787dfb0] [c0037bcc] kthread+0x80/0x84
[c787dff0] [c000ef44] kernel_thread+0x4c/0x68

Similar problems were encountered last April by Tomasz Stanislawski:

http://patchwork.ozlabs.org/patch/50965/

This patch implements Artem's suggested fix: fall back to a
mutex-protected static buffer, allocated at mount time. I tested it by
forcing execution down the failure path, and didn't see any ill effects.

Artem: massaged the patch a little, improved it so that we'd not
allocate the write reserve buffer when we are in R/O mode.

Signed-off-by: Matthew L. Creech
Signed-off-by: Artem Bityutskiy
---
 fs/ubifs/journal.c |   28 ++++++++++++++++++++++------
 fs/ubifs/super.c   |   18 ++++++++++++++++++
 fs/ubifs/ubifs.h   |   14 ++++++++++++++
 3 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c
index 914f1bd..aed25e8 100644
--- a/fs/ubifs/journal.c
+++ b/fs/ubifs/journal.c
@@ -690,7 +690,7 @@ int ubifs_jnl_write_data(struct ubifs_info *c, const struct inode *inode,
 {
 	struct ubifs_data_node *data;
 	int err, lnum, offs, compr_type, out_len;
-	int dlen = UBIFS_DATA_NODE_SZ + UBIFS_BLOCK_SIZE * WORST_COMPR_FACTOR;
+	int dlen = COMPRESSED_DATA_NODE_BUF_SZ, allocated = 1;
 	struct ubifs_inode *ui = ubifs_inode(inode);
 
 	dbg_jnl("ino %lu, blk %u, len %d, key %s",
@@ -698,9 +698,19 @@ int ubifs_jnl_write_data(struct ubifs_info *c, const struct inode *inode,
 		DBGKEY(key));
 	ubifs_assert(len <= UBIFS_BLOCK_SIZE);
 
-	data = kmalloc(dlen, GFP_NOFS);
-	if (!data)
-		return -ENOMEM;
+	data = kmalloc(dlen, GFP_NOFS | __GFP_NOWARN);
+	if (!data) {
+		/*
+		 * Fall-back to the write reserve buffer. Note, we might be
+		 * currently on the memory reclaim path, when the kernel is
+		 * trying to free some memory by writing out dirty pages. The
+		 * write reserve buffer helps us to guarantee that we are
+		 * always able to write the data.
+		 */
+		allocated = 0;
+		mutex_lock(&c->write_reserve_mutex);
+		data = c->write_reserve_buf;
+	}
 
 	data->ch.node_type = UBIFS_DATA_NODE;
 	key_write(c, key, &data->key);
@@ -736,7 +746,10 @@ int ubifs_jnl_write_data(struct ubifs_info *c, const struct inode *inode,
 		goto out_ro;
 
 	finish_reservation(c);
-	kfree(data);
+	if (!allocated)
+		mutex_unlock(&c->write_reserve_mutex);
+	else
+		kfree(data);
 	return 0;
 
 out_release:
@@ -745,7 +758,10 @@ out_ro:
 	ubifs_ro_mode(c, err);
 	finish_reservation(c);
 out_free:
-	kfree(data);
+	if (!allocated)
+		mutex_unlock(&c->write_reserve_mutex);
+	else
+		kfree(data);
 	return err;
 }
 
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index c20c6d2..0e1c1c6 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -1221,6 +1221,13 @@ static int mount_ubifs(struct ubifs_info *c)
 	if (c->bulk_read == 1)
 		bu_init(c);
 
+	if (!c->ro_mount) {
+		c->write_reserve_buf = kmalloc(COMPRESSED_DATA_NODE_BUF_SZ,
+					       GFP_KERNEL);
+		if (!c->write_reserve_buf)
+			goto out_free;
+	}
+
 	c->mounting = 1;
 
 	err = ubifs_read_superblock(c);
@@ -1490,6 +1497,7 @@ out_wbufs:
 out_cbuf:
 	kfree(c->cbuf);
 out_free:
+	kfree(c->write_reserve_buf);
 	kfree(c->bu.buf);
 	vfree(c->ileb_buf);
 	vfree(c->sbuf);
@@ -1528,6 +1536,7 @@ static void ubifs_umount(struct ubifs_info *c)
 	kfree(c->cbuf);
 	kfree(c->rcvrd_mst_node);
 	kfree(c->mst_node);
+	kfree(c->write_reserve_buf);
 	kfree(c->bu.buf);
 	vfree(c->ileb_buf);
 	vfree(c->sbuf);
@@ -1613,6 +1622,10 @@ static int ubifs_remount_rw(struct ubifs_info *c)
 		goto out;
 	}
 
+	c->write_reserve_buf = kmalloc(COMPRESSED_DATA_NODE_BUF_SZ, GFP_KERNEL);
+	if (!c->write_reserve_buf)
+		goto out;
+
 	err = ubifs_lpt_init(c, 0, 1);
 	if (err)
 		goto out;
@@ -1677,6 +1690,8 @@ out:
 		c->bgt = NULL;
 	}
 	free_wbufs(c);
+	kfree(c->write_reserve_buf);
+	c->write_reserve_buf = NULL;
 	vfree(c->ileb_buf);
 	c->ileb_buf = NULL;
 	ubifs_lpt_free(c, 1);
@@ -1720,6 +1735,8 @@ static void ubifs_remount_ro(struct ubifs_info *c)
 	free_wbufs(c);
 	vfree(c->orph_buf);
 	c->orph_buf = NULL;
+	kfree(c->write_reserve_buf);
+	c->write_reserve_buf = NULL;
 	vfree(c->ileb_buf);
 	c->ileb_buf = NULL;
 	ubifs_lpt_free(c, 1);
@@ -1950,6 +1967,7 @@ static int ubifs_fill_super(struct super_block *sb, void *data, int silent)
 	mutex_init(&c->mst_mutex);
 	mutex_init(&c->umount_mutex);
 	mutex_init(&c->bu_mutex);
+	mutex_init(&c->write_reserve_mutex);
 	init_waitqueue_head(&c->cmt_wq);
 	c->buds = RB_ROOT;
 	c->old_idx = RB_ROOT;
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index 3624950..8c40ad3 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -151,6 +151,12 @@
  */
 #define WORST_COMPR_FACTOR 2
 
+/*
+ * How much memory is needed for a buffer where we compress a data node.
+ */
+#define COMPRESSED_DATA_NODE_BUF_SZ \
+	(UBIFS_DATA_NODE_SZ + UBIFS_BLOCK_SIZE * WORST_COMPR_FACTOR)
+
 /* Maximum expected tree height for use by bottom_up_buf */
 #define BOTTOM_UP_HEIGHT 64
 
@@ -1005,6 +1011,11 @@ struct ubifs_debug_info;
  * @bu_mutex: protects the pre-allocated bulk-read buffer and @c->bu
  * @bu: pre-allocated bulk-read information
  *
+ * @write_reserve_mutex: protects @write_reserve_buf
+ * @write_reserve_buf: on the write path we allocate memory, which might
+ *                     sometimes be unavailable, in which case we use this
+ *                     write reserve buffer
+ *
  * @log_lebs: number of logical eraseblocks in the log
  * @log_bytes: log size in bytes
  * @log_last: last LEB of the log
@@ -1256,6 +1267,9 @@ struct ubifs_info {
 	struct mutex bu_mutex;
 	struct bu_info bu;
 
+	struct mutex write_reserve_mutex;
+	void *write_reserve_buf;
+
 	int log_lebs;
 	long long log_bytes;
 	int log_last;
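
For anyone who wants to exercise the fall-back path outside the kernel, here is a minimal userspace sketch in C of the same pattern: try a fresh allocation first and, if that fails, take a mutex and use a buffer that was set aside earlier. This is not the UBIFS code. The names struct writer, writer_init(), writer_write(), writer_destroy(), failing_alloc() and RESERVE_BUF_SZ are made up for illustration, and malloc() plus a pthread mutex stand in for kmalloc() and the kernel mutex. The "write" is just a byte sum so that the result can be checked.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical size; stands in for COMPRESSED_DATA_NODE_BUF_SZ. */
#define RESERVE_BUF_SZ 8192

struct writer {
	pthread_mutex_t reserve_mutex;	/* protects reserve_buf */
	void *reserve_buf;		/* set aside once, at "mount" time */
	void *(*alloc)(size_t);		/* swappable, to force failures in tests */
	int fallbacks;			/* how many writes used the reserve buffer */
};

/* Allocator that always fails, to force execution down the fall-back path. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/* Pre-allocate the reserve buffer while memory is still easy to get. */
static int writer_init(struct writer *w)
{
	w->reserve_buf = malloc(RESERVE_BUF_SZ);
	if (!w->reserve_buf)
		return -1;
	pthread_mutex_init(&w->reserve_mutex, NULL);
	w->alloc = malloc;
	w->fallbacks = 0;
	return 0;
}

static void writer_destroy(struct writer *w)
{
	pthread_mutex_destroy(&w->reserve_mutex);
	free(w->reserve_buf);
	w->reserve_buf = NULL;
}

/* Stage len bytes in a scratch buffer and return their sum as the "write". */
static long writer_write(struct writer *w, const unsigned char *src, size_t len)
{
	int allocated = 1;
	unsigned char *buf;
	long sum = 0;
	size_t i;

	assert(len <= RESERVE_BUF_SZ);

	buf = w->alloc(RESERVE_BUF_SZ);
	if (!buf) {
		/* Allocation failed: serialise on the shared reserve buffer. */
		allocated = 0;
		pthread_mutex_lock(&w->reserve_mutex);
		buf = w->reserve_buf;
		w->fallbacks++;
	}

	memcpy(buf, src, len);
	for (i = 0; i < len; i++)
		sum += buf[i];

	/* Release whichever resource we actually took. */
	if (!allocated)
		pthread_mutex_unlock(&w->reserve_mutex);
	else
		free(buf);
	return sum;
}
```

Setting w.alloc = failing_alloc after writer_init() forces every write through the reserve buffer, which is roughly how the failure path can be tested by hand. The mutex is held for the whole write, so writers that hit the fall-back are serialised; that is the price of having a single reserve buffer.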