From patchwork Wed Feb 12 23:53:53 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ross Zwisler X-Patchwork-Id: 319846 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 310B02C00B0 for ; Thu, 13 Feb 2014 10:53:52 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751196AbaBLXxv (ORCPT ); Wed, 12 Feb 2014 18:53:51 -0500 Received: from mga11.intel.com ([192.55.52.93]:21582 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751053AbaBLXxu (ORCPT ); Wed, 12 Feb 2014 18:53:50 -0500 Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga102.fm.intel.com with ESMTP; 12 Feb 2014 15:53:49 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.95,835,1384329600"; d="scan'208";a="480571829" Received: from rzwisler-mac09.lm.intel.com ([10.232.112.158]) by fmsmga002.fm.intel.com with ESMTP; 12 Feb 2014 15:53:40 -0800 Date: Wed, 12 Feb 2014 16:53:53 -0700 (MST) From: Ross Zwisler X-X-Sender: rzwisler@scrumpy To: Matthew Wilcox cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-ext4@vger.kernel.org Subject: Re: [PATCH v5 06/22] Treat XIP like O_DIRECT In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (OSX 1167 2008-08-23) MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Wed, 15 Jan 2014, Matthew Wilcox wrote: > Instead of separate read and write methods, use the generic AIO > infrastructure. In addition to giving us support for AIO, this adds > the locking between read() and truncate() that was missing. > > Signed-off-by: Matthew Wilcox ... > +static ssize_t xip_io(int rw, struct inode *inode, const struct iovec > *iov, > + loff_t start, loff_t end, unsigned nr_segs, > + get_block_t get_block, struct buffer_head *bh) > +{ > + ssize_t retval = 0; > + unsigned seg = 0; > + unsigned len; > + unsigned copied = 0; > + loff_t offset = start; > + loff_t max = start; > + void *addr; > + bool hole = false; > + > + while (offset < end) { > + void __user *buf = iov[seg].iov_base + copied; > + > + if (max == offset) { > + sector_t block = offset >> inode->i_blkbits; > + long size; > + memset(bh, 0, sizeof(*bh)); > + bh->b_size = ALIGN(end - offset, PAGE_SIZE); > + retval = get_block(inode, block, bh, rw == WRITE); > + if (retval) > + break; > + if (buffer_mapped(bh)) { > + retval = xip_get_addr(inode, bh, &addr); > + if (retval < 0) > + break; > + addr += offset - (block << inode->i_blkbits); > + hole = false; > + size = retval; > + } else { > + if (rw == WRITE) { > + retval = -EIO; > + break; > + } > + addr = NULL; > + hole = true; > + size = bh->b_size; > + } > + max = offset + size; > + } > + > + len = min_t(unsigned, iov[seg].iov_len - copied, max - offset); > + > + if (rw == WRITE) > + len -= __copy_from_user_nocache(addr, buf, len); > + else if (!hole) > + len -= __copy_to_user(buf, addr, len); > + else > + len -= __clear_user(buf, len); > + > + if (!len) > + break; > + > + offset += len; > + copied += len; > + if (copied == iov[seg].iov_len) { > + seg++; > + copied = 0; > + } > + } > + > + return (offset == start) ? retval : offset - start; > +} xip_io() as it is currently written has an issue where reads can go beyond inode->i_size. A quick test to show this issue is: create a new file write to the file for 1/2 a block seek back to 0 read for a full block The read in this case will return 4096, the length of the full block that was requested. It should return 2048, reading just the data that was written. The issue is that we do have a full block allocated in ext4, we do have it available via XIP via xip_get_addr(), and the only extra check that we currently have is a check against iov_len. iov_len in this case is 4096, so no one stops us from doing a full block read. Here is a quick patch that fixes this issue: --- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/xip.c b/fs/xip.c index e902593..1608f29 100644 --- a/fs/xip.c +++ b/fs/xip.c @@ -91,13 +91,16 @@ static ssize_t xip_io(int rw, struct inode *inode, const struct { ssize_t retval = 0; unsigned seg = 0; - unsigned len; + unsigned len, total_len; unsigned copied = 0; loff_t offset = start; loff_t max = start; void *addr; bool hole = false; + end = min(end, inode->i_size); + total_len = end - start; + while (offset < end) { void __user *buf = iov[seg].iov_base + copied; @@ -136,6 +139,7 @@ static ssize_t xip_io(int rw, struct inode *inode, const struct } len = min_t(unsigned, iov[seg].iov_len - copied, max - offset); + len = min(len, total_len); if (rw == WRITE) len -= __copy_from_user_nocache(addr, buf, len); @@ -149,6 +153,7 @@ static ssize_t xip_io(int rw, struct inode *inode, const struct offset += len; copied += len; + total_len -= len; if (copied == iov[seg].iov_len) { seg++; copied = 0;