From patchwork Fri Sep 28 15:44:10 2012
X-Patchwork-Submitter: Dmitry Monakhov
X-Patchwork-Id: 187817
From: Dmitry Monakhov
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, jack@suse.cz, lczerner@redhat.com, Dmitry Monakhov
Subject: [PATCH 10/11] ext4: punch_hole should wait for DIO writers V2
Date: Fri, 28 Sep 2012 19:44:10 +0400
Message-Id: <1348847051-6746-11-git-send-email-dmonakhov@openvz.org>
X-Mailer: git-send-email 1.7.7.6
In-Reply-To: <1348847051-6746-1-git-send-email-dmonakhov@openvz.org>
References: <1348847051-6746-1-git-send-email-dmonakhov@openvz.org>

punch_hole is the place where we have to wait for all existing writers
(writeback, aio, dio), but currently we simply flush pending end_io
requests, which is not sufficient. Another issue is that punch_hole is
performed without i_mutex held, which obviously results in dangerous
data corruption due to write-after-free.

This patch makes the following changes:
 - Guard punch_hole with i_mutex
 - Recheck inode flags under i_mutex
 - Block all new dio readers in order to prevent an information leak
   caused by the read-after-free pattern
 - punch_hole now waits for all writers in flight

NOTE: XXX: A write-after-free race is still possible because new dirty
pages may appear due to mmap(), and currently there is no easy way to
stop writeback while punch_hole is in progress.
Changes from V1:
 - Add flag checks once we hold i_mutex

Signed-off-by: Dmitry Monakhov
---
 fs/ext4/extents.c |   50 +++++++++++++++++++++++++++++++++-----------------
 1 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 70ba122..a1d16eb 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4568,9 +4568,33 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
 	loff_t first_page_offset, last_page_offset;
 	int credits, err = 0;
 
+	/*
+	 * Write out all dirty pages to avoid race conditions
+	 * Then release them.
+	 */
+	if (mapping->nrpages && mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
+		err = filemap_write_and_wait_range(mapping,
+			offset, offset + length - 1);
+
+		if (err)
+			return err;
+	}
+
+	mutex_lock(&inode->i_mutex);
+	/* Need to recheck file flags under i_mutex */
+	/* It is not possible to punch a hole in an append-only file */
+	if (IS_APPEND(inode) || IS_IMMUTABLE(inode)) {
+		err = -EPERM;
+		goto out_mutex;
+	}
+	if (IS_SWAPFILE(inode)) {
+		err = -ETXTBSY;
+		goto out_mutex;
+	}
+
 	/* No need to punch hole beyond i_size */
 	if (offset >= inode->i_size)
-		return 0;
+		goto out_mutex;
 
 	/*
 	 * If the hole extends beyond i_size, set the hole
@@ -4588,33 +4612,25 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
 	first_page_offset = first_page << PAGE_CACHE_SHIFT;
 	last_page_offset = last_page << PAGE_CACHE_SHIFT;
 
-	/*
-	 * Write out all dirty pages to avoid race conditions
-	 * Then release them.
-	 */
-	if (mapping->nrpages && mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
-		err = filemap_write_and_wait_range(mapping,
-			offset, offset + length - 1);
-
-		if (err)
-			return err;
-	}
-
 	/* Now release the pages */
 	if (last_page_offset > first_page_offset) {
 		truncate_pagecache_range(inode, first_page_offset,
					 last_page_offset - 1);
 	}
 
-	/* finish any pending end_io work */
+	/* Wait all existing dio workers, newcomers will block on i_mutex */
+	ext4_inode_block_unlocked_dio(inode);
+	inode_dio_wait(inode);
 	err = ext4_flush_completed_IO(inode);
 	if (err)
-		return err;
+		goto out_dio;
 
 	credits = ext4_writepage_trans_blocks(inode);
 	handle = ext4_journal_start(inode, credits);
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
+	if (IS_ERR(handle)) {
+		err = PTR_ERR(handle);
+		goto out_dio;
+	}
 
 	/*
@@ -4706,6 +4722,10 @@ out:
 	inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
 	ext4_mark_inode_dirty(handle, inode);
 	ext4_journal_stop(handle);
+out_dio:
+	ext4_inode_resume_unlocked_dio(inode);
+out_mutex:
+	mutex_unlock(&inode->i_mutex);
 	return err;
 }
 
 int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,