From patchwork Tue Jun 7 15:07:55 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Amir G."
X-Patchwork-Id: 99250
X-Patchwork-Delegate: tytso@mit.edu
From: amir73il@users.sourceforge.net
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, lczerner@redhat.com, Amir Goldstein, Yongqiang Yang
Subject: [PATCH v1 28/36] ext4: snapshot list - read through to previous snapshot
Date: Tue, 7 Jun 2011 18:07:55 +0300
Message-Id: <1307459283-22130-29-git-send-email-amir73il@users.sourceforge.net>
X-Mailer: git-send-email 1.7.4.1
In-Reply-To: <1307459283-22130-1-git-send-email-amir73il@users.sourceforge.net>
References: <1307459283-22130-1-git-send-email-amir73il@users.sourceforge.net>
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

From: Amir Goldstein

On snapshot page read, the function ext4_get_block() is called to map
the page to a disk block. If the page is not mapped in the snapshot
file, the newer snapshots on the list are checked and the oldest found
mapping is returned. If the page is not mapped in any of the newer
snapshots, a direct mapping to the block device is returned.

Signed-off-by: Amir Goldstein
Signed-off-by: Yongqiang Yang
---
 fs/ext4/snapshot_inode.c |   74 +++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 73 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/snapshot_inode.c b/fs/ext4/snapshot_inode.c
index 74b455d..a97411e 100644
--- a/fs/ext4/snapshot_inode.c
+++ b/fs/ext4/snapshot_inode.c
@@ -46,12 +46,62 @@
  * in which case 'prev_snapshot' is pointed to the previous snapshot
  * on the list or set to NULL to indicate read through to block device.
  */
+/*
+ * In-memory snapshot list manipulation is protected by snapshot_mutex.
+ * In this function we read the in-memory snapshot list without holding
+ * snapshot_mutex, because we don't want to slow down snapshot read performance.
+ * Following is a proof, that even though we don't hold snapshot_mutex here,
+ * reading the list is safe from races with snapshot list delete and add (take).
+ *
+ * Proof of no race with snapshot delete:
+ * --------------------------------------
+ * We get here only when reading from an enabled snapshot or when reading
+ * through from an enabled snapshot to a newer snapshot. Snapshot delete
+ * operation is only allowed for a disabled snapshot, when no older enabled
+ * snapshot exists (i.e., the deleted snapshot is not 'in-use'). Hence,
+ * read through is safe from races with snapshot list delete operations.
+ *
+ * Proof of no race with snapshot take:
+ * ------------------------------------
+ * Snapshot B take is composed of the following steps:
+ * ext4_snapshot_create():
+ * - Add snapshot B to head of list (active_snapshot is A).
+ * - Allocate and copy snapshot B initial blocks.
+ * ext4_snapshot_take():
+ * - Freeze FS
+ * - Clear snapshot A 'active' flag.
+ * - Set snapshot B 'list'+'active' flags.
+ * - Set snapshot B as active snapshot (active_snapshot=B).
+ * - Unfreeze FS
+ *
+ * Note that we do not need to rely on correct order of instructions within
+ * each of the functions above, but we can assume that Freeze FS will provide
+ * a strong barrier between adding B to list and the ops inside snapshot_take.
+ *
+ * When reading from snapshot A during snapshot B take, we have 2 cases:
+ * 1. is_active(A) is tested before setting active_snapshot=B -
+ *    read through from A to block device.
+ * 2. is_active(A) is tested after setting active_snapshot=B -
+ *    read through from A to B.
+ *
+ * When reading from snapshot B during snapshot B take, we have 2 cases:
+ * 1. B->flags and B->prev are read before adding B to list
+ *    AND/OR before setting the 'list'+'active' flags -
+ *    access to B denied.
+ * 2. is_active(B) is tested after setting active_snapshot=B
+ *    AND/OR after setting the 'list'+'active' flags -
+ *    read through from B to block device.
+ */
 static int ext4_snapshot_get_block_access(struct inode *inode,
 		struct inode **prev_snapshot)
 {
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	unsigned long flags = ext4_get_snapstate_flags(inode);
+	struct list_head *prev = ei->i_snaplist.prev;
+	if (!(flags & 1UL<i_snaplist)
+		/* not on snapshots list? */
+		return -EIO;
+
+	if (prev == &EXT4_SB(inode->i_sb)->s_snapshot_list)
+		/* active snapshot not found on list? */
+		return -EIO;
+
+	/* read through to prev snapshot on the list */
+	ei = list_entry(prev, struct ext4_inode_info, i_snaplist);
+	*prev_snapshot = &ei->vfs_inode;
+
+	if (!ext4_snapshot_file(*prev_snapshot))
+		/* non snapshot file on the list? */
+		return -EIO;
+
+	return 0;
 }
 #ifdef CONFIG_EXT4_DEBUG
@@ -122,6 +188,7 @@ static int ext4_snapshot_read_through(struct inode *inode, sector_t iblock,
 	map.m_pblk = 0;
 	map.m_len = bh_result->b_size >> inode->i_blkbits;
+get_block:
 	prev_snapshot = NULL;
 	/* request snapshot file read access */
 	err = ext4_snapshot_get_block_access(inode, &prev_snapshot);
@@ -134,6 +201,11 @@ static int ext4_snapshot_read_through(struct inode *inode, sector_t iblock,
 			prev_snapshot ? prev_snapshot->i_generation : 0);
 	if (err < 0)
 		return err;
+	if (!err && prev_snapshot) {
+		/* hole in snapshot - check again with prev snapshot */
+		inode = prev_snapshot;
+		goto get_block;
+	}
 	if (!err)
 		/* hole in active snapshot - read though to block device */
 		return 0;
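
For readers following the read-through logic, here is a minimal user-space
sketch of the lookup order described in the commit message. It is not the
ext4 code: struct snap, snap_read_through(), NBLOCKS and HOLE are
hypothetical names, the snapshot list is reduced to a single 'newer'
pointer, and the direct mapping to the block device is assumed to be the
identity mapping (logical block n reads physical block n).

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS 4	/* size of the toy block device (assumption) */
#define HOLE 0UL	/* map value meaning "not mapped here" (assumption) */

/* Hypothetical stand-in for a snapshot file, not an ext4 structure. */
struct snap {
	unsigned long map[NBLOCKS];	/* logical -> physical block, HOLE if unmapped */
	const struct snap *newer;	/* next newer snapshot, NULL for the newest */
};

/*
 * Resolve logical block 'lblk' as seen from snapshot 's'.
 * A hole in 's' is retried in each newer snapshot in turn, so the
 * oldest mapping found wins.  If no snapshot maps the block, fall back
 * to the direct (identity) mapping to the block device.
 */
unsigned long snap_read_through(const struct snap *s, unsigned long lblk)
{
	assert(lblk < NBLOCKS);

	for (; s != NULL; s = s->newer)
		if (s->map[lblk] != HOLE)
			return s->map[lblk];

	return lblk;	/* read through to block device */
}
```

With snapshot A older than snapshot B, a block mapped only in B is found
when reading from A, and a block mapped in neither snapshot resolves to its
own block number. The loop plays the role of the 'goto get_block' retry in
the patch, under the simplifications listed above.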