diff mbox

[v2] ext4: always pre-allocate max depth for path

Message ID 55AFDADC-DF82-41FC-8C5C-A0E3D97EF51A@intel.com
State New, archived
Headers show

Commit Message

Faccini, Bruno June 23, 2016, 11:52 a.m. UTC
From: Bruno Faccini <bruno.faccini@intel.com>

I have first found this way to fix holes in previous ext4 layers versions
where an array of struct ext4_ext_path had been allocated with an arbitrary
evaluated size and finally could overrun upon ext_depth() growth outside
i_data_sem protection. But it seems it can still help with recent ext4
version, to avoid re-allocation need and overhead when it can be allocated
to max possible extent depth (ie, 5 presently) at first time and for a low
cost regarding its memory foot-print, it should also avoid further invalid
dereference by underlying callers sharing same ppath (with present
inter-routines path re-use scheme), and also upon re-allocation error.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: linux-ext4@vger.kernel.org
—
 ext4.h        |    5 +-
 extents.c     |  117 +++++++++++++++++++++++++++++++++++++++-------------------
 move_extent.c |   10 +---
 super.c       |    3 -
 4 files changed, 90 insertions(+), 45 deletions(-)

—

---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris, 
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Comments

Theodore Ts'o July 7, 2016, 4:39 p.m. UTC | #1
On Thu, Jun 23, 2016 at 11:52:29AM +0000, Faccini, Bruno wrote:
> From: Bruno Faccini <bruno.faccini@intel.com>
> 
> I have first found this way to fix holes in previous ext4 layers versions
> where an array of struct ext4_ext_path had been allocated with an arbitrary
> evaluated size and finally could overrun upon ext_depth() growth outside
> i_data_sem protection. But it seems it can still help with recent ext4
> version, to avoid re-allocation need and overhead when it can be allocated
> to max possible extent depth (ie, 5 presently) at first time and for a low
> cost regarding its memory foot-print, it should also avoid further invalid
> dereference by underlying callers sharing same ppath (with present
> inter-routines path re-use scheme), and also upon re-allocation error.
> 
> Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
> Cc: Andreas Dilger <adilger@dilger.ca>
> Cc: linux-ext4@vger.kernel.org

Since this came up on the ext4 call, let me give a quick update about
my concerns about this patch.

The problem with the max possible extent depth assumption is that this
assumes non-pathological trees.  Unfortunately, at the moment we don't
dont ever shrink the extent tree as we delete entries from the tree,
and we aren't obeying the requirements of a formal B+ tree, which is
that all nodes (except for a trivial tree consiting of a single leaf
node at the root) must be at least half-full.  So while it is highly
unlikely, it is possible to create highly pathological trees that
could potentially be deeper than five deep.

They are extremely unlikely to happen in practice, granted, but if we
are relying on this to prevent array bound overflow attacks, a
malicious attacker could potentially be very happy to arrange such as
situation.

So at least in the short run, we may be better off finding all of the
places where we drop i_data_sem after we've allocated the struct path
array, and after we grab it again for writing, double check to see if
we need to reallocate it.  For performance reasons I'm happy always
allocating an extra array element or two to minimize the need to do
the reallocation, but for correctness's sake it would be good if we
could easily test the code path where we need to do a reallocation, as
well as demonstrate that we do the right thing if the reallocation
fails...

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 1e20fa9..73d5866 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1340,6 +1340,9 @@  struct ext4_sb_info {
 	unsigned long s_ext_extents;
 #endif
 
+	/* maximum possible extents tree depth, to be computed at mount time */
+	unsigned int s_max_ext_tree_depth;
+
 	/* for buddy allocator */
 	struct ext4_group_info ***s_group_info;
 	struct inode *s_buddy_cache;
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index b52fea3..da9142d 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -871,19 +871,18 @@  ext4_find_extent(struct inode *inode, ext4_lblk_t block,
 
 	if (path) {
 		ext4_ext_drop_refs(path);
-		if (depth > path[0].p_maxdepth) {
-			kfree(path);
-			*orig_path = path = NULL;
-		}
-	}
-	if (!path) {
+		/* path has been allocated for the max possible depth
+		 * so we can simply reuse it */
+		/* XXX need to clear/zero old path content? */
+	} else {
 		/* account possible depth increase */
-		path = kzalloc(sizeof(struct ext4_ext_path) * (depth + 2),
+		path = kzalloc(sizeof(struct ext4_ext_path) *
+				EXT4_SB(inode->i_sb)->s_max_ext_tree_depth,
 				GFP_NOFS);
 		if (unlikely(!path))
 			return ERR_PTR(-ENOMEM);
-		path[0].p_maxdepth = depth + 1;
 	}
+	path[0].p_maxdepth = depth + 1;
 	path[0].p_hdr = eh;
 	path[0].p_bh = NULL;
 
@@ -934,9 +933,11 @@  ext4_find_extent(struct inode *inode, ext4_lblk_t block,
 
 err:
 	ext4_ext_drop_refs(path);
-	kfree(path);
-	if (orig_path)
-		*orig_path = NULL;
+
+	/* do not free *orig_path, it is likely to be referenced by callers */
+	if (!orig_path)
+		kfree(path);
+
 	return ERR_PTR(ret);
 }
 
@@ -2147,7 +2148,8 @@  static int ext4_fill_fiemap_extents(struct inode *inode,
 				    ext4_lblk_t block, ext4_lblk_t num,
 				    struct fiemap_extent_info *fieinfo)
 {
-	struct ext4_ext_path *path = NULL;
+	struct ext4_ext_path *path = NULL, *orig_path = NULL;
+	struct ext4_ext_path **orig_ppath = &orig_path;
 	struct ext4_extent *ex;
 	struct extent_status es;
 	ext4_lblk_t next, next_del, start = 0, end = 0;
@@ -2161,13 +2163,16 @@  static int ext4_fill_fiemap_extents(struct inode *inode,
 		/* find extent for this block */
 		down_read(&EXT4_I(inode)->i_data_sem);
 
-		path = ext4_find_extent(inode, block, &path, 0);
+		path = ext4_find_extent(inode, block, orig_ppath, 0);
 		if (IS_ERR(path)) {
 			up_read(&EXT4_I(inode)->i_data_sem);
 			err = PTR_ERR(path);
 			path = NULL;
 			break;
 		}
+		if (!orig_path)
+			orig_path = path;
+		/* XXX else BUG_ON(path != orig_path); */
 
 		depth = ext_depth(inode);
 		if (unlikely(path[depth].p_hdr == NULL)) {
@@ -2287,8 +2292,8 @@  static int ext4_fill_fiemap_extents(struct inode *inode,
 		block = es.es_lblk + es.es_len;
 	}
 
-	ext4_ext_drop_refs(path);
-	kfree(path);
+	ext4_ext_drop_refs(orig_path);
+	kfree(orig_path);
 	return err;
 }
 
@@ -2910,7 +2915,8 @@  again:
 			path[k].p_block =
 				le16_to_cpu(path[k].p_hdr->eh_entries)+1;
 	} else {
-		path = kzalloc(sizeof(struct ext4_ext_path) * (depth + 1),
+		path = kzalloc(sizeof(struct ext4_ext_path) *
+			       EXT4_SB(inode->i_sb)->s_max_ext_tree_depth,
 			       GFP_NOFS);
 		if (path == NULL) {
 			ext4_journal_stop(handle);
@@ -3051,13 +3057,21 @@  out:
  */
 void ext4_ext_init(struct super_block *sb)
 {
+	ext4_fsblk_t maxblocks;
+	unsigned max_ext_entry_size = max(sizeof(struct ext4_extent),
+					  sizeof(struct ext4_extent_idx));
+	unsigned min_ext_per_node = (sb->s_blocksize -
+				     sizeof(struct ext4_extent_header)) /
+				    max_ext_entry_size;
+
 	/*
 	 * possible initialization would be here
 	 */
 
 	if (ext4_has_feature_extents(sb)) {
-#if defined(AGGRESSIVE_TEST) || defined(CHECK_BINSEARCH) || defined(EXTENTS_STATS)
-		printk(KERN_INFO "EXT4-fs: file extents enabled"
+#if defined(AGGRESSIVE_TEST) || defined(CHECK_BINSEARCH) || \
+    defined(EXTENTS_STATS) || defined(EXT_DEBUG)
+		printk(KERN_INFO "EXT4-fs (%s): file extents enabled"
 #ifdef AGGRESSIVE_TEST
 		       ", aggressive tests"
 #endif
@@ -3067,7 +3081,32 @@  void ext4_ext_init(struct super_block *sb)
 #ifdef EXTENTS_STATS
 		       ", stats"
 #endif
-		       "\n");
+		       "\n", sb->s_id);
+#endif
+
+
+		EXT4_SB(sb)->s_max_ext_tree_depth = 1;
+
+		maxblocks = sb->s_maxbytes >> EXT4_BLOCK_SIZE_BITS(sb);
+
+		/* 1st/root level/node of extents tree stands in i_data and
+		 * entries stored in tree nodes can be of type ext4_extent
+		 * (leaf node) or ext4_extent_idx (internal node) */
+		do_div(maxblocks,
+		       (sizeof(((struct ext4_inode_info *)NULL)->i_data) -
+			sizeof(struct ext4_extent_header)) /
+		       max_ext_entry_size);
+
+		/* compute maximum extents tree depth for a fully populated
+		 * file of max size made of only minimal/1-block extents */
+		while (maxblocks > 0) {
+			do_div(maxblocks, min_ext_per_node);
+			EXT4_SB(sb)->s_max_ext_tree_depth++;
+		}
+
+#ifdef EXT_DEBUG
+		printk(KERN_INFO "EXT4-fs (%s): maximum tree depth=%u\n",
+		       sb->s_id, EXT4_SB(sb)->s_max_ext_tree_depth);
 #endif
 #ifdef EXTENTS_STATS
 		spin_lock_init(&EXT4_SB(sb)->s_ext_stats_lock);
@@ -5348,18 +5387,18 @@  ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
 		       ext4_lblk_t start, ext4_lblk_t shift,
 		       enum SHIFT_DIRECTION SHIFT)
 {
-	struct ext4_ext_path *path;
+	struct ext4_ext_path *path, *orig_path;
 	int ret = 0, depth;
 	struct ext4_extent *extent;
 	ext4_lblk_t stop, *iterator, ex_start, ex_end;
 
 	/* Let path point to the last extent */
-	path = ext4_find_extent(inode, EXT_MAX_BLOCKS - 1, NULL, 0);
-	if (IS_ERR(path))
-		return PTR_ERR(path);
+	orig_path = ext4_find_extent(inode, EXT_MAX_BLOCKS - 1, NULL, 0);
+	if (IS_ERR(orig_path))
+		return PTR_ERR(orig_path);
 
-	depth = path->p_depth;
-	extent = path[depth].p_ext;
+	depth = orig_path->p_depth;
+	extent = orig_path[depth].p_ext;
 	if (!extent)
 		goto out;
 
@@ -5371,9 +5410,11 @@  ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
 	 * sure the hole is big enough to accommodate the shift.
 	*/
 	if (SHIFT == SHIFT_LEFT) {
-		path = ext4_find_extent(inode, start - 1, &path, 0);
-		if (IS_ERR(path))
-			return PTR_ERR(path);
+		path = ext4_find_extent(inode, start - 1, &orig_path, 0);
+		if (IS_ERR(path)) {
+			ret = PTR_ERR(path);
+			goto out;
+		}
 		depth = path->p_depth;
 		extent =  path[depth].p_ext;
 		if (extent) {
@@ -5387,9 +5428,8 @@  ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
 
 		if ((start == ex_start && shift > ex_start) ||
 		    (shift > start - ex_end)) {
-			ext4_ext_drop_refs(path);
-			kfree(path);
-			return -EINVAL;
+			ret = -EINVAL;
+			goto out;
 		}
 	}
 
@@ -5405,15 +5445,18 @@  ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
 
 	/* Its safe to start updating extents */
 	while (start < stop) {
-		path = ext4_find_extent(inode, *iterator, &path, 0);
-		if (IS_ERR(path))
-			return PTR_ERR(path);
+		path = ext4_find_extent(inode, *iterator, &orig_path, 0);
+		if (IS_ERR(path)) {
+			ret = PTR_ERR(path);
+			goto out;
+		}
 		depth = path->p_depth;
 		extent = path[depth].p_ext;
 		if (!extent) {
 			EXT4_ERROR_INODE(inode, "unexpected hole at %lu",
 					 (unsigned long) *iterator);
-			return -EFSCORRUPTED;
+			ret =  -EFSCORRUPTED;
+			goto out;
 		}
 		if (SHIFT == SHIFT_LEFT && *iterator >
 		    le32_to_cpu(extent->ee_block)) {
@@ -5445,8 +5488,8 @@  ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
 			break;
 	}
 out:
-	ext4_ext_drop_refs(path);
-	kfree(path);
+	ext4_ext_drop_refs(orig_path);
+	kfree(orig_path);
 	return ret;
 }
 
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index fb6f117..7c319c5 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -39,13 +39,11 @@  get_ext_path(struct inode *inode, ext4_lblk_t lblock,
 	path = ext4_find_extent(inode, lblock, ppath, EXT4_EX_NOCACHE);
 	if (IS_ERR(path))
 		return PTR_ERR(path);
-	if (path[ext_depth(inode)].p_ext == NULL) {
-		ext4_ext_drop_refs(path);
-		kfree(path);
-		*ppath = NULL;
+	if (path[ext_depth(inode)].p_ext == NULL)
 		return -ENODATA;
-	}
-	*ppath = path;
+	if (*ppath == NULL)
+		*ppath = path;
+	/* XXX else BUG_ON(path != *ppath); */
 	return 0;
 }
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 486e869..199157a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3701,6 +3701,8 @@  static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		if (ext4_multi_mount_protect(sb, le64_to_cpu(es->s_mmp_block)))
 			goto failed_mount3a;
 
+	ext4_ext_init(sb); /* needed before using extent-mapped journal */
+
 	/*
 	 * The first inode we look at is the journal inode.  Don't try
 	 * root first: it may be modified in the journal!
@@ -3894,7 +3896,6 @@  no_journal:
 		goto failed_mount4a;
 	}
 
-	ext4_ext_init(sb);
 	err = ext4_mb_init(sb);
 	if (err) {
 		ext4_msg(sb, KERN_ERR, "failed to initialize mballoc (%d)",