diff mbox series

ubifs: Handle re-linking of inodes correctly while recovery

Message ID 20181028214407.20965-1-richard@nod.at
State Superseded
Delegated to: Richard Weinberger
Headers show
Series ubifs: Handle re-linking of inodes correctly while recovery | expand

Commit Message

Richard Weinberger Oct. 28, 2018, 9:44 p.m. UTC
UBIFS's recovery code strictly assumes that a deleted inode will never
come back, therefore it removes all data which belongs to that inode
as soon it faces an inode with link count 0 in the replay list.
Before O_TMPFILE this assumption was perfectly fine. With O_TMPFILE
it can lead to data loss upon a power-cut.

Consider a journal with entries like:
0: inode X (nlink = 0) /* O_TMPFILE was created */
1: data for inode X /* Someone writes to the temp file */
2: inode X (nlink = 0) /* inode was changed, xattr, chmod, … */
3: inode X (nlink = 1) /* inode was re-linked via linkat() */

Upon replay of entry #2 UBIFS will drop all data that belongs to inode X,
this will lead to an empty file after mounting.

As solution for this problem, scan the replay list for a re-link entry
before dropping data.

Fixes: 474b93704f32 ("ubifs: Implement O_TMPFILE")
Cc: stable@vger.kernel.org
Reported-by: Russell Senior <russell@personaltelco.net>
Reported-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
---
Russel, Rafał,

please give this patch another testing.
I'll also run it on different test systems before merging.

Thanks,
//richard
---
 fs/ubifs/replay.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

Comments

Russell Senior Oct. 29, 2018, 9:18 a.m. UTC | #1
UBIFS's recovery code strictly assumes that a deleted inode will never
come back, therefore it removes all data which belongs to that inode
as soon it faces an inode with link count 0 in the replay list.
Before O_TMPFILE this assumption was perfectly fine. With O_TMPFILE
it can lead to data loss upon a power-cut.

Consider a journal with entries like:
0: inode X (nlink = 0) /* O_TMPFILE was created */
1: data for inode X /* Someone writes to the temp file */
2: inode X (nlink = 0) /* inode was changed, xattr, chmod, … */
3: inode X (nlink = 1) /* inode was re-linked via linkat() */

Upon replay of entry #2 UBIFS will drop all data that belongs to inode X,
this will lead to an empty file after mounting.

As solution for this problem, scan the replay list for a re-link entry
before dropping data.

Fixes: 474b93704f32 ("ubifs: Implement O_TMPFILE")
Cc: stable@vger.kernel.org
Reported-by: Russell Senior <russell@personaltelco.net>
Reported-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Tested-by: Russell Senior <russell@personaltelco.net>
---
Russel, Rafał,

please give this patch another testing.
I'll also run it on different test systems before merging.

Thanks,
//richard
---
 fs/ubifs/replay.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index 4844538eb926..65a780685b82 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -209,6 +209,34 @@ static int trun_remove_range(struct ubifs_info *c, struct replay_entry *r)
 	return ubifs_tnc_remove_range(c, &min_key, &max_key);
 }
 
+/**
+ * inode_relinked - check whether inode in question will be re-linked.
+ * @c: UBIFS file-system description object
+ * @rino: replay entry to test
+ *
+ * O_TMPFILE files can be re-linked, this means link count goes from 0 to 1.
+ * This case needs special care, otherwise all references to the inode will
+ * be removed upon the first replay entry of an inode with link count 0
+ * is found.
+ */
+static bool inode_relinked(struct ubifs_info *c, struct replay_entry *rino)
+{
+	struct replay_entry *r = rino;
+
+	ubifs_assert(c, rino->deletion);
+	ubifs_assert(c, key_type(c, &rino->key) == UBIFS_INO_KEY);
+
+	list_for_each_entry_from(r, &c->replay_list, list) {
+		if (key_inum(c, &r->key) == key_inum(c, &rino->key) &&
+		    r->deletion == 0) {
+			ubifs_assert(c, r->sqnum > rino->sqnum);
+			return true;
+		}
+	}
+
+	return false;
+}
+
 /**
  * apply_replay_entry - apply a replay entry to the TNC.
  * @c: UBIFS file-system description object
@@ -236,6 +264,11 @@ static int apply_replay_entry(struct ubifs_info *c, struct replay_entry *r)
 			{
 				ino_t inum = key_inum(c, &r->key);
 
+				if (inode_relinked(c, r)) {
+					err = 0;
+					break;
+				}
+
 				err = ubifs_tnc_remove_ino(c, inum);
 				break;
 			}
Rafał Miłecki Nov. 1, 2018, 8:55 a.m. UTC | #2
On Sun, 28 Oct 2018 at 22:44, Richard Weinberger <richard@nod.at> wrote:
> UBIFS's recovery code strictly assumes that a deleted inode will never
> come back, therefore it removes all data which belongs to that inode
> as soon it faces an inode with link count 0 in the replay list.
> Before O_TMPFILE this assumption was perfectly fine. With O_TMPFILE
> it can lead to data loss upon a power-cut.
>
> Consider a journal with entries like:
> 0: inode X (nlink = 0) /* O_TMPFILE was created */
> 1: data for inode X /* Someone writes to the temp file */
> 2: inode X (nlink = 0) /* inode was changed, xattr, chmod, … */
> 3: inode X (nlink = 1) /* inode was re-linked via linkat() */
>
> Upon replay of entry #2 UBIFS will drop all data that belongs to inode X,
> this will lead to an empty file after mounting.
>
> As solution for this problem, scan the replay list for a re-link entry
> before dropping data.
>
> Fixes: 474b93704f32 ("ubifs: Implement O_TMPFILE")
> Cc: stable@vger.kernel.org
> Reported-by: Russell Senior <russell@personaltelco.net>
> Reported-by: Rafał Miłecki <zajec5@gmail.com>
> Signed-off-by: Richard Weinberger <richard@nod.at>

Thank you Richard!!!

Tested-by: Rafał Miłecki <rafal@milecki.pl>
Richard Weinberger Nov. 1, 2018, 9:13 a.m. UTC | #3
Am Donnerstag, 1. November 2018, 09:55:53 CET schrieb Rafał Miłecki:
> On Sun, 28 Oct 2018 at 22:44, Richard Weinberger <richard@nod.at> wrote:
> > UBIFS's recovery code strictly assumes that a deleted inode will never
> > come back, therefore it removes all data which belongs to that inode
> > as soon it faces an inode with link count 0 in the replay list.
> > Before O_TMPFILE this assumption was perfectly fine. With O_TMPFILE
> > it can lead to data loss upon a power-cut.
> >
> > Consider a journal with entries like:
> > 0: inode X (nlink = 0) /* O_TMPFILE was created */
> > 1: data for inode X /* Someone writes to the temp file */
> > 2: inode X (nlink = 0) /* inode was changed, xattr, chmod, … */
> > 3: inode X (nlink = 1) /* inode was re-linked via linkat() */
> >
> > Upon replay of entry #2 UBIFS will drop all data that belongs to inode X,
> > this will lead to an empty file after mounting.
> >
> > As solution for this problem, scan the replay list for a re-link entry
> > before dropping data.
> >
> > Fixes: 474b93704f32 ("ubifs: Implement O_TMPFILE")
> > Cc: stable@vger.kernel.org
> > Reported-by: Russell Senior <russell@personaltelco.net>
> > Reported-by: Rafał Miłecki <zajec5@gmail.com>
> > Signed-off-by: Richard Weinberger <richard@nod.at>
> 
> Thank you Richard!!!
> 
> Tested-by: Rafał Miłecki <rafal@milecki.pl>

Thanks for testing and providing the reproducer!
I'll send a v2 of the patch soon where I've optimized the list scanning more.
In fact, the correct and fasted approach is walking the replay list backwards
to find the final link state of an inode.

Thanks,
//richard
diff mbox series

Patch

diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index 4844538eb926..65a780685b82 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -209,6 +209,34 @@  static int trun_remove_range(struct ubifs_info *c, struct replay_entry *r)
 	return ubifs_tnc_remove_range(c, &min_key, &max_key);
 }
 
+/**
+ * inode_relinked - check whether inode in question will be re-linked.
+ * @c: UBIFS file-system description object
+ * @rino: replay entry to test
+ *
+ * O_TMPFILE files can be re-linked, this means link count goes from 0 to 1.
+ * This case needs special care, otherwise all references to the inode will
+ * be removed upon the first replay entry of an inode with link count 0
+ * is found.
+ */
+static bool inode_relinked(struct ubifs_info *c, struct replay_entry *rino)
+{
+	struct replay_entry *r = rino;
+
+	ubifs_assert(c, rino->deletion);
+	ubifs_assert(c, key_type(c, &rino->key) == UBIFS_INO_KEY);
+
+	list_for_each_entry_from(r, &c->replay_list, list) {
+		if (key_inum(c, &r->key) == key_inum(c, &rino->key) &&
+		    r->deletion == 0) {
+			ubifs_assert(c, r->sqnum > rino->sqnum);
+			return true;
+		}
+	}
+
+	return false;
+}
+
 /**
  * apply_replay_entry - apply a replay entry to the TNC.
  * @c: UBIFS file-system description object
@@ -236,6 +264,11 @@  static int apply_replay_entry(struct ubifs_info *c, struct replay_entry *r)
 			{
 				ino_t inum = key_inum(c, &r->key);
 
+				if (inode_relinked(c, r)) {
+					err = 0;
+					break;
+				}
+
 				err = ubifs_tnc_remove_ino(c, inum);
 				break;
 			}