Patchwork e2fsck: Don't skip time based checks if broken_system_clock=1 unnecessarily

login
register
mail settings
Submitter Jan Kara
Date March 28, 2014, 3:15 a.m.
Message ID <1395976531-7447-1-git-send-email-jack@suse.cz>
Download mbox | patch
Permalink /patch/334572/
State Rejected
Headers show

Comments

Jan Kara - March 28, 2014, 3:15 a.m.
When option broken_system_clock is set, we unconditionally skip checks
whether e2fsck should check the fs because the fs was last checked too
long ago. Distributions however set broken_system_clock by default as
some of the systems simply have the clocks broken and e2fsck stops the
boot because of that which is harsh given how minor issue that is (at
least for the fs). Thus checking every X days doesn't work with these
distributions.

Change slightly how broken_system_clock works to make it more useful. We
first check in the superblock whether the current time and times stored
in the superblock look sensible. If yes, we also do honor check
interval. If the current time and times in superblock don't correspond
properly, we skip the check interval test.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 e2fsck/super.c | 44 ++++++++++++++++++++++++++------------------
 e2fsck/unix.c  | 11 ++++++-----
 2 files changed, 32 insertions(+), 23 deletions(-)
Theodore Ts'o - April 21, 2014, 12:39 a.m.
On Fri, Mar 28, 2014 at 04:15:31AM +0100, Jan Kara wrote:
> When option broken_system_clock is set, we unconditionally skip checks
> whether e2fsck should check the fs because the fs was last checked too
> long ago. Distributions however set broken_system_clock by default as
> some of the systems simply have the clocks broken and e2fsck stops the
> boot because of that which is harsh given how minor issue that is (at
> least for the fs). Thus checking every X days doesn't work with these
> distributions.

There are a couple of different ways in which the time can be
considered "broken":
>
1) The system has no battery-backed block and so the time is always
January 1, 1970 on boot.  These systems are taken care of by the
TIME_INSANE checks.

2) The distribution the timezone is incorrect when e2fsck is run, so
depending on whether you are east or west of GMT, the clock could
appear to be up to 24 hours in the future, which would falsely trigger
the e2fsck check.  That is now taken care of by the accept_time_fudge
hack, which is enabled by default.

3) The clock is totally insane, which means that it could be any
value; it could have a time in 2037; it could have a time in the
1970's.  This is what broken_system_clock means now.

If the system clock is completely untrustworthy, you really do want to
skip all time based checks, because there is no way to tell whether
the times look "sensible" or not.  If the clock warps forward by N
years, it's still going to look sensible, and it can force a time
based check when one shouldn't be needed.


I think the issue is that there may be some distributions that set
broken_system_lock way back in the past, before we added the
accept_time_fudge (which was added in 2009).  So if the concern is
dealing with people who have systems in Western Europe whose RTC ticks
localtime (as opposed to UTC), that problem was fixed by
accept_time_fudge.  And if the concern is systems in state #1, that
was dealt with by the TIME_INSANE checks in 2010.

So if those are the main concerns, you can simply get rid of setting
broken_system_clock, since those problems are now solved by default.

But if your worry is RTC clocks which are _totally_ bonkers, then we
really have to do what we are currently doing.  Besides, I thought
these days, most folks aren't enabling time based checks any more.
IMHO, the real right answer is that distributions should be setting
folks to use LVM by default, and to do checks using snapshots every
month or so run out of cron.

						 - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara - April 22, 2014, 3:47 p.m.
On Sun 20-04-14 20:39:56, Ted Tso wrote:
> On Fri, Mar 28, 2014 at 04:15:31AM +0100, Jan Kara wrote:
> > When option broken_system_clock is set, we unconditionally skip checks
> > whether e2fsck should check the fs because the fs was last checked too
> > long ago. Distributions however set broken_system_clock by default as
> > some of the systems simply have the clocks broken and e2fsck stops the
> > boot because of that which is harsh given how minor issue that is (at
> > least for the fs). Thus checking every X days doesn't work with these
> > distributions.
> 
> There are a couple of different ways in which the time can be
> considered "broken":
> >
> 1) The system has no battery-backed block and so the time is always
> January 1, 1970 on boot.  These systems are taken care of by the
> TIME_INSANE checks.
> 
> 2) The distribution the timezone is incorrect when e2fsck is run, so
> depending on whether you are east or west of GMT, the clock could
> appear to be up to 24 hours in the future, which would falsely trigger
> the e2fsck check.  That is now taken care of by the accept_time_fudge
> hack, which is enabled by default.
> 
> 3) The clock is totally insane, which means that it could be any
> value; it could have a time in 2037; it could have a time in the
> 1970's.  This is what broken_system_clock means now.
> 
> If the system clock is completely untrustworthy, you really do want to
> skip all time based checks, because there is no way to tell whether
> the times look "sensible" or not.  If the clock warps forward by N
> years, it's still going to look sensible, and it can force a time
> based check when one shouldn't be needed.
>
> I think the issue is that there may be some distributions that set
> broken_system_lock way back in the past, before we added the
> accept_time_fudge (which was added in 2009).  So if the concern is
> dealing with people who have systems in Western Europe whose RTC ticks
> localtime (as opposed to UTC), that problem was fixed by
> accept_time_fudge.  And if the concern is systems in state #1, that
> was dealt with by the TIME_INSANE checks in 2010.
> 
> So if those are the main concerns, you can simply get rid of setting
> broken_system_clock, since those problems are now solved by default.
  So looking into our bugzilla we added broken_system_clock=1 to
e2fsck.conf because of a report of a machine where the time went few days
backward after a reboot (the most likely explanation to me seems that the
system time wasn't getting properly written to the hw clock for some
reason).

Anyway, thinking about this again maybe my point should have been: Isn't it
too harsh to declare mount time in future as requiring manual intervention?
Why don't we just reset it (or just leave it alone) in preen mode without
requiring user interaction? I understand it may indicate some corruption
problem but we don't really care about corruption in this particular field
much...

								Honza

Patch

diff --git a/e2fsck/super.c b/e2fsck/super.c
index e9892e2db6b3..f2bdf98f26dc 100644
--- a/e2fsck/super.c
+++ b/e2fsck/super.c
@@ -840,28 +840,36 @@  void check_super_block(e2fsck_t ctx)
 	 * Check to see if the superblock last mount time or last
 	 * write time is in the future.
 	 */
-	if (!broken_system_clock &&
-	    !(ctx->flags & E2F_FLAG_TIME_INSANE) &&
+	if (!(ctx->flags & E2F_FLAG_TIME_INSANE) &&
 	    fs->super->s_mtime > (__u32) ctx->now) {
-		pctx.num = fs->super->s_mtime;
-		problem = PR_0_FUTURE_SB_LAST_MOUNT;
-		if (fs->super->s_mtime <= (__u32) ctx->now + ctx->time_fudge)
-			problem = PR_0_FUTURE_SB_LAST_MOUNT_FUDGED;
-		if (fix_problem(ctx, problem, &pctx)) {
-			fs->super->s_mtime = ctx->now;
-			fs->flags |= EXT2_FLAG_DIRTY;
+		if (broken_system_clock)
+			ctx->flags |= E2F_FLAG_TIME_INSANE;
+		else {
+			pctx.num = fs->super->s_mtime;
+			problem = PR_0_FUTURE_SB_LAST_MOUNT;
+			if (fs->super->s_mtime <=
+			    (__u32) ctx->now + ctx->time_fudge)
+				problem = PR_0_FUTURE_SB_LAST_MOUNT_FUDGED;
+			if (fix_problem(ctx, problem, &pctx)) {
+				fs->super->s_mtime = ctx->now;
+				fs->flags |= EXT2_FLAG_DIRTY;
+			}
 		}
 	}
-	if (!broken_system_clock &&
-	    !(ctx->flags & E2F_FLAG_TIME_INSANE) &&
+	if (!(ctx->flags & E2F_FLAG_TIME_INSANE) &&
 	    fs->super->s_wtime > (__u32) ctx->now) {
-		pctx.num = fs->super->s_wtime;
-		problem = PR_0_FUTURE_SB_LAST_WRITE;
-		if (fs->super->s_wtime <= (__u32) ctx->now + ctx->time_fudge)
-			problem = PR_0_FUTURE_SB_LAST_WRITE_FUDGED;
-		if (fix_problem(ctx, problem, &pctx)) {
-			fs->super->s_wtime = ctx->now;
-			fs->flags |= EXT2_FLAG_DIRTY;
+		if (broken_system_clock)
+			ctx->flags |= E2F_FLAG_TIME_INSANE;
+		else {
+			pctx.num = fs->super->s_wtime;
+			problem = PR_0_FUTURE_SB_LAST_WRITE;
+			if (fs->super->s_wtime <=
+			    (__u32) ctx->now + ctx->time_fudge)
+				problem = PR_0_FUTURE_SB_LAST_WRITE_FUDGED;
+			if (fix_problem(ctx, problem, &pctx)) {
+				fs->super->s_wtime = ctx->now;
+				fs->flags |= EXT2_FLAG_DIRTY;
+			}
 		}
 	}
 
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index b39383d7ef70..9caa17c02f9c 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -371,13 +371,13 @@  static void check_if_skip(e2fsck_t ctx)
 		if (batt && (fs->super->s_mnt_count <
 			     (unsigned) fs->super->s_max_mnt_count*2))
 			reason = 0;
-	} else if (!broken_system_clock && fs->super->s_checkinterval &&
-		   (ctx->now < lastcheck)) {
+	} else if (!(ctx->flags & E2F_FLAG_TIME_INSANE) &&
+		   fs->super->s_checkinterval && (ctx->now < lastcheck)) {
 		reason = _(" has filesystem last checked time in the future");
 		if (batt)
 			reason = 0;
-	} else if (!broken_system_clock && fs->super->s_checkinterval &&
-		   ((ctx->now - lastcheck) >=
+	} else if (!(ctx->flags & E2F_FLAG_TIME_INSANE) &&
+		   fs->super->s_checkinterval && ((ctx->now - lastcheck) >=
 		    ((time_t) fs->super->s_checkinterval))) {
 		reason = _(" has gone %u days without being checked");
 		reason_arg = (ctx->now - fs->super->s_lastcheck)/(3600*24);
@@ -434,7 +434,8 @@  static void check_if_skip(e2fsck_t ctx)
 		if (next_check <= 0)
 			next_check = 1;
 	}
-	if (!broken_system_clock && fs->super->s_checkinterval &&
+	if (!(ctx->flags & E2F_FLAG_TIME_INSANE) &&
+	    fs->super->s_checkinterval &&
 	    ((ctx->now - fs->super->s_lastcheck) >= fs->super->s_checkinterval))
 		next_check = 1;
 	if (next_check <= 5) {