[v5,2/5] vfs: Add checks for filesystem timestamp limits

Message ID 1491680267-11171-3-git-send-email-deepa.kernel@gmail.com
State Superseded

Commit Message

Deepa Dinamani April 8, 2017, 7:37 p.m. UTC
Allow only read-only mounts of filesystems whose maximum
supported timestamp does not extend beyond the y2038 expiry
timestamp.

Also, add a sysctl override that allows all such filesystems
to be mounted with write permissions.
A boot param turns the check on from early boot
without recompilation.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  8 ++++++++
 fs/inode.c                                      | 15 +++++++++++++++
 fs/internal.h                                   |  2 ++
 fs/namespace.c                                  | 12 ++++++++++++
 fs/super.c                                      |  7 +++++++
 include/linux/fs.h                              |  1 +
 include/linux/time64.h                          |  4 ++++
 include/uapi/linux/fs.h                         |  6 +++++-
 kernel/sysctl.c                                 |  7 +++++++
 9 files changed, 61 insertions(+), 1 deletion(-)
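
For readers who want to see the mount-time decision on its own, here is a
minimal user-space sketch of the logic this patch adds. It is not the kernel
code: the sketch_* names and both structs are hypothetical stand-ins, -1 is
used where the patch returns -EROFS, and a plain bool models the MS_RDONLY
flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel types in this patch. */
struct sketch_timestamp_check {
	int64_t timestamp_supported;	/* cutoff, seconds since the Unix epoch */
	int check_on;			/* 0 = check disabled (the default) */
};

struct sketch_super_block {
	int64_t s_time_max;		/* largest timestamp the fs can store */
	bool rdonly;			/* models (s_flags & MS_RDONLY) */
};

/* Mirrors sb_file_times_updatable(): true if the fs can go past the cutoff. */
static bool sketch_times_updatable(const struct sketch_timestamp_check *tc,
				   const struct sketch_super_block *sb)
{
	if (!tc->check_on)
		return true;
	return sb->s_time_max > tc->timestamp_supported;
}

/* Mirrors the mount_fs() hunk: 0 on success, -1 where the patch uses -EROFS. */
static int sketch_mount(const struct sketch_timestamp_check *tc,
			const struct sketch_super_block *sb)
{
	assert(tc && sb);
	if (!sb->rdonly && !sketch_times_updatable(tc, sb))
		return -1;
	return 0;
}
```

With the check on and the cutoff at S32_MAX, a read-write mount of a
filesystem whose s_time_max is also S32_MAX is refused, while the same
filesystem mounted read-only, or any mount with the check off, goes through.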

Comments

Linus Torvalds April 8, 2017, 8:04 p.m. UTC | #1
On Sat, Apr 8, 2017 at 12:37 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
> Allow read only mounts for filesystems that do not
> have maximum timestamps beyond the y2038 expiry
> timestamp.

This option seems arbitrary and pointless.

Nobody sane should ever enable it except for testing, but for testing
it would be much better to simply specify what the limit should be:
2038 is not magical for all filesystems, because the base may be
different.

And honestly, for testing, it would be much better to just make it a
mount option rather than some crazy system-wide one.

                   Linus
Deepa Dinamani April 9, 2017, 2:58 a.m. UTC | #2
>> Allow read only mounts for filesystems that do not
>> have maximum timestamps beyond the y2038 expiry
>> timestamp.
>
> This option seems arbitrary and pointless.
>
> Nobody sane should ever enable it except for testing, but for testing
> it would be much better to simply specify what the limit should be:
> 2038 is not magical for all filesystems, because the base may be
> different.

Yes, the way the patch is right now, it is meant only for testing
y2038 readiness.
The feature is meant for system wide tests and not individual filesystem tests.

The original idea was to disallow writes on all filesystem mounts that
were not able to update times at the time of mount, meaning max time
supported by the filesystem should be greater than current system
time. But, then we end up with the problem of what to do about mounts
whose max time exceeds current time after mount. This can be handled
by some logic while updating inode times. But, maybe this level of
complexity is not required and we could just stick to the former use
case. And, just print a warning in the latter case. This is what
pushes the feature to be something more than y2038 readiness.

> And honestly, for testing, it would be much better to just make it a
> mount option rather than some crazy system-wide one.

The patch allows the y2038 number to be changed at compile time. I can
extend the sysctl and boot option to allow changing of this limit also
if that is preferred.

We also proposed the mount option route in the RFC, but we received
no preferences or comments. We proceeded with the sysctl option because
it allows us to extend this feature to disallow writes on filesystems
whose times cannot be updated.

I could change this to providing a mount option instead if you think
that is better.

-Deepa
Arnd Bergmann April 25, 2017, 7:47 p.m. UTC | #3
On Sun, Apr 9, 2017 at 4:58 AM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
>>> Allow read only mounts for filesystems that do not
>>> have maximum timestamps beyond the y2038 expiry
>>> timestamp.
>>
>> This option seems arbitrary and pointless.
>>
>> Nobody sane should ever enable it except for testing, but for testing
>> it would be much better to simply specify what the limit should be:
>> 2038 is not magical for all filesystems, because the base may be
>> different.
>
> Yes, the way the patch is right now, it is meant only for testing
> y2038 readiness.
> The feature is meant for system wide tests and not individual filesystem tests.

There is one global option that I want to see, and that is for completely
disabling all components that are known to be broken in y2038.

We could do this with just a compile-time option that primarily
turns off all drivers using the 32-bit time_t, but the same compile-time
option can also force the file system to be read-only.

I don't see this just as something we want to do for testing, but
also as a safeguard for people shipping embedded systems with
long service life: If something can go wrong after write-mounting
an ext3 file system after 2038, it's better to force a behavior now
that can be reasonably expected not to change.

Between doing a compile-time option or a boot-time option, doing
it purely compile-time is probably better as it gives us the possible
additional checking when we hide the time_t definition.

We can do the boot-time option as well, to set a particular limit
other than the one enforced at compile time. Passing a year
number like "fstimestampcheck=2099" would address Linus'
concern about the cutoff being arbitrary.
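
As a rough illustration of what parsing a year number could turn into, the
sketch below converts a year to a cutoff in seconds since the Unix epoch.
The helper names are hypothetical, and treating the cutoff as the last UTC
second of the given year is an assumption; the thread does not say whether
the start or the end of the year is meant.

```c
#include <assert.h>
#include <stdint.h>

/* Days from 1970-01-01 to January 1st of `year` (Gregorian calendar). */
static int64_t sketch_days_before_year(int64_t year)
{
	int64_t y = year - 1;	/* last full year before `year` */

	assert(year >= 1970);
	/* 365 days per year, plus leap days up to y, minus those up to 1969. */
	return (year - 1970) * 365
		+ (y / 4 - y / 100 + y / 400)
		- (1969 / 4 - 1969 / 100 + 1969 / 400);
}

/* Last second of `year` in UTC, as seconds since the Unix epoch. */
static int64_t sketch_year_to_cutoff(int64_t year)
{
	return sketch_days_before_year(year + 1) * 86400 - 1;
}
```

Because the patch compares with "sb->s_time_max > timestamp_supported", a
cutoff computed this way would require the filesystem to represent at least
the first second of the following year.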

I would also make the default limit higher than 2038, as at
least the Apple HFS/HFS+ file systems break only a bit later
in 2040. However, I don't think any other file system breaks
until 2099 (some Microsoft file systems), which would be
the next reasonable default cutoff IMO.
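
The 2040 figure follows from the on-disk format: HFS+ stores dates as an
unsigned 32-bit count of seconds since 1904-01-01. A small sketch of the
arithmetic, with hypothetical names, shows why the base matters as much as
the width:

```c
#include <assert.h>
#include <stdint.h>

/* Seconds from 1904-01-01 to 1970-01-01: 66 years including 17 leap days. */
#define SKETCH_HFS_EPOCH_OFFSET	((int64_t)(66 * 365 + 17) * 86400)

/*
 * Largest Unix-epoch timestamp that an unsigned 32-bit seconds counter can
 * hold when its own epoch lies `epoch_offset` seconds before 1970-01-01.
 */
static int64_t sketch_u32_max_unix_time(int64_t epoch_offset)
{
	assert(epoch_offset >= 0);
	return (int64_t)UINT32_MAX - epoch_offset;
}
```

For the 1904 base this gives 2212122495, which is 2040-02-06 06:28:15 UTC.
With an offset of 0 it gives U32_MAX, the Y2106_EXPIRY_TIMESTAMP value from
the patch.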

>> And honestly, for testing, it would be much better to just make it a
>> mount option rather than some crazy system-wide one.
>
> The patch allows the y2038 number to be changed at compile time. I can
> extend the sysctl and boot option to allow changing of this limit also
> if that is preferred.
>
> We also proposed the mount option route in the RFC. But, we received
> no preferences/ comments. We proceeded with the sysctl option because
> this allows us to extend this feature into disallowing writes on non
> updatable time filesystems.
>
> I could change this to providing a mount option instead if you think
> that is better.

I don't see much value in a mount option that prevents the use,
but maybe a mount option to override the global setting to make
an exception for someone who does want to mount a particular
(known-broken) file system despite having the stricter global setting.

         Arnd
Linus Torvalds April 25, 2017, 8:02 p.m. UTC | #4
On Tue, Apr 25, 2017 at 12:47 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>
> There is one global option that I want to see, and that is for completely
> disabling all components that are known to be broken in y2038.

I really don't see the point.

Don't do it. Make it some local hack, I'm not taking crazy patches.

                    Linus
Arnd Bergmann April 25, 2017, 8:31 p.m. UTC | #5
On Tue, Apr 25, 2017 at 10:02 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Apr 25, 2017 at 12:47 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>
>> There is one global option that I want to see, and that is for completely
>> disabling all components that are known to be broken in y2038.
>
> I really don't see the point.
>
> Don't do it. Make it some local hack, I'm not taking crazy patches.

I have the local hack, and used it to find all the drivers that use a
32-bit time_t internally (and mark them with a Kconfig dependency
for testing).

Would it be ok to have a simple way of removing the time_t definition (e.g.
by passing '-DREQUIRE_TIME64' to the compiler), but without the Kconfig
option? That way, someone who wants to ship a product can at least
find the obvious dependencies on stuff that remains broken.

       Arnd
Linus Torvalds April 25, 2017, 8:35 p.m. UTC | #6
On Tue, Apr 25, 2017 at 1:31 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>
> Would it be ok to have a simple way of removing the time_t definition (e.g.
> by passing '-DREQUIRE_TIME64' to the compiler), but without the Kconfig
> option? That way, someone who wants to ship a product can at least
> find the obvious dependencies on stuff that remains broken.

How would you find them?

People don't necessarily use "time_t". They might use "int" or whatever.

There is absolutely zero point to making this some kind of crazy
config option, because such an option will prove absolutely *NOTHING*.

Seriously. This whole concept is completely stupid.

The only possible thing you can do is to

 (a) have an actual test-suite
 (b) set the time to 32+ bits
 (c) see what breaks

because otherwise it seems entirely pointless.

And no, we're not adding random crazy source modifications for pointless crap.

                      Linus
Arnd Bergmann April 25, 2017, 9:23 p.m. UTC | #7
On Tue, Apr 25, 2017 at 10:35 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Apr 25, 2017 at 1:31 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>
>> Would it be ok to have a simple way of removing the time_t definition (e.g.
>> by passing '-DREQUIRE_TIME64' to the compiler), but without the Kconfig
>> option? That way, someone who wants to ship a product can at least
>> find the obvious dependencies on stuff that remains broken.
>
> How would you find them?
>
> People don't necessarily use "time_t". They might use "int" or whatever.

My main approach has been:

* Assume that all of the time_t based interfaces are broken on 32-bit systems
  (some are not, but almost all are)

* For each interface that exposes a time_t to other files, introduce a
  replacement interface that is known to work

* Change users of the old interface over to the new one, one at a time,
  while manually reviewing all other code this interacts with.

Note that the vast majority of all the in-kernel uses of time_t variables
actually use timespec or timeval structures because they require
sub-second resolution, so we already know that they cannot
accidentally get assigned to 'int'. Also, we typically replace them with
ktime_t for efficiency. In case we replace a timespec with timespec64,
we do have to be careful to ensure that no code just treats the
tv_sec member as 'int' or 'long' though.
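
To make the last point concrete, here is a small user-space sketch of the
kind of range check a reviewer has in mind when tv_sec is about to be stored
in a narrower type. The struct and function names are hypothetical stand-ins,
not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct timespec64. */
struct sketch_timespec64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* True if tv_sec can be stored in a signed 32-bit integer without loss. */
static bool sketch_sec_fits_s32(const struct sketch_timespec64 *ts)
{
	assert(ts);
	return ts->tv_sec >= INT32_MIN && ts->tv_sec <= INT32_MAX;
}
```

A timestamp one second past S32_MAX fails this check, which is exactly the
case where an 'int' assignment would silently lose the value on most
platforms.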

      Arnd

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c2f220d..57f4a50 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1193,6 +1193,14 @@ 
 			can be changed at run time by the max_graph_depth file
 			in the tracefs tracing directory. default: 0 (no limit)
 
+	fstimestampcheck
+			Enable checking of max filesystem time supported
+			at mount time. The value is checked against y2038
+			date: Mon Jan 18 19:14:07 PST 2038. The option
+			disables rw mount of filesystems that are not able
+			to represent times beyond y2038 time mentioned above.
+			This check is off by default.
+
 	gamecon.map[2|3]=
 			[HW,JOY] Multisystem joystick and NES/SNES/PSX pad
 			support via parallel port (up to 5 devices per port)
diff --git a/fs/inode.c b/fs/inode.c
index a9caf53..a0c1522 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -75,6 +75,21 @@  static DEFINE_PER_CPU(unsigned long, nr_unused);
 
 static struct kmem_cache *inode_cachep __read_mostly;
 
+struct vfs_max_timestamp_check timestamp_check = {
+	.timestamp_supported = Y2038_EXPIRY_TIMESTAMP,
+	.check_on = 0,
+};
+
+static int __init setup_timestamp_check(char *str)
+{
+	if (*str)
+		return 0;
+	timestamp_check.check_on = 1;
+	return 1;
+}
+
+__setup("fstimestampcheck", setup_timestamp_check);
+
 static long get_nr_inodes(void)
 {
 	int i;
diff --git a/fs/internal.h b/fs/internal.h
index cef253a..76fbcde 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -67,6 +67,8 @@  extern int finish_automount(struct vfsmount *, struct path *);
 
 extern int sb_prepare_remount_readonly(struct super_block *);
 
+extern bool sb_file_times_updatable(struct super_block *sb);
+
 extern void __init mnt_init(void);
 
 extern int __mnt_want_write(struct vfsmount *);
diff --git a/fs/namespace.c b/fs/namespace.c
index 6b81c20..fd6e479 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -538,6 +538,18 @@  static void __mnt_unmake_readonly(struct mount *mnt)
 	unlock_mount_hash();
 }
 
+bool sb_file_times_updatable(struct super_block *sb)
+{
+
+	if (!timestamp_check.check_on)
+		return true;
+
+	if (sb->s_time_max > timestamp_check.timestamp_supported)
+		return true;
+
+	return false;
+}
+
 int sb_prepare_remount_readonly(struct super_block *sb)
 {
 	struct mount *mnt;
diff --git a/fs/super.c b/fs/super.c
index f9c2241..4e7577b 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1245,6 +1245,13 @@  mount_fs(struct file_system_type *type, int flags, const char *name, void *data)
 	WARN((sb->s_maxbytes < 0), "%s set sb->s_maxbytes to "
 		"negative value (%lld)\n", type->name, sb->s_maxbytes);
 
+	if (!(sb->s_flags & MS_RDONLY) && !sb_file_times_updatable(sb)) {
+		WARN(1, "File times cannot be updated on the filesystem.\n");
+		WARN(1, "Retry mounting the filesystem readonly.\n");
+		error = -EROFS;
+		goto out_sb;
+	}
+
 	up_write(&sb->s_umount);
 	free_secdata(secdata);
 	return root;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 63f83440..a39dc8e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -68,6 +68,7 @@  extern struct inodes_stat_t inodes_stat;
 extern int leases_enable, lease_break_time;
 extern int sysctl_protected_symlinks;
 extern int sysctl_protected_hardlinks;
+extern struct vfs_max_timestamp_check timestamp_check;
 
 struct buffer_head;
 typedef int (get_block_t)(struct inode *inode, sector_t iblock,
diff --git a/include/linux/time64.h b/include/linux/time64.h
index 25433b18..906e0b3 100644
--- a/include/linux/time64.h
+++ b/include/linux/time64.h
@@ -43,6 +43,10 @@  struct itimerspec64 {
 #define KTIME_MAX			((s64)~((u64)1 << 63))
 #define KTIME_SEC_MAX			(KTIME_MAX / NSEC_PER_SEC)
 
+/* Timestamps on boundary */
+#define Y2038_EXPIRY_TIMESTAMP		S32_MAX /* 2147483647 */
+#define Y2106_EXPIRY_TIMESTAMP		U32_MAX /* 4294967295 */
+
 #if __BITS_PER_LONG == 64
 
 static inline struct timespec timespec64_to_timespec(const struct timespec64 ts64)
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 048a85e..125e4ae 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -91,6 +91,11 @@  struct files_stat_struct {
 	unsigned long max_files;		/* tunable */
 };
 
+struct vfs_max_timestamp_check {
+	time64_t timestamp_supported;
+	int check_on;
+};
+
 struct inodes_stat_t {
 	long nr_inodes;
 	long nr_unused;
@@ -100,7 +105,6 @@  struct inodes_stat_t {
 
 #define NR_FILE  8192	/* this can well be larger on a larger system */
 
-
 /*
  * These are the fs-independent mount-flags: up to 32 flags are supported
  */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 60474df..d88487c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1668,6 +1668,13 @@  static struct ctl_table fs_table[] = {
 		.proc_handler	= proc_doulongvec_minmax,
 	},
 	{
+		.procname	= "fs-timestamp-check-on",
+		.data		= &timestamp_check.check_on,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{
 		.procname	= "nr_open",
 		.data		= &sysctl_nr_open,
 		.maxlen		= sizeof(unsigned int),