Message ID | 1455995444-14146-1-git-send-email-alexandre.belloni@free-electrons.com |
---|---|
State | Rejected |
On Sat, 20 Feb 2016 20:10:44 +0100
Alexandre Belloni <alexandre.belloni@free-electrons.com> wrote:

> hctosys is setting the system time from the kernel. This means that 32bit
> system can get their time set to a date after the 31bit time_t overflow.
>
> This is currently an issue as userspace is not yet ready to handle those
> dates and may break. For example systemd's usage of timerfd shows that the
> timerfd will always fire immediately because it can't be set at a date
> after the current date.
>
> The new RTC_INVALID_2038 option will make sure that date after 03:09:07 on
> Jan 19 2038 are invalid. This is 5 minutes before the 31bit overflow. This
> leaves enough time for userspace to react and is short enough to make the
> issue visible.

This is kind of pointless. You replace loading the RTC and discovering
the time isn't supported by your system with not loading the RTC and
discovering your system clock is magically and almost un-debuggably
wrong, and when something like NTP syncs it, breaks anyway.

The only way to deal with 2038 is to fix your user space. People need to
deal with reality, even if it's not all pink unicorns and rainbows.

Alan
On 20/02/2016 at 19:43:10 +0000, One Thousand Gnomes wrote:
> On Sat, 20 Feb 2016 20:10:44 +0100
> Alexandre Belloni <alexandre.belloni@free-electrons.com> wrote:
>
> > hctosys is setting the system time from the kernel. This means that 32bit
> > system can get their time set to a date after the 31bit time_t overflow.
> >
> > This is currently an issue as userspace is not yet ready to handle those
> > dates and may break. For example systemd's usage of timerfd shows that the
> > timerfd will always fire immediately because it can't be set at a date
> > after the current date.
> >
> > The new RTC_INVALID_2038 option will make sure that date after 03:09:07 on
> > Jan 19 2038 are invalid. This is 5 minutes before the 31bit overflow. This
> > leaves enough time for userspace to react and is short enough to make the
> > issue visible.
>
> This is kind of pointless. You replace loading the RTC and discovering
> the time isn't supported by your system with not loading he RTC and
> discovering your system clock is magically and almost un-debuggably
> wrong, and when something like NTP syncs it, breaks anyway.
>
> The only way to deal with 2038 is to fix your user space. People need to
> deal with reality, even if it's not all pink unicorns and rainbows.
>

Actually, I'm not trying to solve the 2038 issue.

But in the current state on 32-bit platforms, while the kernel is able
to handle a 64-bit date, userspace is not. The main issue is that
distributions use HCTOSYS, so if the RTC is set to a date after 2038
(which we know is currently bogus), the kernel will set the system time to
that date.

This results in a system that fails when using timerfd. The timerfd will
always fire immediately (until, as some people pointed out, we have
relative timers).

This is known to break systemd [1] but I have had reports for other
failing applications.

I understand this is a workaround and I plan to switch the default to n
once libc and critical userspace are able to handle 64-bit time.

The other way of solving that is to get back to a 32-bit time_t
internally until we switch the whole userspace to a 64-bit time_t, but I
don't think this is practical.

[1] https://github.com/systemd/systemd/issues/1143
On Saturday 20 February 2016 21:47:15 Alexandre Belloni wrote:
>
> Actually, I'm not trying to solve the 2038 issue.
>
> But in the current state on 32 bit platforms, while the kernel is able
> to handle a 64bit date, userspace is not. The main issue is that
> distributions use HCTOSYS so if the RTC is set to a date after 2038
> (which we know is currently bogus), the kernel will set a system time to
> that date.
>
> This result in a system that fails when using timerfd, The timerfd will
> always fire immediately (until, as some people pointed out, we have
> relative timers).
>
> This is know to break systemd [1] but I have had reports for other
> failing applications.
>
> I understand this is a workaround and I plan to switch the default to n
> once libc and critical userspace is able to handle 64 bit time.
>
> The other way of solving that is to get back to a 32 bit time_t
> internally until we switch the whole userspace to a 64 bit time_t but I
> don't think this is practical.
>
> [1] https://github.com/systemd/systemd/issues/1143
>

I think in both cases you introduce a new 2038 problem though:
as long as you have a kernel that tries to support an old
32-bit systemd build, the kernel becomes incompatible with RTC
times beyond 2038, even on 64-bit systems and 32-bit systems
that have fixed system call table and fixed user space.

This is bad because it means we still have to break systemd
eventually in order to fix the 2038 overflow.

The plan to revert this after glibc has been converted is
problematic because a lot of 32-bit distros will likely never
recompile with 64-bit time_t in order to avoid breaking
backwards compatibility. While we could require that user
space and kernel must match here (either support 64-bit time_t
everywhere or nowhere), that makes it much harder to deal
with the migration, and it has always been a strict requirement
that none of the changes for y2038 compatibility break existing
user space (which of course is what happened for RTC and what
we need to fix here).

Has the problem of random RTC times been observed on more than
one RTC driver yet? Maybe we can just apply your workaround
to that one driver that saw it instead.

Have you figured out whether there is a pattern in the reported
times? Is it just completely random or could we perhaps
detect an RTC that reports an invalid time other than by
looking at the year?

Arnd
On 20/02/2016 at 23:16:48 +0100, Arnd Bergmann wrote:
> On Saturday 20 February 2016 21:47:15 Alexandre Belloni wrote:
> >
> > Actually, I'm not trying to solve the 2038 issue.
> >
> > But in the current state on 32 bit platforms, while the kernel is able
> > to handle a 64bit date, userspace is not. The main issue is that
> > distributions use HCTOSYS so if the RTC is set to a date after 2038
> > (which we know is currently bogus), the kernel will set a system time to
> > that date.
> >
> > This result in a system that fails when using timerfd, The timerfd will
> > always fire immediately (until, as some people pointed out, we have
> > relative timers).
> >
> > This is know to break systemd [1] but I have had reports for other
> > failing applications.
> >
> > I understand this is a workaround and I plan to switch the default to n
> > once libc and critical userspace is able to handle 64 bit time.
> >
> > The other way of solving that is to get back to a 32 bit time_t
> > internally until we switch the whole userspace to a 64 bit time_t but I
> > don't think this is practical.
> >
> > [1] https://github.com/systemd/systemd/issues/1143
> >
>
> I think in both cases you introduce a new 2038 problem though:
> as long as you have a kernel that tries to support an old
> 32-bit systemd build, the kernel becomes incompatible with RTC
> times beyond 2038, even on 64-bit systems and 32-bit systems
> that have fixed system call table and fixed user space.
>

It doesn't change anything for 64-bit systems, I've excluded them by
using "depends on !64BIT". Right now, it doesn't change anything for
32-bit systems because either way, they will fail in 2038.

> This is bad because it means we still have to break systemd
> eventually in order to fix the 2038 overflow.
>

Won't we have to recompile every application to support 64-bit time on
32-bit systems anyway? That will be a good time to remove that option.

> The plan to revert this after glibc has been converted is
> problematic because a lot of 32-bit distros will likely never
> recompile with 64-bit time_t in order to avoid breaking
> backwards compatibility. While we could require that user
> space and kernel must match here (either support 64-bit time_t
> everywhere or nowhere), that makes it much harder to deal
> with the migration, and it has always been a strict requirement
> that none of the changes for y2038 compatibility break existing
> user space (which of course is what happened for RTC and what
> we need to fix here).

If the distributions don't recompile to support a 64-bit time, then the
32-bit systems will break in 2038 anyway and they will absolutely
require my patch or something along those lines to still boot using
systemd.

>
> Has the problem of random RTC times been observed on more than
> one RTC driver yet? Maybe we can just apply your workaround
> to that one driver that saw it instead.
>
> Have you figured out whether there is a pattern in the reported
> times? Is it just completely random or could we perhaps
> detect an RTC that reports an invalid time other than by
> looking at the year?
>

All the failures seem quite random to me but the reports I get are not
that precise.

I know it happens with PCF8523 and that can be true because the
datasheet says the date is undefined at reset. The handling of the OS
bit (that bit indicates the date is probably wrong) has to be fixed and
I will take care of that.

If that is the only RTC with that particular issue, we clearly don't
need my workaround. But as I say, I can ask but I don't have a clear
list of the affected systems and their RTCs.
On 21/02/2016 at 00:17:02 +0100, Alexandre Belloni wrote:
> All the failures seem quite random to me but the reports I get are not
> that precise.
>
> I know it happens with PCF8523 and that can be true because the
> datasheet says the date is undefined at reset. The handling of the OS
> bit (that bit indicates the date is probably wrong) has to be fixed and
> I will take care of that.
>
> If that is the only RTC with that particular issue, we clearly don't
> need my workaround. But As I say, I can ask but I don't have a clear
> list of the affected system and their RTC.
>

I'll add that it also seems to affect the Allwinner A20 RTC. I'm not sure
how much we can trust the datasheet but it claims the reset value is 0
and I don't think there is a way to know whether the date is correct or
not.

> It doesn't change anything for 64-bit systems, I've excluded them by
> using "depends on !64BIT". Right now, it doesn't change anything for
> 32-bit systems because either way, they will fail in 2038.

Which realistically won't actually matter because in 22 years' time nobody
will be able to find a 32-bit system in common use. If you look at x86
platforms today a Pentium Pro is already a collector's item. All of today's
locked down half-maintained embedded and phone devices will be at best
the digital equivalent of toxic waste if connected to anything.

> Won't we have to recompile every application to support 64-bit time on
> 32-bit system anyway? That will be a good time to remove that option.

How will you know when everyone has? There's no "autodetect which
distribution I am running" feature.

> If the distribution don't recompile to support a 64-bit time, then the
> 32-bit systems will break in 2038 anyway and they will absolutely
> require my patch or something along those lines to still boot using
> systemd.

I disagree. Systemd has a serious robustness bug. Patch systemd to handle
timerfd going off early and to take appropriate recovery action.

If you fix the systemd bug you'll also deal with a load of other weird
corner cases like 32-bit guests on a 64-bit host that accidentally ended up
post 2038, and every other freaky RTC failure.

Alan
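A minimal sketch of the kind of robustness check suggested above, assuming a 32-bit process reads only the low 32 bits of the kernel's 64-bit seconds count. The helper names `truncate_to_time32` and `wakeup_is_early` are hypothetical and do not come from systemd or the kernel, and the recovery action itself (for example, falling back to a relative timer) is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap a 64-bit seconds count the way a 32-bit time_t would hold it.
 * Assumption: userspace sees the low 32 bits of the kernel value. */
static int32_t truncate_to_time32(int64_t t)
{
    uint32_t low = (uint32_t)t;            /* conversion is modulo 2^32 */

    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;
    /* Map 0x80000000..0xFFFFFFFF onto INT32_MIN..-1 without relying on
     * implementation-defined narrowing. */
    return (int32_t)(low - 0x80000000u) + INT32_MIN;
}

/* True when an absolute timer reported expiry although the clock, as this
 * process reads it, has not reached the deadline yet. */
static bool wakeup_is_early(int32_t now_seen, int32_t deadline)
{
    return now_seen < deadline;
}
```

With the kernel clock 1000 seconds past the overflow, the process sees a time in 1901, so a deadline in 2033 looks far in the future even though the timer has already fired. A caller that detects this can stop re-arming the same absolute timer in a tight loop.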
On 21/02/2016 at 12:40:20 +0000, One Thousand Gnomes wrote:
> > It doesn't change anything for 64-bit systems, I've excluded them by
> > using "depends on !64BIT". Right now, it doesn't change anything for
> > 32-bit systems because either way, they will fail in 2038.
>
> Which realistically won't actually matter because in 22 years time nobody
> will be able to find a 32bit system in common use. If you look at x86
> platforms today a Pentium Pro is already a collectors item. All of todays
> locked down half-maintained embedded and phone devices will be at best
> the digital equivalent of toxic waste if connected to anything.
>

The current 32-bit systems are not only phones. I'm not concerned by
those, they have an average 18-month life and the manufacturers are
already switching to 64 bit.

But there are long-lived products like cars, parking ticket machines,
insulin pumps, automated external defibrillators, home automation
controllers, point of sales, etc... Some of those may still be in use in
22 years.

Or maybe we want to ensure that there is a Y2038 bug, that can be a good
retirement plan for the whole IT industry ;)

> > Won't we have to recompile every application to support 64-bit time on
> > 32-bit system anyway? That will be a good time to remove that option.
>
> How will you know when everyone has ? There's no "autodetect which
> distribution I am running" feature.
>

I have the hope that the distribution maintainers know how to configure
a kernel and will ship a kernel with that option unset once they have
switched to a 64-bit time userspace.

> > If the distribution don't recompile to support a 64-bit time, then the
> > 32-bit systems will break in 2038 anyway and they will absolutely
> > require my patch or something along those lines to still boot using
> > systemd.
>
> I disagree. Systemd has a serious robustness bug. Patch systemd to handle
> timerd going off early and to take appropriate recovery action.
>
> If you fix the systemd bug you'll also deal with a load of other weird
> cornercases like 32bit guests on a 64bit host that accidentally ended up
> post 2038, and every other freaky rtc failure.
>

Actually, I agree with Lennart that this is definitely a kernel bug
that has to be fixed. systemd is not the only affected application, any
user of timerfd is failing (actually, the first report I got was not
related to systemd at all).

I can also agree that systemd could be a bit more robust there but
you'll have to convince Lennart...
On Mon, 22 Feb 2016 14:00:14 +0100
Alexandre Belloni <alexandre.belloni@free-electrons.com> wrote:

> But there are long lived products like cars, parking ticket machines,
> insulin pumps, automated external defibrillators, home automation
> controllers, point of sales, etc... Some of those may still be in use in
> 22 years.

And if so their vendors will have provided updates. Your "fix" doesn't
help anything, it just means the user sees it fail in a different way.

> I can also agree that systemd could be a bit more robust there but
> you'll have to convince Lennart...

That's a systemd problem. If their code isn't robust then the
distributions will just have to keep patching it.

The only problem that can actually be "fixed" is the case where it isn't
2038 yet and the user has a scrambled RTC. In that case your init tools
need to be robust enough to handle the problem or use APIs that don't
break. The kernel can't actually "fix" this because it never knows
whether your userspace is sane or not.

I'd argue btw that any code using timerfd_create with TFD_TIMER_ABSTIME
and passing it a value that wraps the range permitted by that time
representation *is* buggy. It's the application's responsibility to use
values that are within the defined behavioural range of the function.

Far more constructive would I think be to add a TFD_TIME64 flag to
timerfd_create that allows the use of 64-bit time in timerfd_*. Systemd
can then adopt that safely even on 32-bit legacy systems, while on 64-bit
TFD_TIME64 would presumably be 0 and the 64/32-bit time structs would
match.

Alan
On Monday 22 February 2016 13:43:19 One Thousand Gnomes wrote:
> On Mon, 22 Feb 2016 14:00:14 +0100
> Alexandre Belloni <alexandre.belloni@free-electrons.com> wrote:
> > I can also agree that systemd could be a bit more robust there but
> > you'll have to convince Lennart...
>
> That's a systemd problem. If their code isn't robust then the
> distributiosn will just have to keep patching it.
>
> The only problem that can actually be "fixed" is the case where it isn't
> 2038 yet and the user has a scrambled RTC. In that case your init tools
> need to be robust enough to handle the problem or use APIs that don't
> break. The kernel can't actually "fix" this because it never knows
> whether your userspace is sane or not.
>
> I'd argue btw that any code using timerfd_create with TFD_TIMER_ABSTIME
> and passing it a value that wraps the range permitted by that time
> representation *is* buggy. It's the applications responsibility to use
> values that are within the defined behavioural range of the function.

IIRC, the problem is that user space passes in TIME_T_MAX and the kernel
is considering that to be in the past because the clock is set beyond 2038.

I find it hard to blame user space for that, but I don't have a good
idea for solving this either.

In case of systemd, it is literally the first thing that runs on the kernel
after booting, so we could fall back to setting the time to some known
working state (1970 or 2016 or something), but that would be a rather
bad default policy once the system has been running for a while.

The best we can do for a workaround localized to timerfd might be to
make absolute timers behave differently when they come from a 32-bit
process and the current time has already overflowed.

> Far more constructive would I think be to add a TFD_TIME64 flag to
> timerfd_create that allows the use of 64bit time in timerfd_*. Systemd
> can then adopt that safely even on 32bit legacy systems, while on 64bit
> TFD_TIME64 would presumably be 0 and the 64/32bit time structs would
> match.

I should really dust off my syscall series, I'd rather not have any
partial solutions get merged here.

Arnd
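The comparison described above can be sketched as follows. This is an illustration only, not the kernel's timerfd code: the function name `abs_deadline_already_passed` is hypothetical, and the sketch assumes the kernel widens the 32-bit deadline and compares it against its own 64-bit clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Kernel-side view, simplified: a 32-bit process can express deadlines only
 * up to INT32_MAX, while the kernel clock is a 64-bit seconds count. */
static bool abs_deadline_already_passed(int64_t kernel_now,
                                        int32_t user_deadline)
{
    /* Widen the user value first, then compare in 64 bits. */
    return (int64_t)user_deadline <= kernel_now;
}
```

Once `kernel_now` exceeds `INT32_MAX`, every deadline a 32-bit process can pass, including the largest one, compares as already expired. This is why the timer fires immediately no matter which value user space chooses.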
On 22/02/2016 at 16:44:32 +0100, Arnd Bergmann wrote:
> On Monday 22 February 2016 13:43:19 One Thousand Gnomes wrote:
> > On Mon, 22 Feb 2016 14:00:14 +0100
> > Alexandre Belloni <alexandre.belloni@free-electrons.com> wrote:
> > > I can also agree that systemd could be a bit more robust there but
> > > you'll have to convince Lennart...
> >
> > That's a systemd problem. If their code isn't robust then the
> > distributiosn will just have to keep patching it.
> >
> > The only problem that can actually be "fixed" is the case where it isn't
> > 2038 yet and the user has a scrambled RTC. In that case your init tools
> > need to be robust enough to handle the problem or use APIs that don't
> > break. The kernel can't actually "fix" this because it never knows
> > whether your userspace is sane or not.
> >
> > I'd argue btw that any code using timerfd_create with TFD_TIMER_ABSTIME
> > and passing it a value that wraps the range permitted by that time
> > representation *is* buggy. It's the applications responsibility to use
> > values that are within the defined behavioural range of the function.
>
> IIRC, the problem is that user space passes in TIME_T_MAX and the kernel
> is considering that to be in the past because the clock is set beyond 2038.
>
> I find it hard to blame user space for that, but I don't have a good
> idea for solving this either.
>
> In case of systemd, it is literally the first thing that runs on the kernel
> after booting, so we could fall back to setting the time to some known
> working state (1970 or 2016 or something), but that would be a rather
> bad default policy once the system has been running for a while.
>

Also, how would you know that it is an invalid time? Some RTCs don't
provide that information. One other workaround is to ask distributions
using systemd to stop using HCTOSYS so userspace would be responsible for
setting the system time, and in that case we won't have the 32/64 discrepancy.
On Monday 22 February 2016 16:56:53 Alexandre Belloni wrote:
> On 22/02/2016 at 16:44:32 +0100, Arnd Bergmann wrote :
> > On Monday 22 February 2016 13:43:19 One Thousand Gnomes wrote:
> > > On Mon, 22 Feb 2016 14:00:14 +0100
> > > Alexandre Belloni <alexandre.belloni@free-electrons.com> wrote:
> > > > I can also agree that systemd could be a bit more robust there but
> > > > you'll have to convince Lennart...
> > >
> > > That's a systemd problem. If their code isn't robust then the
> > > distributiosn will just have to keep patching it.
> > >
> > > The only problem that can actually be "fixed" is the case where it isn't
> > > 2038 yet and the user has a scrambled RTC. In that case your init tools
> > > need to be robust enough to handle the problem or use APIs that don't
> > > break. The kernel can't actually "fix" this because it never knows
> > > whether your userspace is sane or not.
> > >
> > > I'd argue btw that any code using timerfd_create with TFD_TIMER_ABSTIME
> > > and passing it a value that wraps the range permitted by that time
> > > representation *is* buggy. It's the applications responsibility to use
> > > values that are within the defined behavioural range of the function.
> >
> > IIRC, the problem is that user space passes in TIME_T_MAX and the kernel
> > is considering that to be in the past because the clock is set beyond 2038.
> >
> > I find it hard to blame user space for that, but I don't have a good
> > idea for solving this either.
> >
> > In case of systemd, it is literally the first thing that runs on the kernel
> > after booting, so we could fall back to setting the time to some known
> > working state (1970 or 2016 or something), but that would be a rather
> > bad default policy once the system has been running for a while.
> >
>
> Also, how would you know that it is an invalid time, some RTC doesn't
> provide that information.

What I meant was encountering a time past the 2038 date, which is invalid
as seen from current 32-bit user space, but not necessarily from the
kernel.

> One other workaround is to asked distributions
> using systemd to stop using HCTOSYS so userspace would be responsible to
> set the system time and in that case we won't have the 32/64 discrepancy.

I'm missing a bit of background here. This seems like a fairly useful
piece of infrastructure for the majority of the use cases (working RTC).

How would the time get set when this is disabled? Is systemd able
to read the rtc and write it back to the kernel? That could in fact
be a nicer workaround for the problem, if it just does this before
setting up the timerfd.

Arnd
On 22/02/2016 at 17:18:03 +0100, Arnd Bergmann wrote:
> > > IIRC, the problem is that user space passes in TIME_T_MAX and the kernel
> > > is considering that to be in the past because the clock is set beyond 2038.
> > >
> > > I find it hard to blame user space for that, but I don't have a good
> > > idea for solving this either.
> > >
> > > In case of systemd, it is literally the first thing that runs on the kernel
> > > after booting, so we could fall back to setting the time to some known
> > > working state (1970 or 2016 or something), but that would be a rather
> > > bad default policy once the system has been running for a while.
> > >
> >
> > Also, how would you know that it is an invalid time, some RTC doesn't
> > provide that information.
>
> What I meant was encountering a time past the 2038 date, which is invalid
> as seen from current 32-bit user space, but not necessarily from the
> kernel.
>

I'm not completely sure how this would be different from my current
patch...

> > One other workaround is to asked distributions
> > using systemd to stop using HCTOSYS so userspace would be responsible to
> > set the system time and in that case we won't have the 32/64 discrepancy.
>
> I'm missing a bit of background here. This seems like a fairly useful
> piece of infrastructure for the majority of the use cases (working RTC)
>
> How would the time get set when this is disabled? Is systemd able
> to read the rtc and write it back to the kernel? That could in fact
> be a nicer workaround for the problem, if it just does this before
> setting up the timerfd.
>

I didn't check other distributions, but Debian and Poky have
/etc/init.d/hwclock.sh, which reads the RTC and sets the system time at
startup. It also saves the time to the RTC on shutdown.
On 2016-02-22 11:18, Arnd Bergmann wrote:
> On Monday 22 February 2016 16:56:53 Alexandre Belloni wrote:
>> One other workaround is to asked distributions
>> using systemd to stop using HCTOSYS so userspace would be responsible to
>> set the system time and in that case we won't have the 32/64 discrepancy.
>
> I'm missing a bit of background here. This seems like a fairly useful
> piece of infrastructure for the majority of the use cases (working RTC)
>
> How would the time get set when this is disabled? Is systemd able
> to read the rtc and write it back to the kernel? That could in fact
> be a nicer workaround for the problem, if it just does this before
> setting up the timerfd.

Traditional init systems on Linux have the option of using hwclock from
util-linux to set the system time. This is what Gentoo does by default, and
I think Arch does it too, and I'm relatively certain that Debian and Ubuntu
used to do it before they switched to systemd (I have no idea what they do
now). Based on the manpage for hwclock, it looks like systemd mandates that
HCTOSYS is enabled in the kernel configuration, and then just calls hwclock
to set the system timezone and correct for UTC.
On 22/02/2016 at 11:41:01 -0500, Austin S. Hemmelgarn wrote:
> On 2016-02-22 11:18, Arnd Bergmann wrote:
> >On Monday 22 February 2016 16:56:53 Alexandre Belloni wrote:
> >>One other workaround is to asked distributions
> >>using systemd to stop using HCTOSYS so userspace would be responsible to
> >>set the system time and in that case we won't have the 32/64 discrepancy.
> >
> >I'm missing a bit of background here. This seems like a fairly useful
> >piece of infrastructure for the majority of the use cases (working RTC)
> >
> >How would the time get set when this is disabled? Is systemd able
> >to read the rtc and write it back to the kernel? That could in fact
> >be a nicer workaround for the problem, if it just does this before
> >setting up the timerfd.
> Traditional init systems on Linux have the option of using hwclock from
> util-linux to set the system time. This is what Gentoo does by default, and
> I think Arch does it too, and I'm relatively certain that Debian and Ubuntu
> used to do it before they switched to systemd (I have no idea what they do
> now). Based on the manpage for hwclock, it looks like systemd mandates that
> HCTOSYS is enabled in the kernel configuration, and then just calls hwclock
> to set the system timezone and correct for UTC.

I'm not sure it mandates HCTOSYS. On Debian there is a udev rule that
needs the system time to already be set, but from my point of view it
doesn't matter whether it comes from hwclock or HCTOSYS.

systemd already seems to read the RTC so it may as well set the system
time if it needs it.
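For readers unfamiliar with the userspace route discussed here, the first half of what a tool such as hwclock does can be sketched with the `RTC_RD_TIME` ioctl from `<linux/rtc.h>`. The helper name `read_rtc` is hypothetical, the device path (typically `/dev/rtc0`) is an assumption about the target system, and the second half (converting the result and setting the system clock with privileges) is deliberately left out.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/rtc.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Read the hardware clock from an RTC character device.
 * Returns 0 and fills *out on success, -1 with errno set on failure. */
static int read_rtc(const char *path, struct rtc_time *out)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    memset(out, 0, sizeof(*out));
    int rc = ioctl(fd, RTC_RD_TIME, out);
    int saved = errno;                     /* close() may clobber errno */

    close(fd);
    errno = saved;
    return rc < 0 ? -1 : 0;
}
```

Because the value passes through userspace before the system clock is set, an init system taking this route could decide for itself whether a post-2038 date is acceptable for its own time_t width, which is the point made in the messages above.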
diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig
index 376322f71fd5..fc087855e6a9 100644
--- a/drivers/rtc/Kconfig
+++ b/drivers/rtc/Kconfig
@@ -73,6 +73,16 @@ config RTC_DEBUG
 	  Say yes here to enable debugging support in the RTC framework
 	  and individual RTC drivers.
 
+config RTC_INVALID_2038
+	bool "Invalidate dates after 2038"
+	depends on !64BIT
+	default y
+	help
+	  Saying yes here will make any date after 03:09:07 on Jan 19 2038
+	  invalid (this is 5 minutes before the 31 bits overflow of a time_t).
+	  This is useful if your userspace is not yet ready to handle 64 bits
+	  times.
+
 comment "RTC interfaces"
 
 config RTC_INTF_SYSFS
diff --git a/drivers/rtc/rtc-lib.c b/drivers/rtc/rtc-lib.c
index e6bfb9c42a10..1ba148256afc 100644
--- a/drivers/rtc/rtc-lib.c
+++ b/drivers/rtc/rtc-lib.c
@@ -107,7 +107,10 @@ int rtc_valid_tm(struct rtc_time *tm)
 		|| ((unsigned)tm->tm_min) >= 60
 		|| ((unsigned)tm->tm_sec) >= 60)
 		return -EINVAL;
-
+#ifdef CONFIG_RTC_INVALID_2038
+	if (rtc_tm_to_time64(tm) > 0x7FFFFED4) /* 5 minutes before overflow */
+		return -EINVAL;
+#endif
 	return 0;
 }
 EXPORT_SYMBOL(rtc_valid_tm);
hctosys sets the system time from within the kernel. This means that 32-bit
systems can have their time set to a date after the 31-bit time_t overflow.

This is currently an issue as userspace is not yet ready to handle those
dates and may break. For example, systemd's usage of timerfd shows that the
timerfd will always fire immediately because it can't be set to a date
after the current date.

The new RTC_INVALID_2038 option makes sure that dates after 03:09:07 on
Jan 19 2038 are invalid. This is 5 minutes before the 31-bit overflow. This
leaves enough time for userspace to react and is short enough to make the
issue visible.

Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
---
 drivers/rtc/Kconfig   | 10 ++++++++++
 drivers/rtc/rtc-lib.c |  5 ++++-
 2 files changed, 14 insertions(+), 1 deletion(-)
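To check the cutoff by hand, here is a small userspace sketch of the comparison the patch adds to `rtc_valid_tm()`. Only the constant `0x7FFFFED4` comes from the patch. The helpers `days_from_civil`, `to_epoch64`, and `rtc_date_allowed` are hypothetical stand-ins written for this example (the kernel uses `rtc_tm_to_time64()`), and the dates are taken as UTC.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTC_LIMIT_2038 0x7FFFFED4LL  /* constant taken from the patch */

/* Days since 1970-01-01 for a Gregorian date (month 1..12, day 1..31). */
static int64_t days_from_civil(int64_t y, int m, int d)
{
    y -= m <= 2;                           /* count the year from March */
    int64_t era = (y >= 0 ? y : y - 399) / 400;
    int64_t yoe = y - era * 400;           /* year of era, 0..399 */
    int64_t doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;

    return era * 146097 + doe - 719468;
}

/* Seconds since the epoch as a 64-bit value. */
static int64_t to_epoch64(int64_t y, int mon, int d, int h, int min, int s)
{
    return days_from_civil(y, mon, d) * 86400 + h * 3600 + min * 60 + s;
}

/* Mirror of the patch's test: dates past the limit are rejected. */
static bool rtc_date_allowed(int64_t y, int mon, int d, int h, int min, int s)
{
    return to_epoch64(y, mon, d, h, min, s) <= RTC_LIMIT_2038;
}
```

By this arithmetic the signed 32-bit maximum, `0x7FFFFFFF`, corresponds to 03:14:07 UTC on Jan 19 2038, and `0x7FFFFED4` lies 299 seconds below it. With the `>` comparison, the last accepted second would therefore be 03:09:08 rather than 03:09:07, a one-second difference from the description that is worth double-checking against the intended cutoff.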