Message ID | 20231218073849.35294-1-peter.hilber@opensynergy.com |
---|---|
Headers | show |
Series | Add virtio_rtc module and related changes | expand |
On Mon, 2023-12-18 at 08:38 +0100, Peter Hilber wrote: > RFC v3 updates > -------------- > > This series implements a driver for a virtio-rtc device conforming to spec > RFC v3 [1]. It now includes an RTC class driver with alarm, in addition to > the PTP clock driver already present before. > > This patch series depends on the patch series "treewide: Use clocksource id > for get_device_system_crosststamp()" [3]. Pull [4] to get the combined > series on top of mainline. > > Overview > -------- > > This patch series adds the virtio_rtc module, and related bugfixes. The > virtio_rtc module implements a driver compatible with the proposed Virtio > RTC device specification [1]. The Virtio RTC (Real Time Clock) device > provides information about current time. The device can provide different > clocks, e.g. for the UTC or TAI time standards, or for physical time > elapsed since some past epoch. Hm, should we allow UTC? If you tell me the time in UTC, then (sometimes) I still don't actually know what the time is, because some UTC seconds occur twice. UTC only makes sense if you provide the TAI offset, surely? Should the virtio_rtc specification make it mandatory to provide such? Otherwise you're just designing it to allow crappy hypervisors to expose incomplete information. > PTP clock interface > ------------------- > > virtio_rtc exposes clocks as PTP clocks to userspace, similar to ptp_kvm. > If both the Virtio RTC device and this driver have special support for the > current clocksource, time synchronization programs can use > cross-timestamping using ioctl PTP_SYS_OFFSET_PRECISE2 aka > PTP_SYS_OFFSET_PRECISE. Similar to ptp_kvm, system time synchronization > with single-digit ns precision is possible with a quiescent reference clock > (from the Virtio RTC device). This works even when the Virtio device > response is slow compared to ptp_kvm hypercalls. Is PTP the right mechanism for this? As I understand it, PTP is a way to precisely synchronize one clock with another. But in the case of virt guests synchronizing against the host, it isn't really *another* clock. It really is the *same* underlying clock. As the host clock varies with temperature, for example, so does the guest clock. The only difference is an offset and (on x86 perhaps) a mathematical scaling of the frequency. I was looking at this another way, when I came across this virtio-rtc work. My idea was just for the hypervisor to expose its own timekeeping information — the counter/TSC value and TAI time at a given moment, frequency of the counter, and the precision of both that frequency (±PPM) and the TAI timestamp (±µs). By putting that in a host/guest shared data structure with a seqcount for lockless updates, we can update it as time synchronization on the host is refined, and we can even cleanly handle live migration where the guest ends up on a completely different host. It allows for use cases which *really* care (e.g. timestamping financial transactions) to ensure that there is never even a moment of getting *wrong* timestamps if they haven't yet resynced after a migration. Now I'm trying to work out if I should attempt to reconcile with your existing virtio-rtc work, or just decide that virtio-rtc isn't trying to solve the actual problem that we have, and go ahead with something different... ?
On 07.03.24 15:02, David Woodhouse wrote: > On Mon, 2023-12-18 at 08:38 +0100, Peter Hilber wrote: >> RFC v3 updates >> -------------- >> >> This series implements a driver for a virtio-rtc device conforming to spec >> RFC v3 [1]. It now includes an RTC class driver with alarm, in addition to >> the PTP clock driver already present before. >> >> This patch series depends on the patch series "treewide: Use clocksource id >> for get_device_system_crosststamp()" [3]. Pull [4] to get the combined >> series on top of mainline. >> >> Overview >> -------- >> >> This patch series adds the virtio_rtc module, and related bugfixes. The >> virtio_rtc module implements a driver compatible with the proposed Virtio >> RTC device specification [1]. The Virtio RTC (Real Time Clock) device >> provides information about current time. The device can provide different >> clocks, e.g. for the UTC or TAI time standards, or for physical time >> elapsed since some past epoch. > > Hm, should we allow UTC? If you tell me the time in UTC, then > (sometimes) I still don't actually know what the time is, because some > UTC seconds occur twice. UTC only makes sense if you provide the TAI > offset, surely? Should the virtio_rtc specification make it mandatory > to provide such? > > Otherwise you're just designing it to allow crappy hypervisors to > expose incomplete information. > Hi David, (adding virtio-comment@lists.oasis-open.org for spec discussion), thank you for your insightful comments. I think I take a broadly similar view. The reason why the current spec and driver is like this is that I took a pragmatic approach at first and only included features which work out-of-the-box for the current Linux ecosystem. The current virtio_rtc features work similar to ptp_kvm, and therefore can work out-of-the-box with time sync daemons such as chrony. As of RFC spec v3, UTC clock only is allowed. If mandating a TAI clock as well, I am afraid that - in some (embedded) scenarios, the TAI clock may not be available - crappy hypervisors will pass off the UTC clock as the TAI clock. For the same reasons, I am also not sure about adding a *mandatory* TAI offset to each readout. I don't know user-space software which would leverage this already (at least not through the PTP clock interface). And why would such software not go straight for the TAI clock instead? How about adding a requirement to the spec that the virtio-rtc device SHOULD expose the TAI clock whenever it is available - would this address your concerns? >> PTP clock interface >> ------------------- >> >> virtio_rtc exposes clocks as PTP clocks to userspace, similar to ptp_kvm. >> If both the Virtio RTC device and this driver have special support for the >> current clocksource, time synchronization programs can use >> cross-timestamping using ioctl PTP_SYS_OFFSET_PRECISE2 aka >> PTP_SYS_OFFSET_PRECISE. Similar to ptp_kvm, system time synchronization >> with single-digit ns precision is possible with a quiescent reference clock >> (from the Virtio RTC device). This works even when the Virtio device >> response is slow compared to ptp_kvm hypercalls. > > Is PTP the right mechanism for this? As I understand it, PTP is a way > to precisely synchronize one clock with another. But in the case of > virt guests synchronizing against the host, it isn't really *another* > clock. It really is the *same* underlying clock. As the host clock > varies with temperature, for example, so does the guest clock. The only > difference is an offset and (on x86 perhaps) a mathematical scaling of > the frequency. > > I was looking at this another way, when I came across this virtio-rtc > work. > > My idea was just for the hypervisor to expose its own timekeeping > information — the counter/TSC value and TAI time at a given moment, > frequency of the counter, and the precision of both that frequency > (±PPM) and the TAI timestamp (±µs). > > By putting that in a host/guest shared data structure with a seqcount > for lockless updates, we can update it as time synchronization on the > host is refined, and we can even cleanly handle live migration where > the guest ends up on a completely different host. It allows for use > cases which *really* care (e.g. timestamping financial transactions) to > ensure that there is never even a moment of getting *wrong* timestamps > if they haven't yet resynced after a migration. I considered a similar approach as well, but integrating that with the kernel timekeeping seemed too much effort for the first step. However, reading the clock from user space would be much simpler. > > Now I'm trying to work out if I should attempt to reconcile with your > existing virtio-rtc work, or just decide that virtio-rtc isn't trying > to solve the actual problem that we have, and go ahead with something > different... ? > We are certainly interested into the discussed, say, "virtual timekeeper" mechanism, which would also solve a lot of problems for us (especially if it would be integrated with kernel timekeeping). Even without Linux kernel timekeeping, the virtual timekeeper would be useful to us for guests with simpler timekeeping, and potentially for user space applications. Our current intent is to at first try to upstream the current (RFC spec v3) feature set. I think the virtual timekeeper would be suitable as an optional feature of virtio_rtc (with Virtio, this could easily be added after initial upstreaming). It is also possible to have a virtio-rtc device only implement the virtual timekeeper, but not the other clock reading methods, if these are of no interest. Best regards, Peter
On Fri, 2024-03-08 at 11:32 +0100, Peter Hilber wrote: > On 07.03.24 15:02, David Woodhouse wrote: > > On Mon, 2023-12-18 at 08:38 +0100, Peter Hilber wrote: > > > RFC v3 updates > > > -------------- > > > > > > This series implements a driver for a virtio-rtc device conforming to spec > > > RFC v3 [1]. It now includes an RTC class driver with alarm, in addition to > > > the PTP clock driver already present before. > > > > > > This patch series depends on the patch series "treewide: Use clocksource id > > > for get_device_system_crosststamp()" [3]. Pull [4] to get the combined > > > series on top of mainline. > > > > > > Overview > > > -------- > > > > > > This patch series adds the virtio_rtc module, and related bugfixes. The > > > virtio_rtc module implements a driver compatible with the proposed Virtio > > > RTC device specification [1]. The Virtio RTC (Real Time Clock) device > > > provides information about current time. The device can provide different > > > clocks, e.g. for the UTC or TAI time standards, or for physical time > > > elapsed since some past epoch. > > > > Hm, should we allow UTC? If you tell me the time in UTC, then > > (sometimes) I still don't actually know what the time is, because some > > UTC seconds occur twice. UTC only makes sense if you provide the TAI > > offset, surely? Should the virtio_rtc specification make it mandatory > > to provide such? > > > > Otherwise you're just designing it to allow crappy hypervisors to > > expose incomplete information. > > > > Hi David, > > (adding virtio-comment@lists.oasis-open.org for spec discussion), > > thank you for your insightful comments. I think I take a broadly similar > view. The reason why the current spec and driver is like this is that I > took a pragmatic approach at first and only included features which work > out-of-the-box for the current Linux ecosystem. > > The current virtio_rtc features work similar to ptp_kvm, and therefore can > work out-of-the-box with time sync daemons such as chrony. > > As of RFC spec v3, UTC clock only is allowed. If mandating a TAI clock as > well, I am afraid that > > - in some (embedded) scenarios, the TAI clock may not be available > > - crappy hypervisors will pass off the UTC clock as the TAI clock. > > For the same reasons, I am also not sure about adding a *mandatory* TAI > offset to each readout. I don't know user-space software which would > leverage this already (at least not through the PTP clock interface). And > why would such software not go straight for the TAI clock instead? > > How about adding a requirement to the spec that the virtio-rtc device > SHOULD expose the TAI clock whenever it is available - would this address > your concerns? I think that would be too easy for implementors to miss, or decide not to obey. Or to get *wrong*, by exposing a TAI clock but actually putting UTC in it. I think I prefer to mandate the tai_offset field with the UTC clock. Crappy implementations will just set it to zero, but at least that gives a clear signal to the guests that it's *their* problem to resolve. > > > PTP clock interface > > > ------------------- > > > > > > virtio_rtc exposes clocks as PTP clocks to userspace, similar to ptp_kvm. > > > If both the Virtio RTC device and this driver have special support for the > > > current clocksource, time synchronization programs can use > > > cross-timestamping using ioctl PTP_SYS_OFFSET_PRECISE2 aka > > > PTP_SYS_OFFSET_PRECISE. Similar to ptp_kvm, system time synchronization > > > with single-digit ns precision is possible with a quiescent reference clock > > > (from the Virtio RTC device). This works even when the Virtio device > > > response is slow compared to ptp_kvm hypercalls. > > > > Is PTP the right mechanism for this? As I understand it, PTP is a way > > to precisely synchronize one clock with another. But in the case of > > virt guests synchronizing against the host, it isn't really *another* > > clock. It really is the *same* underlying clock. As the host clock > > varies with temperature, for example, so does the guest clock. The only > > difference is an offset and (on x86 perhaps) a mathematical scaling of > > the frequency. > > > > I was looking at this another way, when I came across this virtio-rtc > > work. > > > > My idea was just for the hypervisor to expose its own timekeeping > > information — the counter/TSC value and TAI time at a given moment, > > frequency of the counter, and the precision of both that frequency > > (±PPM) and the TAI timestamp (±µs). > > > > By putting that in a host/guest shared data structure with a seqcount > > for lockless updates, we can update it as time synchronization on the > > host is refined, and we can even cleanly handle live migration where > > the guest ends up on a completely different host. It allows for use > > cases which *really* care (e.g. timestamping financial transactions) to > > ensure that there is never even a moment of getting *wrong* timestamps > > if they haven't yet resynced after a migration. > > I considered a similar approach as well, but integrating that with the > kernel timekeeping seemed too much effort for the first step. However, > reading the clock from user space would be much simpler. Right. In fact my *first* use case was userspace, specifically in the context of https://github.com/aws/clock-bound — but anything we design for this absolutely has to be usable for kernel timekeeping too. It's also critical to solve the Live Migration problem. But is it so hard to integrate into the kernel timekeeping? My plan would have given us effectively an infinite number of cross-reads of the realtime clock vs. TSC. You don't have to actually read from a virtio device; you just read the TSC and do the maths, using the values in the shared memory region. Couldn't that be used to present a PTP device to the guest kernel just the same as you do at the moment? You could probably even simulate PPS with it. Typically with PPS we have to catch the hardware interrupt and then read the TSC as soon as possible thereafter. With this, you'd be able to *calculate* the TSC value at the start of the next second, and wouldn't have to suffer the real hardware latency :) > > > > Now I'm trying to work out if I should attempt to reconcile with your > > existing virtio-rtc work, or just decide that virtio-rtc isn't trying > > to solve the actual problem that we have, and go ahead with something > > different... ? > > > > We are certainly interested into the discussed, say, "virtual timekeeper" > mechanism, which would also solve a lot of problems for us (especially if > it would be integrated with kernel timekeeping). Even without Linux kernel > timekeeping, the virtual timekeeper would be useful to us for guests with > simpler timekeeping, and potentially for user space applications. > > Our current intent is to at first try to upstream the current (RFC spec v3) > feature set. I think the virtual timekeeper would be suitable as an > optional feature of virtio_rtc (with Virtio, this could easily be added > after initial upstreaming). It is also possible to have a virtio-rtc device > only implement the virtual timekeeper, but not the other clock reading > methods, if these are of no interest. Yeah, that might make sense. I was thinking of a simple ACPI/DT device exposing a page of memory and *maybe* an interrupt for when an update happens. (With the caveat that the interrupt would always occur too late by definition, so it's no substitute for using the seqlock correctly in applications that *really* care and are going to get fined millions of dollars for mis-timestamping their transactions.) But using the virtio-rtc device as the vehicle for that shared memory page is reasonable too. It's not even mutually exclusive; we could expose the *same* data structure in memory via whatever mechanisms we wanted. One other thing to note is I think we're being very naïve about the TSC on x86 hosts. Theoretically, the TSC for every vCPU might run at a different frequency, and even if they run at the same frequency they might be offset from each other. I'm happy to be naïve but I think we should be *explicitly* so, and just say for example that it's defined against vCPU0 so if other vCPUs are different then all bets are off. We *can* cope with TSC frequencies changing. Fundamentally, that's the whole *point*; NTP calibrates itself as the underlying frequency does change due to temperature changes, etc. — so a deliberate frequency scaling, or even a live migration, are just a slightly special case of the same thing. One thing I have added to the memory region is a migration counter. In the ideal case, guests will be happy just to use the hypervisor's synchronization. But in some cases the guests may want to do NTP (or PPS, PTP or something else) for themselves, to have more precise timekeeping than the host. Even if the host is advertising itself as stratum 16, the guest still needs to know of *migration*, because it has to consider itself unsynchronized when that happens.
On 08.03.24 13:33, David Woodhouse wrote: > On Fri, 2024-03-08 at 11:32 +0100, Peter Hilber wrote: >> On 07.03.24 15:02, David Woodhouse wrote: >>> On Mon, 2023-12-18 at 08:38 +0100, Peter Hilber wrote: >>>> RFC v3 updates >>>> -------------- >>>> >>>> This series implements a driver for a virtio-rtc device conforming to >>>> spec >>>> RFC v3 [1]. It now includes an RTC class driver with alarm, in >>>> addition to >>>> the PTP clock driver already present before. >>>> >>>> This patch series depends on the patch series "treewide: Use >>>> clocksource id >>>> for get_device_system_crosststamp()" [3]. Pull [4] to get the combined >>>> series on top of mainline. >>>> >>>> Overview >>>> -------- >>>> >>>> This patch series adds the virtio_rtc module, and related bugfixes. >>>> The >>>> virtio_rtc module implements a driver compatible with the proposed >>>> Virtio >>>> RTC device specification [1]. The Virtio RTC (Real Time Clock) device >>>> provides information about current time. The device can provide >>>> different >>>> clocks, e.g. for the UTC or TAI time standards, or for physical time >>>> elapsed since some past epoch. >>> >>> Hm, should we allow UTC? If you tell me the time in UTC, then >>> (sometimes) I still don't actually know what the time is, because some >>> UTC seconds occur twice. UTC only makes sense if you provide the TAI >>> offset, surely? Should the virtio_rtc specification make it mandatory >>> to provide such? >>> >>> Otherwise you're just designing it to allow crappy hypervisors to >>> expose incomplete information. >>> >> >> Hi David, >> >> (adding virtio-comment@lists.oasis-open.org for spec discussion), >> >> thank you for your insightful comments. I think I take a broadly similar >> view. The reason why the current spec and driver is like this is that I >> took a pragmatic approach at first and only included features which work >> out-of-the-box for the current Linux ecosystem. >> >> The current virtio_rtc features work similar to ptp_kvm, and therefore >> can >> work out-of-the-box with time sync daemons such as chrony. >> >> As of RFC spec v3, UTC clock only is allowed. If mandating a TAI clock >> as >> well, I am afraid that >> >> - in some (embedded) scenarios, the TAI clock may not be available >> >> - crappy hypervisors will pass off the UTC clock as the TAI clock. >> >> For the same reasons, I am also not sure about adding a *mandatory* TAI >> offset to each readout. I don't know user-space software which would >> leverage this already (at least not through the PTP clock interface). >> And >> why would such software not go straight for the TAI clock instead? >> >> How about adding a requirement to the spec that the virtio-rtc device >> SHOULD expose the TAI clock whenever it is available - would this >> address >> your concerns? > > I think that would be too easy for implementors to miss, or decide not > to obey. Or to get *wrong*, by exposing a TAI clock but actually > putting UTC in it. > > I think I prefer to mandate the tai_offset field with the UTC clock. > Crappy implementations will just set it to zero, but at least that > gives a clear signal to the guests that it's *their* problem to > resolve. To me there are some open questions regarding how this would work. Is there a use case for this with the v3 clock reading methods, or would it be enough to address this with the Virtio timekeeper? Looking at clock_adjtime(2), the tai_offset could be exposed, but probably best alongside some additional information about leap seconds. I am not aware about any user-space user. In addition, leap second smearing should also be addressed. > > > > >>>> PTP clock interface >>>> ------------------- >>>> >>>> virtio_rtc exposes clocks as PTP clocks to userspace, similar to >>>> ptp_kvm. >>>> If both the Virtio RTC device and this driver have special support for >>>> the >>>> current clocksource, time synchronization programs can use >>>> cross-timestamping using ioctl PTP_SYS_OFFSET_PRECISE2 aka >>>> PTP_SYS_OFFSET_PRECISE. Similar to ptp_kvm, system time >>>> synchronization >>>> with single-digit ns precision is possible with a quiescent reference >>>> clock >>>> (from the Virtio RTC device). This works even when the Virtio device >>>> response is slow compared to ptp_kvm hypercalls. >>> >>> Is PTP the right mechanism for this? As I understand it, PTP is a way >>> to precisely synchronize one clock with another. But in the case of >>> virt guests synchronizing against the host, it isn't really *another* >>> clock. It really is the *same* underlying clock. As the host clock >>> varies with temperature, for example, so does the guest clock. The only >>> difference is an offset and (on x86 perhaps) a mathematical scaling of >>> the frequency. >>> >>> I was looking at this another way, when I came across this virtio-rtc >>> work. >>> >>> My idea was just for the hypervisor to expose its own timekeeping >>> information — the counter/TSC value and TAI time at a given moment, >>> frequency of the counter, and the precision of both that frequency >>> (±PPM) and the TAI timestamp (±µs). >>> >>> By putting that in a host/guest shared data structure with a seqcount >>> for lockless updates, we can update it as time synchronization on the >>> host is refined, and we can even cleanly handle live migration where >>> the guest ends up on a completely different host. It allows for use >>> cases which *really* care (e.g. timestamping financial transactions) to >>> ensure that there is never even a moment of getting *wrong* timestamps >>> if they haven't yet resynced after a migration. >> >> I considered a similar approach as well, but integrating that with the >> kernel timekeeping seemed too much effort for the first step. However, >> reading the clock from user space would be much simpler. > > Right. In fact my *first* use case was userspace, specifically in the > context of https://github.com/aws/clock-bound — but anything we design > for this absolutely has to be usable for kernel timekeeping too. > > It's also critical to solve the Live Migration problem. > > But is it so hard to integrate into the kernel timekeeping? My plan > would have given us effectively an infinite number of cross-reads of > the realtime clock vs. TSC. You don't have to actually read from a > virtio device; you just read the TSC and do the maths, using the values > in the shared memory region. Couldn't that be used to present a PTP > device to the guest kernel just the same as you do at the moment? Yes, and it would also decrease the clock reading overhead (saving at least the Virtio response interrupt and associated scheduling). Would make sense to me. To be clear, with "kernel timekeeping integration" I meant to have timekeeping.c derive the time directly from the Virtio timekeeper. > > You could probably even simulate PPS with it. Typically with PPS we > have to catch the hardware interrupt and then read the TSC as soon as > possible thereafter. With this, you'd be able to *calculate* the TSC > value at the start of the next second, and wouldn't have to suffer the > real hardware latency :) > >>> >>> Now I'm trying to work out if I should attempt to reconcile with your >>> existing virtio-rtc work, or just decide that virtio-rtc isn't trying >>> to solve the actual problem that we have, and go ahead with something >>> different... ? >>> >> >> We are certainly interested into the discussed, say, "virtual >> timekeeper" >> mechanism, which would also solve a lot of problems for us (especially >> if >> it would be integrated with kernel timekeeping). Even without Linux >> kernel >> timekeeping, the virtual timekeeper would be useful to us for guests >> with >> simpler timekeeping, and potentially for user space applications. >> >> Our current intent is to at first try to upstream the current (RFC spec >> v3) >> feature set. I think the virtual timekeeper would be suitable as an >> optional feature of virtio_rtc (with Virtio, this could easily be added >> after initial upstreaming). It is also possible to have a virtio-rtc >> device >> only implement the virtual timekeeper, but not the other clock reading >> methods, if these are of no interest. > > Yeah, that might make sense. I was thinking of a simple ACPI/DT device > exposing a page of memory and *maybe* an interrupt for when an update > happens. (With the caveat that the interrupt would always occur too > late by definition, so it's no substitute for using the seqlock > correctly in applications that *really* care and are going to get fined > millions of dollars for mis-timestamping their transactions.) > > But using the virtio-rtc device as the vehicle for that shared memory > page is reasonable too. It's not even mutually exclusive; we could > expose the *same* data structure in memory via whatever mechanisms we > wanted. > > One other thing to note is I think we're being very naïve about the TSC > on x86 hosts. Theoretically, the TSC for every vCPU might run at a > different frequency, and even if they run at the same frequency they > might be offset from each other. I'm happy to be naïve but I think we > should be *explicitly* so, and just say for example that it's defined > against vCPU0 so if other vCPUs are different then all bets are off. ATM Virtio has no notion of vCPUs, or vCPU topology. So I wonder if you have an opinion on how to represent this in a platform-independent way. Thank you for the comments, Peter > > We *can* cope with TSC frequencies changing. Fundamentally, that's the > whole *point*; NTP calibrates itself as the underlying frequency does > change due to temperature changes, etc. — so a deliberate frequency > scaling, or even a live migration, are just a slightly special case of > the same thing. > > One thing I have added to the memory region is a migration counter. In > the ideal case, guests will be happy just to use the hypervisor's > synchronization. But in some cases the guests may want to do NTP (or > PPS, PTP or something else) for themselves, to have more precise > timekeeping than the host. Even if the host is advertising itself as > stratum 16, the guest still needs to know of *migration*, because it > has to consider itself unsynchronized when that happens
On Mon, 2024-03-11 at 19:24 +0100, Peter Hilber wrote: > On 08.03.24 13:33, David Woodhouse wrote: > > On Fri, 2024-03-08 at 11:32 +0100, Peter Hilber wrote: > > > On 07.03.24 15:02, David Woodhouse wrote: > > > > Hm, should we allow UTC? If you tell me the time in UTC, then > > > > (sometimes) I still don't actually know what the time is, because some > > > > UTC seconds occur twice. UTC only makes sense if you provide the TAI > > > > offset, surely? Should the virtio_rtc specification make it mandatory > > > > to provide such? > > > > > > > > Otherwise you're just designing it to allow crappy hypervisors to > > > > expose incomplete information. > > > > > > > > > > Hi David, > > > > > > (adding virtio-comment@lists.oasis-open.org for spec discussion), > > > > > > thank you for your insightful comments. I think I take a broadly similar > > > view. The reason why the current spec and driver is like this is that I > > > took a pragmatic approach at first and only included features which work > > > out-of-the-box for the current Linux ecosystem. > > > > > > The current virtio_rtc features work similar to ptp_kvm, and therefore > > > can work out-of-the-box with time sync daemons such as chrony. > > > > > > As of RFC spec v3, UTC clock only is allowed. If mandating a TAI clock > > > as well, I am afraid that > > > > > > - in some (embedded) scenarios, the TAI clock may not be available > > > > > > - crappy hypervisors will pass off the UTC clock as the TAI clock. > > > > > > For the same reasons, I am also not sure about adding a *mandatory* TAI > > > offset to each readout. I don't know user-space software which would > > > leverage this already (at least not through the PTP clock interface). > > > And why would such software not go straight for the TAI clock instead? > > > > > > How about adding a requirement to the spec that the virtio-rtc device > > > SHOULD expose the TAI clock whenever it is available - would this > > > address your concerns? > > > > I think that would be too easy for implementors to miss, or decide not > > to obey. Or to get *wrong*, by exposing a TAI clock but actually > > putting UTC in it. > > > > I think I prefer to mandate the tai_offset field with the UTC clock. > > Crappy implementations will just set it to zero, but at least that > > gives a clear signal to the guests that it's *their* problem to > > resolve. > > To me there are some open questions regarding how this would work. Is there > a use case for this with the v3 clock reading methods, or would it be > enough to address this with the Virtio timekeeper? > > Looking at clock_adjtime(2), the tai_offset could be exposed, but probably > best alongside some additional information about leap seconds. I am not > aware about any user-space user. In addition, leap second smearing should > also be addressed. > Is there even a standard yet for leap-smearing? Will it be linear over 1000 seconds like UTC-SLS? Or semi-raised-cosine over 24 hours, which I think is what Google does? Meta does something different again, don't they? Exposing UTC as the only clock reference is bad enough; when leap seconds happen there's a whole second during which you don't *know* which second it is. It seems odd to me, for a precision clock to be deliberately ambiguous about what the time is! But if the virtio-rtc clock is defined as UTC and then expose something *different* in it, that's even worse. You potentially end up providing inaccurate time for a whole *day* leading up to the leap second. I think you're right that leap second smearing should be addressed. At the very least, by making it clear that the virtio-rtc clock which advertises UTC shall be used *only* for UTC, never UTC-SLS or any other yet-to-be-defined variant. Please make it explicit that any hypervisor which wants to advertise a smeared clock shall define a new type which specifies the precise smearing algorithm and cannot be conflated with the one you're defining here. > > One other thing to note is I think we're being very naïve about the TSC > > on x86 hosts. Theoretically, the TSC for every vCPU might run at a > > different frequency, and even if they run at the same frequency they > > might be offset from each other. I'm happy to be naïve but I think we > > should be *explicitly* so, and just say for example that it's defined > > against vCPU0 so if other vCPUs are different then all bets are off. > > ATM Virtio has no notion of vCPUs, or vCPU topology. So I wonder if you > have an opinion on how to represent this in a platform-independent way. Well, it doesn't have a notion of TSCs either; you include that by implicit reference don't you?
On 12.03.24 18:15, David Woodhouse wrote: > On Mon, 2024-03-11 at 19:24 +0100, Peter Hilber wrote: >> On 08.03.24 13:33, David Woodhouse wrote: >>> On Fri, 2024-03-08 at 11:32 +0100, Peter Hilber wrote: >>>> On 07.03.24 15:02, David Woodhouse wrote: >>>>> Hm, should we allow UTC? If you tell me the time in UTC, then >>>>> (sometimes) I still don't actually know what the time is, because some >>>>> UTC seconds occur twice. UTC only makes sense if you provide the TAI >>>>> offset, surely? Should the virtio_rtc specification make it mandatory >>>>> to provide such? >>>>> >>>>> Otherwise you're just designing it to allow crappy hypervisors to >>>>> expose incomplete information. >>>>> >>>> >>>> Hi David, >>>> >>>> (adding virtio-comment@lists.oasis-open.org for spec discussion), >>>> >>>> thank you for your insightful comments. I think I take a broadly similar >>>> view. The reason why the current spec and driver is like this is that I >>>> took a pragmatic approach at first and only included features which work >>>> out-of-the-box for the current Linux ecosystem. >>>> >>>> The current virtio_rtc features work similar to ptp_kvm, and therefore >>>> can work out-of-the-box with time sync daemons such as chrony. >>>> >>>> As of RFC spec v3, UTC clock only is allowed. If mandating a TAI clock >>>> as well, I am afraid that >>>> >>>> - in some (embedded) scenarios, the TAI clock may not be available >>>> >>>> - crappy hypervisors will pass off the UTC clock as the TAI clock. >>>> >>>> For the same reasons, I am also not sure about adding a *mandatory* TAI >>>> offset to each readout. I don't know user-space software which would >>>> leverage this already (at least not through the PTP clock interface). >>>> And why would such software not go straight for the TAI clock instead? >>>> >>>> How about adding a requirement to the spec that the virtio-rtc device >>>> SHOULD expose the TAI clock whenever it is available - would this >>>> address your concerns? >>> >>> I think that would be too easy for implementors to miss, or decide not >>> to obey. Or to get *wrong*, by exposing a TAI clock but actually >>> putting UTC in it. >>> >>> I think I prefer to mandate the tai_offset field with the UTC clock. >>> Crappy implementations will just set it to zero, but at least that >>> gives a clear signal to the guests that it's *their* problem to >>> resolve. >> >> To me there are some open questions regarding how this would work. Is there >> a use case for this with the v3 clock reading methods, or would it be >> enough to address this with the Virtio timekeeper? >> >> Looking at clock_adjtime(2), the tai_offset could be exposed, but probably >> best alongside some additional information about leap seconds. I am not >> aware about any user-space user. In addition, leap second smearing should >> also be addressed. >> > > Is there even a standard yet for leap-smearing? Will it be linear over > 1000 seconds like UTC-SLS? Or semi-raised-cosine over 24 hours, which I > think is what Google does? Meta does something different again, don't > they? > > Exposing UTC as the only clock reference is bad enough; when leap > seconds happen there's a whole second during which you don't *know* > which second it is. It seems odd to me, for a precision clock to be > deliberately ambiguous about what the time is! Just to be clear, the device can perfectly expose only a TAI reference clock (or both UTC and TAI), the spec is just completely open about this, as it tries to work for diverse use cases. > > But if the virtio-rtc clock is defined as UTC and then expose something > *different* in it, that's even worse. You potentially end up providing > inaccurate time for a whole *day* leading up to the leap second. > > I think you're right that leap second smearing should be addressed. At > the very least, by making it clear that the virtio-rtc clock which > advertises UTC shall be used *only* for UTC, never UTC-SLS or any other > yet-to-be-defined variant. > Agreed. > Please make it explicit that any hypervisor which wants to advertise a > smeared clock shall define a new type which specifies the precise > smearing algorithm and cannot be conflated with the one you're defining > here. > I will add a requirement that the UTC clock can never have smeared/smoothed leap seconds. I think that not every vendor would bother to first add a definition of a smearing algorithm. Also, I think in some cases knowing the precise smearing algorithm might not be important (when having the same time as the hypervisor is enough and accuracy w.r.t. actual time is less important). So maybe I should add a VIRTIO_RTC_CLOCK_UTC_SMEARED clock type, which for now could catch every UTC-like clock which smears/smoothes leap seconds, where the vendor cannot be bothered to add the smearing algorithm to spec and implementations. As for UTC-SLS, this *could* also be added, although [1] says It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." [1] https://datatracker.ietf.org/doc/html/draft-kuhn-leapsecond-00 >>> One other thing to note is I think we're being very naïve about the TSC >>> on x86 hosts. Theoretically, the TSC for every vCPU might run at a >>> different frequency, and even if they run at the same frequency they >>> might be offset from each other. I'm happy to be naïve but I think we >>> should be *explicitly* so, and just say for example that it's defined >>> against vCPU0 so if other vCPUs are different then all bets are off. >> >> ATM Virtio has no notion of vCPUs, or vCPU topology. So I wonder if you >> have an opinion on how to represent this in a platform-independent way. > > Well, it doesn't have a notion of TSCs either; you include that by > implicit reference don't you? I think I can add a SHOULD requirement which vaguely refers to vCPU 0, or boot vCPU. But the Virtio device is not necessarily hosted by a hypervisor, so the device might not even know which vCPUs there are. E.g. there is even interest to make virtio-rtc work as part of the virtio-net device (which might be implemented in hardware). Thanks for the comments, Peter
On 13/03/2024 10:45:54+0100, Peter Hilber wrote: > > Exposing UTC as the only clock reference is bad enough; when leap > > seconds happen there's a whole second during which you don't *know* > > which second it is. It seems odd to me, for a precision clock to be > > deliberately ambiguous about what the time is! > > Just to be clear, the device can perfectly expose only a TAI reference > clock (or both UTC and TAI), the spec is just completely open about this, > as it tries to work for diverse use cases. > > > > > But if the virtio-rtc clock is defined as UTC and then expose something > > *different* in it, that's even worse. You potentially end up providing > > inaccurate time for a whole *day* leading up to the leap second. > > > > I think you're right that leap second smearing should be addressed. At > > the very least, by making it clear that the virtio-rtc clock which > > advertises UTC shall be used *only* for UTC, never UTC-SLS or any other > > yet-to-be-defined variant. > > > > Agreed. > > > Please make it explicit that any hypervisor which wants to advertise a > > smeared clock shall define a new type which specifies the precise > > smearing algorithm and cannot be conflated with the one you're defining > > here. > > > > I will add a requirement that the UTC clock can never have smeared/smoothed > leap seconds. > > I think that not every vendor would bother to first add a definition of a > smearing algorithm. Also, I think in some cases knowing the precise > smearing algorithm might not be important (when having the same time as the > hypervisor is enough and accuracy w.r.t. actual time is less important). > > So maybe I should add a VIRTIO_RTC_CLOCK_UTC_SMEARED clock type, which for > now could catch every UTC-like clock which smears/smoothes leap seconds, > where the vendor cannot be bothered to add the smearing algorithm to spec > and implementations. > I still don't know anything about virtio but under Linux, an RTC is always UTC (or localtime when dual booting but let's not care) and never accounts for leap seconds. Having an RTC and RTC driver behaving differently would be super inconvenient. Why don't you leave this to userspace? I guess I'm still questioning whether this is the correct interface to expose the host system time instead of an actual RTC.
On Wed, 2024-03-13 at 12:18 +0100, Alexandre Belloni wrote: > > I still don't know anything about virtio but under Linux, an RTC is > always UTC (or localtime when dual booting but let's not care) and never > accounts for leap seconds. Having an RTC and RTC driver behaving > differently would be super inconvenient. Why don't you leave this to > userspace? Well yes, we don't need to expose *anything* from the hypervisor and we can leave it all to guest userspace. We can run NTP on every single one of *hundreds* of guests, leaving them all to duplicate the work of calibrating the *same* underlying oscillator. I thought we were trying to avoid that, by having the hypervisor tell them what the time was. If we're going to do that, we need it to be sufficiently precise (and some clients want to *know* the precision), and above all we need it to be *unambiguous*. If the hypervisor says that the time is 3692217600.001, then the guest doesn't actually know *which* 3692217600.001 it is, and thus it still doesn't know the time to an accuracy better than 1 second. And if we start allowing the hypervisor to smear clocks in some other underspecified ways, then we end up with errors of up to 1 second in the clock for long periods of time *around* the leap second. We need to avoid that ambiguity. > I guess I'm still questioning whether this is the correct interface to > expose the host system time instead of an actual RTC. If an RTC device is able to report '23:59:60' as the time of day, I suppose that *could* resolve the ambiguity. But talking to a device is slow; we want guests to be able to know the time — accurately — with a simple counter/tsc read and some arithmetic. Which means *paired* reads of 'RTC' and the counter, and a precise indication of the counter frequency.
On Wed, 2024-03-13 at 10:45 +0100, Peter Hilber wrote: > On 12.03.24 18:15, David Woodhouse wrote: > > On Mon, 2024-03-11 at 19:24 +0100, Peter Hilber wrote: > > > On 08.03.24 13:33, David Woodhouse wrote: > > > > On Fri, 2024-03-08 at 11:32 +0100, Peter Hilber wrote: > > > > > On 07.03.24 15:02, David Woodhouse wrote: > > > > > > Hm, should we allow UTC? If you tell me the time in UTC, then > > > > > > (sometimes) I still don't actually know what the time is, because some > > > > > > UTC seconds occur twice. UTC only makes sense if you provide the TAI > > > > > > offset, surely? Should the virtio_rtc specification make it mandatory > > > > > > to provide such? > > > > > > > > > > > > Otherwise you're just designing it to allow crappy hypervisors to > > > > > > expose incomplete information. > > > > > > > > > > > > > > > > Hi David, > > > > > > > > > > (adding virtio-comment@lists.oasis-open.org for spec discussion), > > > > > > > > > > thank you for your insightful comments. I think I take a broadly similar > > > > > view. The reason why the current spec and driver is like this is that I > > > > > took a pragmatic approach at first and only included features which work > > > > > out-of-the-box for the current Linux ecosystem. > > > > > > > > > > The current virtio_rtc features work similar to ptp_kvm, and therefore > > > > > can work out-of-the-box with time sync daemons such as chrony. > > > > > > > > > > As of RFC spec v3, UTC clock only is allowed. If mandating a TAI clock > > > > > as well, I am afraid that > > > > > > > > > > - in some (embedded) scenarios, the TAI clock may not be available > > > > > > > > > > - crappy hypervisors will pass off the UTC clock as the TAI clock. > > > > > > > > > > For the same reasons, I am also not sure about adding a *mandatory* TAI > > > > > offset to each readout. I don't know user-space software which would > > > > > leverage this already (at least not through the PTP clock interface). > > > > > And why would such software not go straight for the TAI clock instead? > > > > > > > > > > How about adding a requirement to the spec that the virtio-rtc device > > > > > SHOULD expose the TAI clock whenever it is available - would this > > > > > address your concerns? > > > > > > > > I think that would be too easy for implementors to miss, or decide not > > > > to obey. Or to get *wrong*, by exposing a TAI clock but actually > > > > putting UTC in it. > > > > > > > > I think I prefer to mandate the tai_offset field with the UTC clock. > > > > Crappy implementations will just set it to zero, but at least that > > > > gives a clear signal to the guests that it's *their* problem to > > > > resolve. > > > > > > To me there are some open questions regarding how this would work. Is there > > > a use case for this with the v3 clock reading methods, or would it be > > > enough to address this with the Virtio timekeeper? > > > > > > Looking at clock_adjtime(2), the tai_offset could be exposed, but probably > > > best alongside some additional information about leap seconds. I am not > > > aware about any user-space user. In addition, leap second smearing should > > > also be addressed. > > > > > > > Is there even a standard yet for leap-smearing? Will it be linear over > > 1000 seconds like UTC-SLS? Or semi-raised-cosine over 24 hours, which I > > think is what Google does? Meta does something different again, don't > > they? > > > > Exposing UTC as the only clock reference is bad enough; when leap > > seconds happen there's a whole second during which you don't *know* > > which second it is. It seems odd to me, for a precision clock to be > > deliberately ambiguous about what the time is! > > Just to be clear, the device can perfectly expose only a TAI reference > clock (or both UTC and TAI), the spec is just completely open about this, > as it tries to work for diverse use cases. As long as the guest *knows* what it's getting, sure. > > > > But if the virtio-rtc clock is defined as UTC and then expose something > > *different* in it, that's even worse. You potentially end up providing > > inaccurate time for a whole *day* leading up to the leap second. > > > > I think you're right that leap second smearing should be addressed. At > > the very least, by making it clear that the virtio-rtc clock which > > advertises UTC shall be used *only* for UTC, never UTC-SLS or any other > > yet-to-be-defined variant. > > > > Agreed. > > > Please make it explicit that any hypervisor which wants to advertise a > > smeared clock shall define a new type which specifies the precise > > smearing algorithm and cannot be conflated with the one you're defining > > here. > > > > I will add a requirement that the UTC clock can never have smeared/smoothed > leap seconds. Thanks. > I think that not every vendor would bother to first add a definition of a > smearing algorithm. Also, I think in some cases knowing the precise > smearing algorithm might not be important (when having the same time as the > hypervisor is enough and accuracy w.r.t. actual time is less important). > > So maybe I should add a VIRTIO_RTC_CLOCK_UTC_SMEARED clock type, which for > now could catch every UTC-like clock which smears/smoothes leap seconds, > where the vendor cannot be bothered to add the smearing algorithm to spec > and implementations. Please $DEITY no. Surely the whole point of this effort is to provide guests with precise and *unambiguous* knowledge of what the time is? Using UTC is bad enough, because for a UTC timestamp in the middle of a leap second the guest can't know know *which* occurrence of that leap second it is, so it might be wrong by a second. To resolve that ambiguity needs a leap indicator and/or tai_offset field. But if you allow and encourage the use of smeared time without even a specification of *how* it's smeared... that's even worse. You have an unknown inaccuracy of up to a second for whole periods of time around a leap second. That's surely the *antithesis* of what we're trying to do here? Without an actual definition of the smearing, how is a guest actually supposed to know what time it is? (I suppose you could add a tai_offset_nanoseconds field? I don't know that I want to *encourage* that thought process...) > As for UTC-SLS, this *could* also be added, although [1] says > > It is inappropriate to use Internet-Drafts as reference material or > to cite them other than as "work in progress." > > [1] https://datatracker.ietf.org/doc/html/draft-kuhn-leapsecond-00 > > > > > One other thing to note is I think we're being very naïve about the TSC > > > > on x86 hosts. Theoretically, the TSC for every vCPU might run at a > > > > different frequency, and even if they run at the same frequency they > > > > might be offset from each other. I'm happy to be naïve but I think we > > > > should be *explicitly* so, and just say for example that it's defined > > > > against vCPU0 so if other vCPUs are different then all bets are off. > > > > > > ATM Virtio has no notion of vCPUs, or vCPU topology. So I wonder if you > > > have an opinion on how to represent this in a platform-independent way. > > > > Well, it doesn't have a notion of TSCs either; you include that by > > implicit reference don't you? > > I think I can add a SHOULD requirement which vaguely refers to vCPU 0, or > boot vCPU. But the Virtio device is not necessarily hosted by a hypervisor, > so the device might not even know which vCPUs there are. E.g. there is even > interest to make virtio-rtc work as part of the virtio-net device (which > might be implemented in hardware). Sure, but those implementations aren't going to offer the TSC pairing at all, are they?
On 13/03/2024 12:29:38+0000, David Woodhouse wrote: > On Wed, 2024-03-13 at 12:18 +0100, Alexandre Belloni wrote: > > > > I still don't know anything about virtio but under Linux, an RTC is > > always UTC (or localtime when dual booting but let's not care) and never > > accounts for leap seconds. Having an RTC and RTC driver behaving > > differently would be super inconvenient. Why don't you leave this to > > userspace? > > Well yes, we don't need to expose *anything* from the hypervisor and we > can leave it all to guest userspace. We can run NTP on every single one > of *hundreds* of guests, leaving them all to duplicate the work of > calibrating the *same* underlying oscillator. > Really, I see the point of sharing the time accurately between the host and the guest and not duplicating the effort. This is not what I am objecting to. > I thought we were trying to avoid that, by having the hypervisor tell > them what the time was. If we're going to do that, we need it to be > sufficiently precise (and some clients want to *know* the precision), > and above all we need it to be *unambiguous*. > > If the hypervisor says that the time is 3692217600.001, then the guest > doesn't actually know *which* 3692217600.001 it is, and thus it still > doesn't know the time to an accuracy better than 1 second. > The RTC subsystem has a 1 second resolution and this is not going to change because there is no point doing so for the hardware, to get the time precisely, UIE MUST be used there is no vendor that will just support reading the time/date or timestamp as this is way too imprecise. > And if we start allowing the hypervisor to smear clocks in some other > underspecified ways, then we end up with errors of up to 1 second in > the clock for long periods of time *around* the leap second. > > We need to avoid that ambiguity. Exactly my point and I said, reading an RTC is always UTC and never handles leap seconds so if userspace doesn't handle the leap second and updates the RTC, too bad. There are obviously no RTC that will smear clock unless instructed to, so the hypervisor must not smear the clock. Note that this is not an issue for the subsystem because if you are not capable to track an accurate clock, the RTC itself will have drifted by much more than a second in between leap second inclusions. > > > I guess I'm still questioning whether this is the correct interface to > > expose the host system time instead of an actual RTC. > > If an RTC device is able to report '23:59:60' as the time of day, I > suppose that *could* resolve the ambiguity. But talking to a device is > slow; we want guests to be able to know the time — accurately — with a > simple counter/tsc read and some arithmetic. Which means *paired* reads > of 'RTC' and the counter, and a precise indication of the counter > frequency. 23:59:60 is not and will never be allowed in the RTC subsystem as this is an invalid value for the hardware. The TSC or whatever CPU counter/clock that is used to keep the system time is not an RTC, I don't get why it has to be exposed as such to the guests. PTP is fine and precise, RTC is not.
On Wed, 2024-03-13 at 13:58 +0100, Alexandre Belloni wrote: > The TSC or whatever CPU counter/clock that is used to keep the system > time is not an RTC, I don't get why it has to be exposed as such to the > guests. PTP is fine and precise, RTC is not. Ah, I see. But the point of the virtio_rtc is not really to expose that CPU counter. The point is to report the wallclock time, just like an actual RTC. The real difference is the *precision*. The virtio_rtc device has a facility to *also* expose the counter, because that's what we actually need to gain that precision... Applications don't read the RTC every time they want to know what the time is. These days, they don't even make a system call; it's done entirely in userspace mode. The kernel exposes some shared memory, essentially saying "the counter was X at time Y, and runs at Z Hz". Then applications just read the CPU counter and do some arithmetic. As we require more and more precision in the calibration, it becomes important to get *paired* readings of the CPU counter and the wallclock time at precisely the same moment. If the guest has to read one and then the other, potentially taking interrupts, getting preempted and suffering steal/SMI time in the middle, that introduces an error which is increasingly significant as we increasingly care about precision. Peter's proposal exposes the pairs of {X,Y} and leaves *all* the guest kernels having to repeat readings over time and perform the calibration as the underlying hardware oscillator frequency (Z) drifts with temperature. I'm trying to get him to let the hypervisor expose the calibrated frequency Z too. Along with *error* bounds for ±δX and ±δZ. Which aside from reducing the duplication of effort, will *also* fix the problem of live migration where *all* those things suffer a step change and leave the guest with an inaccurate clock but not knowing it. But that part isn't relevant to the RTC, as you say. RTC doesn't care about that level of precision; it's just what the system uses to know roughly what year it is, when it powers up. (And isn't even really trusted for that, which is a large part of why I added the X509_V_FLAG_NO_CHECK_TIME flag to OpenSSL, so Secure Boot doesn't break when the RTC is catastrophically wrong :) If you're asking why patch 7/7 in Peter's series exists to expose the virtio clock through RTC, and you're not particularly interested in the first six, I suppose that's a fair question. As is the question of "why is it called virtio_rtc not virtio_ptp?". But let me turn it around: if the kernel has access to this virtio device and *not* any other RTC, why *wouldn't* the kernel use the time from it? The fact that it can optionally *also* provide paired readings with the CPU counter doesn't actually *hurt* for the RTC use case, does it?
On 13.03.24 12:18, Alexandre Belloni wrote: > On 13/03/2024 10:45:54+0100, Peter Hilber wrote: >>> Exposing UTC as the only clock reference is bad enough; when leap >>> seconds happen there's a whole second during which you don't *know* >>> which second it is. It seems odd to me, for a precision clock to be >>> deliberately ambiguous about what the time is! >> >> Just to be clear, the device can perfectly expose only a TAI reference >> clock (or both UTC and TAI), the spec is just completely open about this, >> as it tries to work for diverse use cases. >> >>> >>> But if the virtio-rtc clock is defined as UTC and then expose something >>> *different* in it, that's even worse. You potentially end up providing >>> inaccurate time for a whole *day* leading up to the leap second. >>> >>> I think you're right that leap second smearing should be addressed. At >>> the very least, by making it clear that the virtio-rtc clock which >>> advertises UTC shall be used *only* for UTC, never UTC-SLS or any other >>> yet-to-be-defined variant. >>> >> >> Agreed. >> >>> Please make it explicit that any hypervisor which wants to advertise a >>> smeared clock shall define a new type which specifies the precise >>> smearing algorithm and cannot be conflated with the one you're defining >>> here. >>> >> >> I will add a requirement that the UTC clock can never have smeared/smoothed >> leap seconds. >> >> I think that not every vendor would bother to first add a definition of a >> smearing algorithm. Also, I think in some cases knowing the precise >> smearing algorithm might not be important (when having the same time as the >> hypervisor is enough and accuracy w.r.t. actual time is less important). >> >> So maybe I should add a VIRTIO_RTC_CLOCK_UTC_SMEARED clock type, which for >> now could catch every UTC-like clock which smears/smoothes leap seconds, >> where the vendor cannot be bothered to add the smearing algorithm to spec >> and implementations. >> > > I still don't know anything about virtio but under Linux, an RTC is > always UTC (or localtime when dual booting but let's not care) and never > accounts for leap seconds. Having an RTC and RTC driver behaving > differently would be super inconvenient. Why don't you leave this to > userspace? > > I guess I'm still questioning whether this is the correct interface to > expose the host system time instead of an actual RTC. > virtio_rtc only registers RTC class devices for virtio_rtc clock type UTC, so adding an UTC_SMEARED clock type would avoid leap seconds for the RTC class. But I understand that there are more concerns and I will re-consider. From my POV CLOCK_{REALTIME,MONOTONIC}_ALARM support is very important however. So the only alternative to me seems to be adding an alternative alarmtimer backend (and time synchronization through user space). Thanks for the comment, Peter
On 13/03/2024 14:06:42+0000, David Woodhouse wrote: > If you're asking why patch 7/7 in Peter's series exists to expose the > virtio clock through RTC, and you're not particularly interested in the > first six, I suppose that's a fair question. As is the question of "why > is it called virtio_rtc not virtio_ptp?". > Exactly my question, thanks :) > But let me turn it around: if the kernel has access to this virtio > device and *not* any other RTC, why *wouldn't* the kernel use the time > from it? The fact that it can optionally *also* provide paired readings > with the CPU counter doesn't actually *hurt* for the RTC use case, does > it? As long as it doesn't behave differently from the other RTC, I'm fine with this. This is important because I don't want to carry any special infrastructure for this driver or to have to special case this driver later on because it is incompatible with some evolution of the subsystem.
On 13.03.24 13:45, David Woodhouse wrote: > On Wed, 2024-03-13 at 10:45 +0100, Peter Hilber wrote: >> On 12.03.24 18:15, David Woodhouse wrote: >>> On Mon, 2024-03-11 at 19:24 +0100, Peter Hilber wrote: >>>> On 08.03.24 13:33, David Woodhouse wrote: >>>>> On Fri, 2024-03-08 at 11:32 +0100, Peter Hilber wrote: >>>>>> On 07.03.24 15:02, David Woodhouse wrote: >>>>>>> Hm, should we allow UTC? If you tell me the time in UTC, then >>>>>>> (sometimes) I still don't actually know what the time is, because some >>>>>>> UTC seconds occur twice. UTC only makes sense if you provide the TAI >>>>>>> offset, surely? Should the virtio_rtc specification make it mandatory >>>>>>> to provide such? >>>>>>> >>>>>>> Otherwise you're just designing it to allow crappy hypervisors to >>>>>>> expose incomplete information. >>>>>>> >>>>>> >>>>>> Hi David, >>>>>> >>>>>> (adding virtio-comment@lists.oasis-open.org for spec discussion), >>>>>> >>>>>> thank you for your insightful comments. I think I take a broadly similar >>>>>> view. The reason why the current spec and driver is like this is that I >>>>>> took a pragmatic approach at first and only included features which work >>>>>> out-of-the-box for the current Linux ecosystem. >>>>>> >>>>>> The current virtio_rtc features work similar to ptp_kvm, and therefore >>>>>> can work out-of-the-box with time sync daemons such as chrony. >>>>>> >>>>>> As of RFC spec v3, UTC clock only is allowed. If mandating a TAI clock >>>>>> as well, I am afraid that >>>>>> >>>>>> - in some (embedded) scenarios, the TAI clock may not be available >>>>>> >>>>>> - crappy hypervisors will pass off the UTC clock as the TAI clock. >>>>>> >>>>>> For the same reasons, I am also not sure about adding a *mandatory* TAI >>>>>> offset to each readout. I don't know user-space software which would >>>>>> leverage this already (at least not through the PTP clock interface). >>>>>> And why would such software not go straight for the TAI clock instead? >>>>>> >>>>>> How about adding a requirement to the spec that the virtio-rtc device >>>>>> SHOULD expose the TAI clock whenever it is available - would this >>>>>> address your concerns? >>>>> >>>>> I think that would be too easy for implementors to miss, or decide not >>>>> to obey. Or to get *wrong*, by exposing a TAI clock but actually >>>>> putting UTC in it. >>>>> >>>>> I think I prefer to mandate the tai_offset field with the UTC clock. >>>>> Crappy implementations will just set it to zero, but at least that >>>>> gives a clear signal to the guests that it's *their* problem to >>>>> resolve. >>>> >>>> To me there are some open questions regarding how this would work. Is there >>>> a use case for this with the v3 clock reading methods, or would it be >>>> enough to address this with the Virtio timekeeper? >>>> >>>> Looking at clock_adjtime(2), the tai_offset could be exposed, but probably >>>> best alongside some additional information about leap seconds. I am not >>>> aware about any user-space user. In addition, leap second smearing should >>>> also be addressed. >>>> >>> >>> Is there even a standard yet for leap-smearing? Will it be linear over >>> 1000 seconds like UTC-SLS? Or semi-raised-cosine over 24 hours, which I >>> think is what Google does? Meta does something different again, don't >>> they? >>> >>> Exposing UTC as the only clock reference is bad enough; when leap >>> seconds happen there's a whole second during which you don't *know* >>> which second it is. It seems odd to me, for a precision clock to be >>> deliberately ambiguous about what the time is! >> >> Just to be clear, the device can perfectly expose only a TAI reference >> clock (or both UTC and TAI), the spec is just completely open about this, >> as it tries to work for diverse use cases. > > As long as the guest *knows* what it's getting, sure. > >>> >>> But if the virtio-rtc clock is defined as UTC and then expose something >>> *different* in it, that's even worse. You potentially end up providing >>> inaccurate time for a whole *day* leading up to the leap second. >>> >>> I think you're right that leap second smearing should be addressed. At >>> the very least, by making it clear that the virtio-rtc clock which >>> advertises UTC shall be used *only* for UTC, never UTC-SLS or any other >>> yet-to-be-defined variant. >>> >> >> Agreed. >> >>> Please make it explicit that any hypervisor which wants to advertise a >>> smeared clock shall define a new type which specifies the precise >>> smearing algorithm and cannot be conflated with the one you're defining >>> here. >>> >> >> I will add a requirement that the UTC clock can never have smeared/smoothed >> leap seconds. > > Thanks. > >> I think that not every vendor would bother to first add a definition of a >> smearing algorithm. Also, I think in some cases knowing the precise >> smearing algorithm might not be important (when having the same time as the >> hypervisor is enough and accuracy w.r.t. actual time is less important). >> >> So maybe I should add a VIRTIO_RTC_CLOCK_UTC_SMEARED clock type, which for >> now could catch every UTC-like clock which smears/smoothes leap seconds, >> where the vendor cannot be bothered to add the smearing algorithm to spec >> and implementations. > > Please $DEITY no. > > Surely the whole point of this effort is to provide guests with precise > and *unambiguous* knowledge of what the time is? I would say, a fundamental point of this effort is to enable such implementations, and to detect if a device is promising to support this. Where we might differ is as to whether the Virtio clock *for every implementation* has to be *continuously* accurate w.r.t. a time standard, or whether *for some implementations* it could be enough that all guests in the local system have the same, precise local notion of time, which might be off from the actual time standard. Also, cf. ptp_kvm, which AFAIU doesn't address leap seconds at all... With your described use case the UTC_SMEARED clock should of course not be used. The UTC_SMEARED clock would get a distinct name through udev, like /dev/ptp_virtio_utc_smeared, so the incompatibility could at least be detected. > > Using UTC is bad enough, because for a UTC timestamp in the middle of a > leap second the guest can't know know *which* occurrence of that leap > second it is, so it might be wrong by a second. To resolve that > ambiguity needs a leap indicator and/or tai_offset field. I agree that virtio-rtc should communicate this. The question is, what exactly, and for which clock read request? As for PTP clocks: - It doesn't fit into the ioctl PTP_SYS_OFFSET_PRECISE2. - The clock_adjtime(2) tai_offset and return value could be set (if upstream will accept this). Would this help? As discussed, user space would need to interpret this (and currently no dynamic POSIX clock sets this). > > But if you allow and encourage the use of smeared time without even a > specification of *how* it's smeared... that's even worse. You have an > unknown inaccuracy of up to a second for whole periods of time around a > leap second. That's surely the *antithesis* of what we're trying to do > here? Without an actual definition of the smearing, how is a guest > actually supposed to know what time it is? As discussed above, I think in some use cases it is enough for the guest to have a precise notion of time shared with the other guests. > > (I suppose you could add a tai_offset_nanoseconds field? I don't know > that I want to *encourage* that thought process...) > >> As for UTC-SLS, this *could* also be added, although [1] says >> >> It is inappropriate to use Internet-Drafts as reference material or >> to cite them other than as "work in progress." >> >> [1] https://datatracker.ietf.org/doc/html/draft-kuhn-leapsecond-00 >> >>>>> One other thing to note is I think we're being very naïve about the TSC >>>>> on x86 hosts. Theoretically, the TSC for every vCPU might run at a >>>>> different frequency, and even if they run at the same frequency they >>>>> might be offset from each other. I'm happy to be naïve but I think we >>>>> should be *explicitly* so, and just say for example that it's defined >>>>> against vCPU0 so if other vCPUs are different then all bets are off. >>>> >>>> ATM Virtio has no notion of vCPUs, or vCPU topology. So I wonder if you >>>> have an opinion on how to represent this in a platform-independent way. >>> >>> Well, it doesn't have a notion of TSCs either; you include that by >>> implicit reference don't you? >> >> I think I can add a SHOULD requirement which vaguely refers to vCPU 0, or >> boot vCPU. But the Virtio device is not necessarily hosted by a hypervisor, >> so the device might not even know which vCPUs there are. E.g. there is even >> interest to make virtio-rtc work as part of the virtio-net device (which >> might be implemented in hardware). > > Sure, but those implementations aren't going to offer the TSC pairing > at all, are they? > They could offer an Intel ART pairing (some physical PTP NICs are already doing this, look for the convert_art_to_tsc() users). Thanks for the comments, Peter
On 13.03.24 15:06, David Woodhouse wrote: > On Wed, 2024-03-13 at 13:58 +0100, Alexandre Belloni wrote: >> The TSC or whatever CPU counter/clock that is used to keep the system >> time is not an RTC, I don't get why it has to be exposed as such to the >> guests. PTP is fine and precise, RTC is not. > > Ah, I see. But the point of the virtio_rtc is not really to expose that > CPU counter. The point is to report the wallclock time, just like an > actual RTC. The real difference is the *precision*. > > The virtio_rtc device has a facility to *also* expose the counter, > because that's what we actually need to gain that precision... > > Applications don't read the RTC every time they want to know what the > time is. These days, they don't even make a system call; it's done > entirely in userspace mode. The kernel exposes some shared memory, > essentially saying "the counter was X at time Y, and runs at Z Hz". > Then applications just read the CPU counter and do some arithmetic. > > As we require more and more precision in the calibration, it becomes > important to get *paired* readings of the CPU counter and the wallclock > time at precisely the same moment. If the guest has to read one and > then the other, potentially taking interrupts, getting preempted and > suffering steal/SMI time in the middle, that introduces an error which > is increasingly significant as we increasingly care about precision. > > Peter's proposal exposes the pairs of {X,Y} and leaves *all* the guest > kernels having to repeat readings over time and perform the calibration > as the underlying hardware oscillator frequency (Z) drifts with > temperature. I'm trying to get him to let the hypervisor expose the > calibrated frequency Z too. Along with *error* bounds for ±δX and ±δZ. > Which aside from reducing the duplication of effort, will *also* fix > the problem of live migration where *all* those things suffer a step > change and leave the guest with an inaccurate clock but not knowing it. I am already convinced that this would work significantly better than the {X,Y} pair (but would be a bit more effort to implement): 1. when accessed by user space, obviously 2. when backing the PTP clock, it saves CPU time and makes non-paired reads more precise. I would just prefer to try upstreaming the {X,Y} pairing first. I think the {X,Y,Z...} pairing could be discussed and developed in parallel. Thanks for the comments, Peter
On 13 March 2024 17:50:48 GMT, Peter Hilber <peter.hilber@opensynergy.com> wrote: >On 13.03.24 13:45, David Woodhouse wrote: >> Surely the whole point of this effort is to provide guests with precise >> and *unambiguous* knowledge of what the time is? > >I would say, a fundamental point of this effort is to enable such >implementations, and to detect if a device is promising to support this. > >Where we might differ is as to whether the Virtio clock *for every >implementation* has to be *continuously* accurate w.r.t. a time standard, >or whether *for some implementations* it could be enough that all guests in >the local system have the same, precise local notion of time, which might >be off from the actual time standard. That makes sense, but remember I don't just want {X, Y, Z} but *also* the error bounds of ±deltaY and ±deltaZ too. So your example just boils down to "I'm calling it UTC, and it's really precise, but we make no promises about its *accuracy*". And that's fine. >Also, cf. ptp_kvm, which AFAIU doesn't address leap seconds at all... KVM is not an exemplar of good time practices. Not in *any* respect :) >With your described use case the UTC_SMEARED clock should of course not be >used. The UTC_SMEARED clock would get a distinct name through udev, like >/dev/ptp_virtio_utc_smeared, so the incompatibility could at least be >detected. As long as it's clear to all concerned that this is fundamentally not usable as an accurate time source, and is only for the local-sync case you described, sure. >> Using UTC is bad enough, because for a UTC timestamp in the middle of a >> leap second the guest can't know know *which* occurrence of that leap >> second it is, so it might be wrong by a second. To resolve that >> ambiguity needs a leap indicator and/or tai_offset field. > >I agree that virtio-rtc should communicate this. The question is, what >exactly, and for which clock read request? Are we now conflating software architecture (and Linux in particular) with "hardware" design? To a certain extent, as long as the virtio-rtc device is designed to expose time precisely and unambiguously, it's less important if the Linux kernel *today* can use that. Although of course we should strive for that. Let's be...well, *unambiguous*, I suppose... that we've changed topics to discuss that though. >As for PTP clocks: > >- It doesn't fit into the ioctl PTP_SYS_OFFSET_PRECISE2. > >- The clock_adjtime(2) tai_offset and return value could be set (if > upstream will accept this). Would this help? As discussed, user space > would need to interpret this (and currently no dynamic POSIX clock sets > this). Hm, maybe? >>> I think I can add a SHOULD requirement which vaguely refers to vCPU 0, or >>> boot vCPU. But the Virtio device is not necessarily hosted by a hypervisor, >>> so the device might not even know which vCPUs there are. E.g. there is even >>> interest to make virtio-rtc work as part of the virtio-net device (which >>> might be implemented in hardware). >> >> Sure, but those implementations aren't going to offer the TSC pairing >> at all, are they? >> > >They could offer an Intel ART pairing (some physical PTP NICs are already >doing this, look for the convert_art_to_tsc() users). Right, but isn't that software's problem? The time pairing is defined against the ART in that case.
> As long as it doesn't behave differently from the other RTC, I'm fine > with this. This is important because I don't want to carry any special > infrastructure for this driver or to have to special case this driver > later on because it is incompatible with some evolution of the > subsystem. Maybe deliberately throw away all the sub-second accuracy when accessing the time via the RTC API? That helps to make it look like an RTC. And when doing the rounding, add a constant offset of 10ms to emulate the virtual i2c bus it is hanging off, like most RTCs? Andrew
On 13.03.24 21:12, Andrew Lunn wrote: >> As long as it doesn't behave differently from the other RTC, I'm fine >> with this. This is important because I don't want to carry any special >> infrastructure for this driver or to have to special case this driver >> later on because it is incompatible with some evolution of the >> subsystem. > > Maybe deliberately throw away all the sub-second accuracy when > accessing the time via the RTC API? That helps to make it look like an > RTC. And when doing the rounding, add a constant offset of 10ms to > emulate the virtual i2c bus it is hanging off, like most RTCs? > > Andrew The truncating to whole seconds is already done. As to the offset, I do not understand how this would help. I can read out my CMOS RTC in ~50 us. Thanks for the comment, Peter
Now CC'ing the previous commenters to the virtio-rtc spec draft, since this discussion is mostly about the spec, and the Virtio mailing lists still seem to be in a migration hiatus... On 13.03.24 19:18, David Woodhouse wrote: > On 13 March 2024 17:50:48 GMT, Peter Hilber <peter.hilber@opensynergy.com> wrote: >> On 13.03.24 13:45, David Woodhouse wrote: >>> Surely the whole point of this effort is to provide guests with precise >>> and *unambiguous* knowledge of what the time is? >> >> I would say, a fundamental point of this effort is to enable such >> implementations, and to detect if a device is promising to support this. >> >> Where we might differ is as to whether the Virtio clock *for every >> implementation* has to be *continuously* accurate w.r.t. a time standard, >> or whether *for some implementations* it could be enough that all guests in >> the local system have the same, precise local notion of time, which might >> be off from the actual time standard. > > That makes sense, but remember I don't just want {X, Y, Z} but *also* the error bounds of ±deltaY and ±deltaZ too. > > So your example just boils down to "I'm calling it UTC, and it's really precise, but we make no promises about its *accuracy*". And that's fine. > >> Also, cf. ptp_kvm, which AFAIU doesn't address leap seconds at all... > > KVM is not an exemplar of good time practices. > Not in *any* respect :) > >> With your described use case the UTC_SMEARED clock should of course not be >> used. The UTC_SMEARED clock would get a distinct name through udev, like >> /dev/ptp_virtio_utc_smeared, so the incompatibility could at least be >> detected. > > As long as it's clear to all concerned that this is fundamentally not usable as an accurate time source, and is only for the local-sync case you described, sure. > >>> Using UTC is bad enough, because for a UTC timestamp in the middle of a >>> leap second the guest can't know know *which* occurrence of that leap >>> second it is, so it might be wrong by a second. To resolve that >>> ambiguity needs a leap indicator and/or tai_offset field. >> >> I agree that virtio-rtc should communicate this. The question is, what >> exactly, and for which clock read request? > > Are we now conflating software architecture (and Linux in particular) with "hardware" design? > > To a certain extent, as long as the virtio-rtc device is designed to expose time precisely and unambiguously, it's less important if the Linux kernel *today* can use that. Although of course we should strive for that. Let's be...well, *unambiguous*, I suppose... that we've changed topics to discuss that though. > As Virtio is extensible (unlike hardware), my approach is to mostly specify only what also has a PoC user and a use case. >> As for PTP clocks: >> >> - It doesn't fit into the ioctl PTP_SYS_OFFSET_PRECISE2. >> >> - The clock_adjtime(2) tai_offset and return value could be set (if >> upstream will accept this). Would this help? As discussed, user space >> would need to interpret this (and currently no dynamic POSIX clock sets >> this). > > Hm, maybe? > > >>>> I think I can add a SHOULD requirement which vaguely refers to vCPU 0, or >>>> boot vCPU. But the Virtio device is not necessarily hosted by a hypervisor, >>>> so the device might not even know which vCPUs there are. E.g. there is even >>>> interest to make virtio-rtc work as part of the virtio-net device (which >>>> might be implemented in hardware). >>> >>> Sure, but those implementations aren't going to offer the TSC pairing >>> at all, are they? >>> >> >> They could offer an Intel ART pairing (some physical PTP NICs are already >> doing this, look for the convert_art_to_tsc() users). > > Right, but isn't that software's problem? The time pairing is defined against the ART in that case. My point was that such a device would then not necessarily have an idea what vCPU 0 is. But let's just say that this will be phrased as a SHOULD best-effort requirement anyway. Thanks for the comments, Peter
On 14 March 2024 11:13:37 CET, Peter Hilber <peter.hilber@opensynergy.com> wrote: >> To a certain extent, as long as the virtio-rtc device is designed to expose time precisely and unambiguously, it's less important if the Linux kernel *today* can use that. Although of course we should strive for that. Let's be...well, *unambiguous*, I suppose... that we've changed topics to discuss that though. >> > >As Virtio is extensible (unlike hardware), my approach is to mostly specify >only what also has a PoC user and a use case. If we get memory-mapped (X, Y, Z, ±x, ±y) I'll have a user and a use case on day one. Otherwise, as I said in my first response, I can go do that as a separate device and decide that virtio_rtc doesn't meet our needs (especially for maintaining accuracy over LM). My main concern for virto_rtc is that we avoid *ambiguity*. Yes, I get that it's extensible but we don't want a v1.0 of the spec, implemented by various hypervisors, which still leaves guests not knowing what the actual time is. That would not be good. And even UTC without a leap second indicator has that problem.
While the virtio-comment list is not available, now also CC'ing Parav, which may be interested in this virtio-rtc spec related discussion thread. On 14.03.24 15:19, David Woodhouse wrote: > On 14 March 2024 11:13:37 CET, Peter Hilber <peter.hilber@opensynergy.com> wrote: >>> To a certain extent, as long as the virtio-rtc device is designed to expose time precisely and unambiguously, it's less important if the Linux kernel *today* can use that. Although of course we should strive for that. Let's be...well, *unambiguous*, I suppose... that we've changed topics to discuss that though. >>> >> >> As Virtio is extensible (unlike hardware), my approach is to mostly specify >> only what also has a PoC user and a use case. > > If we get memory-mapped (X, Y, Z, ±x, ±y) I'll have a user and a use case on day one. Otherwise, as I said in my first response, I can go do that as a separate device and decide that virtio_rtc doesn't meet our needs (especially for maintaining accuracy over LM). We plan to add - leap second indication, - UTC-to-TAI offset, - clock smearing indication (including the noon-to-noon linear smearing variant which seems to be somewhat popular), and - clock accuracy indication to the initial spec and to the PoC implementation. However, due to resource restrictions, we cannot ourselves add the memory-mapped clock to the initial spec. Everyone is very welcome to contribute the memory-mapped clock to the spec, and I think it might then still make it to the initial version. > > My main concern for virto_rtc is that we avoid *ambiguity*. Yes, I get that it's extensible but we don't want a v1.0 of the spec, implemented by various hypervisors, which still leaves guests not knowing what the actual time is. That would not be good. And even UTC without a leap second indicator has that problem. Agreed. That should be addressed by the above changes. Best regards, Peter
On Tue, 2024-03-19 at 14:47 +0100, Peter Hilber wrote: > While the virtio-comment list is not available, now also CC'ing Parav, > which may be interested in this virtio-rtc spec related discussion thread. > > On 14.03.24 15:19, David Woodhouse wrote: > > On 14 March 2024 11:13:37 CET, Peter Hilber <peter.hilber@opensynergy.com> wrote: > > > > To a certain extent, as long as the virtio-rtc device is designed to expose time precisely and unambiguously, it's less important if the Linux kernel *today* can use that. Although of course we should strive for that. Let's be...well, *unambiguous*, I suppose... that we've changed topics to discuss that though. > > > > > > > > > > As Virtio is extensible (unlike hardware), my approach is to mostly specify > > > only what also has a PoC user and a use case. > > > > If we get memory-mapped (X, Y, Z, ±x, ±y) I'll have a user and a use case on day one. Otherwise, as I said in my first response, I can go do that as a separate device and decide that virtio_rtc doesn't meet our needs (especially for maintaining accuracy over LM). > > We plan to add > > - leap second indication, > > - UTC-to-TAI offset, > > - clock smearing indication (including the noon-to-noon linear smearing > variant which seems to be somewhat popular), and > > - clock accuracy indication > > to the initial spec and to the PoC implementation. Sounds good, thanks! I look forward to seeing the new revision. I'm hoping Julien can give feedback on the clock accuracy parts. > However, due to resource restrictions, we cannot ourselves add the > memory-mapped clock to the initial spec. > > Everyone is very welcome to contribute the memory-mapped clock to the spec, > and I think it might then still make it to the initial version. Makes sense. That is my primary target, so I'm *hoping* we can converge and get that into your initial spec, otherwise for expediency I'm going to have to define an ACPI or DT or PCI device of our own and expose the memory region through that instead. (Even if I have to do that in the short term to stop the bleeding with customers' clocks and live migration, I'd still aspire to migrate to a virtio_rtc version of it in future)