Patchwork [NET-NEXT,02/10] time sync: generic infrastructure to map between time stamps generated by a time counter and system time

login
register
mail settings
Submitter Patrick Ohly
Date Feb. 9, 2009, 5:02 p.m.
Message ID <1234198922.20325.170.camel@ecld0pohly>
Download mbox | patch
Permalink /patch/22750/
State RFC
Delegated to: David Miller
Headers show

Comments

Patrick Ohly - Feb. 9, 2009, 5:02 p.m.
On Thu, 2009-02-05 at 10:21 +0000, Patrick Ohly wrote:
> On Wed, 2009-02-04 at 21:44 +0200, john stultz wrote:
> > On Wed, 2009-02-04 at 14:01 +0100, Patrick Ohly wrote:
> > I sort of object to the name clocksync, as you're not really doing
> > anything to sync clocks in the code. One, "clock" is an way overloaded
> > term in the kernel. Two, you're really seem to be just providing deltas
> > and skew rates between notions of time. I want to avoid someone thinking
> > "Oh, NTP must use this code". 
> > 
> > So maybe something like timecompare.c? 
> 
> Fine with me.

As there were no other comments I renamed the file, functions and struct
accordingly. As I said in my mail, I prefer "struct timecompare" over
"struct time_comparator". I also used "timecompare_transform()".

Is this revision of the patch okay? How should the two patches get
included in the main kernel - via netdev-next-2.6?

Bye, Patrick

-------------------------------------------------

Subject: [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases

Mapping from a struct timecounter to a time returned by functions like
ktime_get_real() is implemented. This is sufficient to use this code
in a network device driver which wants to support hardware time
stamping and transformation of hardware time stamps to system time.

The interface could have been made more versatile by not depending on
a time counter, but this wasn't done to avoid writing glue code
elsewhere.

The method implemented here is the one used and analyzed under the name
"assisted PTP" in the LCI PTP paper:
http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf
---
 include/linux/timecompare.h |  125 +++++++++++++++++++++++++++
 kernel/time/Makefile        |    2 +-
 kernel/time/timecompare.c   |  194 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 320 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/timecompare.h
 create mode 100644 kernel/time/timecompare.c
john stultz - Feb. 9, 2009, 7:27 p.m.
On Mon, 2009-02-09 at 18:02 +0100, Patrick Ohly wrote:
> On Thu, 2009-02-05 at 10:21 +0000, Patrick Ohly wrote:
> > On Wed, 2009-02-04 at 21:44 +0200, john stultz wrote:
> > > On Wed, 2009-02-04 at 14:01 +0100, Patrick Ohly wrote:
> > > I sort of object to the name clocksync, as you're not really doing
> > > anything to sync clocks in the code. One, "clock" is an way overloaded
> > > term in the kernel. Two, you're really seem to be just providing deltas
> > > and skew rates between notions of time. I want to avoid someone thinking
> > > "Oh, NTP must use this code". 
> > > 
> > > So maybe something like timecompare.c? 
> > 
> > Fine with me.
> 
> As there were no other comments I renamed the file, functions and struct
> accordingly. As I said in my mail, I prefer "struct timecompare" over
> "struct time_comparator". I also used "timecompare_transform()".
> 
> Is this revision of the patch okay? How should the two patches get
> included in the main kernel - via netdev-next-2.6?
> 
> Bye, Patrick

Small comment below, but otherwise it looks ok to me. I usually push
patches through Andrew, so I'd probably go that way. But I'd leave it to
Dave if he's comfortable pushing them to Linus.

Acked-by: John Stultz <johnstul@us.ibm.com>




> -------------------------------------------------
> 
> Subject: [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases
> 
> Mapping from a struct timecounter to a time returned by functions like
> ktime_get_real() is implemented. This is sufficient to use this code
> in a network device driver which wants to support hardware time
> stamping and transformation of hardware time stamps to system time.
> 
> The interface could have been made more versatile by not depending on
> a time counter, but this wasn't done to avoid writing glue code
> elsewhere.
> 
> The method implemented here is the one used and analyzed under the name
> "assisted PTP" in the LCI PTP paper:
> http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf
> ---
>  include/linux/timecompare.h |  125 +++++++++++++++++++++++++++
>  kernel/time/Makefile        |    2 +-
>  kernel/time/timecompare.c   |  194 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 320 insertions(+), 1 deletions(-)
>  create mode 100644 include/linux/timecompare.h
>  create mode 100644 kernel/time/timecompare.c
> 
> diff --git a/include/linux/timecompare.h b/include/linux/timecompare.h
> new file mode 100644
> index 0000000..f88c454
> --- /dev/null
> +++ b/include/linux/timecompare.h
> @@ -0,0 +1,125 @@
> +/*
> + * Utility code which helps transforming between two different time
> + * bases, called "source" and "target" time in this code.
> + *
> + * Source time has to be provided via the timecounter API while target
> + * time is accessed via a function callback whose prototype
> + * intentionally matches ktime_get() and ktime_get_real(). These
> + * interfaces where chosen like this so that the code serves its
> + * initial purpose without additional glue code.
> + *
> + * This purpose is synchronizing a hardware clock in a NIC with system
> + * time, in order to implement the Precision Time Protocol (PTP,
> + * IEEE1588) with more accurate hardware assisted time stamping.  In
> + * that context only synchronization against system time (=
> + * ktime_get_real()) is currently needed. But this utility code might
> + * become useful in other situations, which is why it was written as
> + * general purpose utility code.
> + *
> + * The source timecounter is assumed to return monotonically
> + * increasing time (but this code does its best to compensate if that
> + * is not the case) whereas target time may jump.
> + *
> + * The target time corresponding to a source time is determined by
> + * reading target time, reading source time, reading target time
> + * again, then assuming that average target time corresponds to source
> + * time. In other words, the assumption is that reading the source
> + * time is slow and involves equal time for sending the request and
> + * receiving the reply, whereas reading target time is assumed to be
> + * fast.
> + *
> + * Copyright(c) 2009 Intel Corporation.
> + * Author: Patrick Ohly <patrick.ohly@intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc.,
> + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
> + */
> +#ifndef _LINUX_TIMECOMPARE_H
> +#define _LINUX_TIMECOMPARE_H
> +
> +#include <linux/clocksource.h>
> +#include <linux/ktime.h>
> +
> +/**
> + * struct timecompare - stores state and configuration for the two clocks
> + *
> + * Initialize to zero, then set source/target/num_samples.
> + *
> + * Transformation between source time and target time is done with:
> + * target_time = source_time + offset +
> + *               (source_time - last_update) * skew /
> + *               TIMECOMPARE_SKEW_RESOLUTION
> + *
> + * @source:          used to get source time stamps via timecounter_read()
> + * @target:          function returning target time (for example, ktime_get
> + *                   for monotonic time, or ktime_get_real for wall clock)
> + * @num_samples:     number of times that source time and target time are to
> + *                   be compared when determining their offset
> + * @offset:          (target time - source time) at the time of the last update
> + * @skew:            average (target time - source time) / delta source time *
> + *                   TIMECOMPARE_SKEW_RESOLUTION
> + * @last_update:     last source time stamp when time offset was measured
> + */
> +struct timecompare {
> +	struct timecounter *source;
> +	ktime_t (*target)(void);
> +	int num_samples;
> +
> +	s64 offset;
> +	s64 skew;
> +	u64 last_update;
> +};
> +
> +/**
> + * timecompare_transform - transform source time stamp into target time base
> + * @sync:            context for time sync
> + * @source_tstamp:   the result of timecounter_read() or
> + *                   timecounter_cyc2time()
> + */
> +extern ktime_t timecompare_transform(struct timecompare *sync,
> +				     u64 source_tstamp);
> +
> +/**
> + * timecompare_offset - measure current (target time - source time) offset
> + * @sync:            context for time sync
> + * @offset:          average offset during sample period returned here
> + * @source_tstamp:   average source time during sample period returned here
> + *
> + * Returns number of samples used. Might be zero (= no result) in the
> + * unlikely case that target time was monotonically decreasing for all
> + * samples (= broken).
> + */
> +extern int timecompare_offset(struct timecompare *sync,
> +			      s64 *offset,
> +			      u64 *source_tstamp);
> +
> +extern void __timecompare_update(struct timecompare *sync,
> +				 u64 source_tstamp);
> +
> +/**
> + * timecompare_update - update offset and skew by measuring current offset
> + * @sync:            context for time sync
> + * @source_tstamp:   the result of timecounter_read() or
> + *                   timecounter_cyc2time(), pass zero to force update
> + *
> + * Updates are only done at most once per second.
> + */
> +static inline void timecompare_update(struct timecompare *sync,
> +				      u64 source_tstamp)
> +{
> +	if (!source_tstamp ||
> +	    (s64)(source_tstamp - sync->last_update) >= NSEC_PER_SEC)
> +		__timecompare_update(sync, source_tstamp);
> +}
> +
> +#endif /* _LINUX_TIMECOMPARE_H */
> diff --git a/kernel/time/Makefile b/kernel/time/Makefile
> index 905b0b5..0b0a636 100644
> --- a/kernel/time/Makefile
> +++ b/kernel/time/Makefile
> @@ -1,4 +1,4 @@
> -obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o
> +obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o timecompare.o
> 
>  obj-$(CONFIG_GENERIC_CLOCKEVENTS_BUILD)		+= clockevents.o
>  obj-$(CONFIG_GENERIC_CLOCKEVENTS)		+= tick-common.o
> diff --git a/kernel/time/timecompare.c b/kernel/time/timecompare.c
> new file mode 100644
> index 0000000..1e94abc
> --- /dev/null
> +++ b/kernel/time/timecompare.c
> @@ -0,0 +1,194 @@
> +/*
> + * Copyright (C) 2009 Intel Corporation.
> + * Author: Patrick Ohly <patrick.ohly@intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> + */
> +
> +#include <linux/timecompare.h>
> +#include <linux/module.h>
> +#include <linux/math64.h>
> +
> +/*
> + * fixed point arithmetic scale factor for skew
> + *
> + * Usually one would measure skew in ppb (parts per billion, 1e9), but
> + * using a factor of 2 simplifies the math.
> + */
> +#define TIMECOMPARE_SKEW_RESOLUTION (((s64)1)<<30)
> +
> +ktime_t timecompare_transform(struct timecompare *sync,
> +			      u64 source_tstamp)
> +{
> +	u64 nsec;
> +
> +	nsec = source_tstamp + sync->offset;
> +	nsec += (s64)(source_tstamp - sync->last_update) * sync->skew /
> +		TIMECOMPARE_SKEW_RESOLUTION;
> +
> +	return ns_to_ktime(nsec);
> +}
> +EXPORT_SYMBOL(timecompare_transform);
> +
> +int timecompare_offset(struct timecompare *sync,
> +		       s64 *offset,
> +		       u64 *source_tstamp)
> +{
> +	u64 start_source = 0, end_source = 0;
> +	struct {
> +		s64 offset;
> +		s64 duration_target;
> +	} buffer[10], sample, *samples;
> +	int counter = 0, i;
> +	int used;
> +	int index;
> +	int num_samples = sync->num_samples;
> +
> +	if (num_samples > sizeof(buffer)/sizeof(buffer[0])) {
> +		samples = kmalloc(sizeof(*samples) * num_samples, GFP_ATOMIC);
> +		if (!samples) {
> +			samples = buffer;
> +			num_samples = sizeof(buffer)/sizeof(buffer[0]);
> +		}
> +	} else {
> +		samples = buffer;
> +	}
> +
> +	/* run until we have enough valid samples, but do not try forever */
> +	i = 0;
> +	counter = 0;
> +	while (1) {
> +		u64 ts;
> +		ktime_t start, end;
> +
> +		start = sync->target();
> +		ts = timecounter_read(sync->source);
> +		end = sync->target();
> +
> +		if (!i)
> +			start_source = ts;
> +
> +		/* ignore negative durations */
> +		sample.duration_target = ktime_to_ns(ktime_sub(end, start));
> +		if (sample.duration_target >= 0) {

You may also want to checking the bounds on the duration_target. If
preemption hits and the values are too out of whack, the symetric delay
assumption below might be quite invalid.

I guess the outliers removal probably covers this as well, but seems
some sanity checking might be good.

> +			/*
> +			 * assume symetric delay to and from source:
> +			 * average target time corresponds to measured
> +			 * source time
> +			 */
> +			sample.offset =
> +				ktime_to_ns(ktime_add(end, start)) / 2 -
> +				ts;
> +
> +			/* simple insertion sort based on duration */
> +			index = counter - 1;
> +			while (index >= 0) {
> +				if (samples[index].duration_target <
> +				    sample.duration_target)
> +					break;
> +				samples[index + 1] = samples[index];
> +				index--;
> +			}
> +			samples[index + 1] = sample;
> +			counter++;
> +		}
> +
> +		i++;
> +		if (counter >= num_samples || i >= 100000) {
> +			end_source = ts;
> +			break;
> +		}
> +	}
> +
> +	*source_tstamp = (end_source + start_source) / 2;
> +
> +	/* remove outliers by only using 75% of the samples */
> +	used = counter * 3 / 4;
> +	if (!used)
> +		used = counter;
> +	if (used) {
> +		/* calculate average */
> +		s64 off = 0;
> +		for (index = 0; index < used; index++)
> +			off += samples[index].offset;
> +		*offset = div_s64(off, used);
> +	}
> +
> +	if (samples && samples != buffer)
> +		kfree(samples);
> +
> +	return used;
> +}
> +EXPORT_SYMBOL(timecompare_offset);
> +
> +void __timecompare_update(struct timecompare *sync,
> +			  u64 source_tstamp)
> +{
> +	s64 offset;
> +	u64 average_time;
> +
> +	if (!timecompare_offset(sync, &offset, &average_time))
> +		return;
> +
> +	printk(KERN_DEBUG
> +		"average offset: %lld\n", offset);
> +
> +	if (!sync->last_update) {
> +		sync->last_update = average_time;
> +		sync->offset = offset;
> +		sync->skew = 0;
> +	} else {
> +		s64 delta_nsec = average_time - sync->last_update;
> +
> +		/* avoid division by negative or small deltas */
> +		if (delta_nsec >= 10000) {
> +			s64 delta_offset_nsec = offset - sync->offset;
> +			s64 skew; /* delta_offset_nsec *
> +				     TIMECOMPARE_SKEW_RESOLUTION /
> +				     delta_nsec */
> +			u64 divisor;
> +
> +			/* div_s64() is limited to 32 bit divisor */
> +			skew = delta_offset_nsec * TIMECOMPARE_SKEW_RESOLUTION;
> +			divisor = delta_nsec;
> +			while (unlikely(divisor >= ((s64)1) << 32)) {
> +				/* divide both by 2; beware, right shift
> +				   of negative value has undefined
> +				   behavior and can only be used for
> +				   the positive divisor */
> +				skew = div_s64(skew, 2);
> +				divisor >>= 1;
> +			}
> +			skew = div_s64(skew, divisor);
> +
> +			/*
> +			 * Calculate new overall skew as 4/16 the
> +			 * old value and 12/16 the new one. This is
> +			 * a rather arbitrary tradeoff between
> +			 * only using the latest measurement (0/16 and
> +			 * 16/16) and even more weight on past measurements.
> +			 */
> +#define TIMECOMPARE_NEW_SKEW_PER_16 12
> +			sync->skew =
> +				div_s64((16 - TIMECOMPARE_NEW_SKEW_PER_16) *
> +					sync->skew +
> +					TIMECOMPARE_NEW_SKEW_PER_16 * skew,
> +					16);
> +			sync->last_update = average_time;
> +			sync->offset = offset;
> +		}
> +	}
> +}
> +EXPORT_SYMBOL(__timecompare_update);
> -- 
> 1.6.1.2
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick Ohly - Feb. 9, 2009, 9:46 p.m.
On Mon, 2009-02-09 at 21:27 +0200, John Stultz wrote:
> On Mon, 2009-02-09 at 18:02 +0100, Patrick Ohly wrote:
> > Is this revision of the patch okay? How should the two patches get
> > included in the main kernel - via netdev-next-2.6?
>
> Small comment below, but otherwise it looks ok to me. I usually push
> patches through Andrew, so I'd probably go that way. But I'd leave it to
> Dave if he's comfortable pushing them to Linus.

As you don't mind, I suggest to push through Dave as part of the
complete patch series. That way we don't need to worry about
coordinating two subtrees.

> Acked-by: John Stultz <johnstul@us.ibm.com>

Thanks! Will add that.

> > +             /* ignore negative durations */
> > +             sample.duration_target = ktime_to_ns(ktime_sub(end, start));
> > +             if (sample.duration_target >= 0) {
> 
> You may also want to checking the bounds on the duration_target. If
> preemption hits and the values are too out of whack, the symetric delay
> assumption below might be quite invalid.
> 
> I guess the outliers removal probably covers this as well, but seems
> some sanity checking might be good.

That would require more information, like "duration_target is usually in
the xxx-yyy range". This could be determined based on past measurements
or the median of the current sample set, but is this really better than
the current "remove longest 25%"?

In practice I haven't seen such a problem, therefore I'd prefer to keep
the code simple and not change it. It was tested under load conditions
(both CPU and network).
john stultz - Feb. 9, 2009, 9:54 p.m.
On Mon, 2009-02-09 at 22:46 +0100, Patrick Ohly wrote:
> On Mon, 2009-02-09 at 21:27 +0200, John Stultz wrote:
> > On Mon, 2009-02-09 at 18:02 +0100, Patrick Ohly wrote:
> > > Is this revision of the patch okay? How should the two patches get
> > > included in the main kernel - via netdev-next-2.6?
> >
> > Small comment below, but otherwise it looks ok to me. I usually push
> > patches through Andrew, so I'd probably go that way. But I'd leave it to
> > Dave if he's comfortable pushing them to Linus.
> 
> As you don't mind, I suggest to push through Dave as part of the
> complete patch series. That way we don't need to worry about
> coordinating two subtrees.
> 
> > Acked-by: John Stultz <johnstul@us.ibm.com>
> 
> Thanks! Will add that.
> 
> > > +             /* ignore negative durations */
> > > +             sample.duration_target = ktime_to_ns(ktime_sub(end, start));
> > > +             if (sample.duration_target >= 0) {
> > 
> > You may also want to checking the bounds on the duration_target. If
> > preemption hits and the values are too out of whack, the symetric delay
> > assumption below might be quite invalid.
> > 
> > I guess the outliers removal probably covers this as well, but seems
> > some sanity checking might be good.
> 
> That would require more information, like "duration_target is usually in
> the xxx-yyy range". This could be determined based on past measurements
> or the median of the current sample set, but is this really better than
> the current "remove longest 25%"?
> 
> In practice I haven't seen such a problem, therefore I'd prefer to keep
> the code simple and not change it. It was tested under load conditions
> (both CPU and network).

Ok. I was just thinking more along the lines of "throw out duration
times longer then 25us" or 100us or something relatively sane like that.
Again, the largest 25% will probably cover it, but I'm just looking at
this with the realtime preemption patches in mind.

thanks
-john


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Feb. 9, 2009, 10:57 p.m.
From: John Stultz <johnstul@us.ibm.com>
Date: Mon, 09 Feb 2009 11:27:47 -0800

> On Mon, 2009-02-09 at 18:02 +0100, Patrick Ohly wrote:
> > On Thu, 2009-02-05 at 10:21 +0000, Patrick Ohly wrote:
> > > On Wed, 2009-02-04 at 21:44 +0200, john stultz wrote:
> > > > On Wed, 2009-02-04 at 14:01 +0100, Patrick Ohly wrote:
> > > > I sort of object to the name clocksync, as you're not really doing
> > > > anything to sync clocks in the code. One, "clock" is an way overloaded
> > > > term in the kernel. Two, you're really seem to be just providing deltas
> > > > and skew rates between notions of time. I want to avoid someone thinking
> > > > "Oh, NTP must use this code". 
> > > > 
> > > > So maybe something like timecompare.c? 
> > > 
> > > Fine with me.
> > 
> > As there were no other comments I renamed the file, functions and struct
> > accordingly. As I said in my mail, I prefer "struct timecompare" over
> > "struct time_comparator". I also used "timecompare_transform()".
> > 
> > Is this revision of the patch okay? How should the two patches get
> > included in the main kernel - via netdev-next-2.6?
> > 
> > Bye, Patrick
> 
> Small comment below, but otherwise it looks ok to me. I usually push
> patches through Andrew, so I'd probably go that way. But I'd leave it to
> Dave if he's comfortable pushing them to Linus.
> 
> Acked-by: John Stultz <johnstul@us.ibm.com>

Patrick, please submit these two changes fresh now that all of
the issues are resolved and you have the ACKs :-)

I'll queue them up in net-next-2.6
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/include/linux/timecompare.h b/include/linux/timecompare.h
new file mode 100644
index 0000000..f88c454
--- /dev/null
+++ b/include/linux/timecompare.h
@@ -0,0 +1,125 @@ 
+/*
+ * Utility code which helps transforming between two different time
+ * bases, called "source" and "target" time in this code.
+ *
+ * Source time has to be provided via the timecounter API while target
+ * time is accessed via a function callback whose prototype
+ * intentionally matches ktime_get() and ktime_get_real(). These
+ * interfaces where chosen like this so that the code serves its
+ * initial purpose without additional glue code.
+ *
+ * This purpose is synchronizing a hardware clock in a NIC with system
+ * time, in order to implement the Precision Time Protocol (PTP,
+ * IEEE1588) with more accurate hardware assisted time stamping.  In
+ * that context only synchronization against system time (=
+ * ktime_get_real()) is currently needed. But this utility code might
+ * become useful in other situations, which is why it was written as
+ * general purpose utility code.
+ *
+ * The source timecounter is assumed to return monotonically
+ * increasing time (but this code does its best to compensate if that
+ * is not the case) whereas target time may jump.
+ *
+ * The target time corresponding to a source time is determined by
+ * reading target time, reading source time, reading target time
+ * again, then assuming that average target time corresponds to source
+ * time. In other words, the assumption is that reading the source
+ * time is slow and involves equal time for sending the request and
+ * receiving the reply, whereas reading target time is assumed to be
+ * fast.
+ *
+ * Copyright(c) 2009 Intel Corporation.
+ * Author: Patrick Ohly <patrick.ohly@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+#ifndef _LINUX_TIMECOMPARE_H
+#define _LINUX_TIMECOMPARE_H
+
+#include <linux/clocksource.h>
+#include <linux/ktime.h>
+
+/**
+ * struct timecompare - stores state and configuration for the two clocks
+ *
+ * Initialize to zero, then set source/target/num_samples.
+ *
+ * Transformation between source time and target time is done with:
+ * target_time = source_time + offset +
+ *               (source_time - last_update) * skew /
+ *               TIMECOMPARE_SKEW_RESOLUTION
+ *
+ * @source:          used to get source time stamps via timecounter_read()
+ * @target:          function returning target time (for example, ktime_get
+ *                   for monotonic time, or ktime_get_real for wall clock)
+ * @num_samples:     number of times that source time and target time are to
+ *                   be compared when determining their offset
+ * @offset:          (target time - source time) at the time of the last update
+ * @skew:            average (target time - source time) / delta source time *
+ *                   TIMECOMPARE_SKEW_RESOLUTION
+ * @last_update:     last source time stamp when time offset was measured
+ */
+struct timecompare {
+	struct timecounter *source;
+	ktime_t (*target)(void);
+	int num_samples;
+
+	s64 offset;
+	s64 skew;
+	u64 last_update;
+};
+
+/**
+ * timecompare_transform - transform source time stamp into target time base
+ * @sync:            context for time sync
+ * @source_tstamp:   the result of timecounter_read() or
+ *                   timecounter_cyc2time()
+ */
+extern ktime_t timecompare_transform(struct timecompare *sync,
+				     u64 source_tstamp);
+
+/**
+ * timecompare_offset - measure current (target time - source time) offset
+ * @sync:            context for time sync
+ * @offset:          average offset during sample period returned here
+ * @source_tstamp:   average source time during sample period returned here
+ *
+ * Returns number of samples used. Might be zero (= no result) in the
+ * unlikely case that target time was monotonically decreasing for all
+ * samples (= broken).
+ */
+extern int timecompare_offset(struct timecompare *sync,
+			      s64 *offset,
+			      u64 *source_tstamp);
+
+extern void __timecompare_update(struct timecompare *sync,
+				 u64 source_tstamp);
+
+/**
+ * timecompare_update - update offset and skew by measuring current offset
+ * @sync:            context for time sync
+ * @source_tstamp:   the result of timecounter_read() or
+ *                   timecounter_cyc2time(), pass zero to force update
+ *
+ * Updates are only done at most once per second.
+ */
+static inline void timecompare_update(struct timecompare *sync,
+				      u64 source_tstamp)
+{
+	if (!source_tstamp ||
+	    (s64)(source_tstamp - sync->last_update) >= NSEC_PER_SEC)
+		__timecompare_update(sync, source_tstamp);
+}
+
+#endif /* _LINUX_TIMECOMPARE_H */
diff --git a/kernel/time/Makefile b/kernel/time/Makefile
index 905b0b5..0b0a636 100644
--- a/kernel/time/Makefile
+++ b/kernel/time/Makefile
@@ -1,4 +1,4 @@ 
-obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o
+obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o timecompare.o
 
 obj-$(CONFIG_GENERIC_CLOCKEVENTS_BUILD)		+= clockevents.o
 obj-$(CONFIG_GENERIC_CLOCKEVENTS)		+= tick-common.o
diff --git a/kernel/time/timecompare.c b/kernel/time/timecompare.c
new file mode 100644
index 0000000..1e94abc
--- /dev/null
+++ b/kernel/time/timecompare.c
@@ -0,0 +1,194 @@ 
+/*
+ * Copyright (C) 2009 Intel Corporation.
+ * Author: Patrick Ohly <patrick.ohly@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/timecompare.h>
+#include <linux/module.h>
+#include <linux/math64.h>
+
+/*
+ * fixed point arithmetic scale factor for skew
+ *
+ * Usually one would measure skew in ppb (parts per billion, 1e9), but
+ * using a factor of 2 simplifies the math.
+ */
+#define TIMECOMPARE_SKEW_RESOLUTION (((s64)1)<<30)
+
+ktime_t timecompare_transform(struct timecompare *sync,
+			      u64 source_tstamp)
+{
+	u64 nsec;
+
+	nsec = source_tstamp + sync->offset;
+	nsec += (s64)(source_tstamp - sync->last_update) * sync->skew /
+		TIMECOMPARE_SKEW_RESOLUTION;
+
+	return ns_to_ktime(nsec);
+}
+EXPORT_SYMBOL(timecompare_transform);
+
+int timecompare_offset(struct timecompare *sync,
+		       s64 *offset,
+		       u64 *source_tstamp)
+{
+	u64 start_source = 0, end_source = 0;
+	struct {
+		s64 offset;
+		s64 duration_target;
+	} buffer[10], sample, *samples;
+	int counter = 0, i;
+	int used;
+	int index;
+	int num_samples = sync->num_samples;
+
+	if (num_samples > sizeof(buffer)/sizeof(buffer[0])) {
+		samples = kmalloc(sizeof(*samples) * num_samples, GFP_ATOMIC);
+		if (!samples) {
+			samples = buffer;
+			num_samples = sizeof(buffer)/sizeof(buffer[0]);
+		}
+	} else {
+		samples = buffer;
+	}
+
+	/* run until we have enough valid samples, but do not try forever */
+	i = 0;
+	counter = 0;
+	while (1) {
+		u64 ts;
+		ktime_t start, end;
+
+		start = sync->target();
+		ts = timecounter_read(sync->source);
+		end = sync->target();
+
+		if (!i)
+			start_source = ts;
+
+		/* ignore negative durations */
+		sample.duration_target = ktime_to_ns(ktime_sub(end, start));
+		if (sample.duration_target >= 0) {
+			/*
+			 * assume symetric delay to and from source:
+			 * average target time corresponds to measured
+			 * source time
+			 */
+			sample.offset =
+				ktime_to_ns(ktime_add(end, start)) / 2 -
+				ts;
+
+			/* simple insertion sort based on duration */
+			index = counter - 1;
+			while (index >= 0) {
+				if (samples[index].duration_target <
+				    sample.duration_target)
+					break;
+				samples[index + 1] = samples[index];
+				index--;
+			}
+			samples[index + 1] = sample;
+			counter++;
+		}
+
+		i++;
+		if (counter >= num_samples || i >= 100000) {
+			end_source = ts;
+			break;
+		}
+	}
+
+	*source_tstamp = (end_source + start_source) / 2;
+
+	/* remove outliers by only using 75% of the samples */
+	used = counter * 3 / 4;
+	if (!used)
+		used = counter;
+	if (used) {
+		/* calculate average */
+		s64 off = 0;
+		for (index = 0; index < used; index++)
+			off += samples[index].offset;
+		*offset = div_s64(off, used);
+	}
+
+	if (samples && samples != buffer)
+		kfree(samples);
+
+	return used;
+}
+EXPORT_SYMBOL(timecompare_offset);
+
+void __timecompare_update(struct timecompare *sync,
+			  u64 source_tstamp)
+{
+	s64 offset;
+	u64 average_time;
+
+	if (!timecompare_offset(sync, &offset, &average_time))
+		return;
+
+	printk(KERN_DEBUG
+		"average offset: %lld\n", offset);
+
+	if (!sync->last_update) {
+		sync->last_update = average_time;
+		sync->offset = offset;
+		sync->skew = 0;
+	} else {
+		s64 delta_nsec = average_time - sync->last_update;
+
+		/* avoid division by negative or small deltas */
+		if (delta_nsec >= 10000) {
+			s64 delta_offset_nsec = offset - sync->offset;
+			s64 skew; /* delta_offset_nsec *
+				     TIMECOMPARE_SKEW_RESOLUTION /
+				     delta_nsec */
+			u64 divisor;
+
+			/* div_s64() is limited to 32 bit divisor */
+			skew = delta_offset_nsec * TIMECOMPARE_SKEW_RESOLUTION;
+			divisor = delta_nsec;
+			while (unlikely(divisor >= ((s64)1) << 32)) {
+				/* divide both by 2; beware, right shift
+				   of negative value has undefined
+				   behavior and can only be used for
+				   the positive divisor */
+				skew = div_s64(skew, 2);
+				divisor >>= 1;
+			}
+			skew = div_s64(skew, divisor);
+
+			/*
+			 * Calculate new overall skew as 4/16 the
+			 * old value and 12/16 the new one. This is
+			 * a rather arbitrary tradeoff between
+			 * only using the latest measurement (0/16 and
+			 * 16/16) and even more weight on past measurements.
+			 */
+#define TIMECOMPARE_NEW_SKEW_PER_16 12
+			sync->skew =
+				div_s64((16 - TIMECOMPARE_NEW_SKEW_PER_16) *
+					sync->skew +
+					TIMECOMPARE_NEW_SKEW_PER_16 * skew,
+					16);
+			sync->last_update = average_time;
+			sync->offset = offset;
+		}
+	}
+}
+EXPORT_SYMBOL(__timecompare_update);