Patchwork [1/5] ptp: Added a brand new class driver for ptp clocks.

login
register
mail settings
Submitter Richard Cochran
Date Aug. 16, 2010, 11:17 a.m.
Message ID <363bd749a38d0b785d8431e591bf54c38db4c2d7.1281956490.git.richard.cochran@omicron.at>
Download mbox | patch
Permalink /patch/61789/
State Not Applicable
Headers show

Comments

Richard Cochran - Aug. 16, 2010, 11:17 a.m.
This patch adds an infrastructure for hardware clocks that implement
IEEE 1588, the Precision Time Protocol (PTP). A class driver offers a
registration method to particular hardware clock drivers. Each clock is
exposed to user space as a character device with ioctls that allow tuning
of the PTP clock.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
---
 Documentation/ptp/ptp.txt        |   95 +++++++
 Documentation/ptp/testptp.c      |  306 ++++++++++++++++++++++
 Documentation/ptp/testptp.mk     |   33 +++
 drivers/Kconfig                  |    2 +
 drivers/Makefile                 |    1 +
 drivers/ptp/Kconfig              |   27 ++
 drivers/ptp/Makefile             |    5 +
 drivers/ptp/ptp_clock.c          |  514 ++++++++++++++++++++++++++++++++++++++
 include/linux/Kbuild             |    1 +
 include/linux/ptp_clock.h        |   79 ++++++
 include/linux/ptp_clock_kernel.h |  137 ++++++++++
 11 files changed, 1200 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/ptp/ptp.txt
 create mode 100644 Documentation/ptp/testptp.c
 create mode 100644 Documentation/ptp/testptp.mk
 create mode 100644 drivers/ptp/Kconfig
 create mode 100644 drivers/ptp/Makefile
 create mode 100644 drivers/ptp/ptp_clock.c
 create mode 100644 include/linux/ptp_clock.h
 create mode 100644 include/linux/ptp_clock_kernel.h
Arnd Bergmann - Aug. 16, 2010, 2:26 p.m.
On Monday 16 August 2010, Richard Cochran wrote:
> This patch adds an infrastructure for hardware clocks that implement
> IEEE 1588, the Precision Time Protocol (PTP). A class driver offers a
> registration method to particular hardware clock drivers. Each clock is
> exposed to user space as a character device with ioctls that allow tuning
> of the PTP clock.

Have you considered integrating the subsystem into the Posix clock/timer
framework?

I can't really tell from reading the source if this is possible or
not, but my feeling is that if it can be done, that would be a much
nicer interface. We already have clock_gettime/clock_settime/
timer_settime/... system calls, and while you'd need to add another
clockid and some syscalls, my feeling is that it will be more
usable in the end.

> +static const struct file_operations ptp_fops = {
> +       .owner          = THIS_MODULE,
> +       .ioctl          = ptp_ioctl,
> +       .open           = ptp_open,
> +       .poll           = ptp_poll,
> +       .read           = ptp_read,
> +       .release        = ptp_release,
> +};

.ioctl is gone, you have to use .unlocked_ioctl and make sure that
your ptp_ioctl function can handle being called concurrently.

You should also add a .compat_ioctl function, ideally one that
points to ptp_ioctl so you don't have to list every command as
being compatible. Right now, some commands are incompatible,
which means you need more changes to the interface.

> +struct ptp_clock_timer {
> +       int alarm_index;       /* Which alarm to query or configure. */
> +       int signum;            /* Requested signal. */
> +       int flags;             /* Zero or TIMER_ABSTIME, see TIMER_SETTIME(2) */
> +       struct itimerspec tsp; /* See TIMER_SETTIME(2) */
> +};

This data structure is incompatible between 32 and 64 bit user space,
so you would need a compat_ioctl() function to convert between the
two formats. Better define the structure in a compatible way, avoiding
variable-length fields and implicit padding.

> +struct ptp_clock_request {
> +       int type;  /* One of the ptp_request_types enumeration values. */
> +       int index; /* Which channel to configure. */
> +       struct timespec ts; /* For period signals, the desired period. */
> +       int flags; /* Bit field for PTP_ENABLE_FEATURE or other flags. */
> +};

Same problem here, timespec is either 64 or 128 bits long and has different
alignment.

> +struct ptp_extts_event {
> +       int index;
> +       struct timespec ts;
> +};

here too.

> +#define PTP_CLOCK_VERSION 0x00000001
> +
> +#define PTP_CLK_MAGIC '='
> +
> +#define PTP_CLOCK_APIVERS _IOR (PTP_CLK_MAGIC, 1, __u32)

Try avoiding versioned ABIs. It's better to just add new ioctls
or syscalls when you need some extra functionality and leave the
old ones around.

> +#define PTP_CLOCK_ADJTIME _IOW (PTP_CLK_MAGIC, 3, struct timespec)
> +#define PTP_CLOCK_GETTIME _IOR (PTP_CLK_MAGIC, 4, struct timespec)
> +#define PTP_CLOCK_SETTIME _IOW (PTP_CLK_MAGIC, 5, struct timespec)

These are also incompatible.

	Arnd
Richard Cochran - Aug. 16, 2010, 7 p.m.
On Mon, Aug 16, 2010 at 04:26:23PM +0200, Arnd Bergmann wrote:
> Have you considered integrating the subsystem into the Posix clock/timer
> framework?

Yes, but see below.
 
> I can't really tell from reading the source if this is possible or
> not, but my feeling is that if it can be done, that would be a much
> nicer interface. We already have clock_gettime/clock_settime/
> timer_settime/... system calls, and while you'd need to add another
> clockid and some syscalls, my feeling is that it will be more
> usable in the end.

You are not the first person to ask about this. See this link for
longer explanation of why I did not go that way:

  http://marc.info/?l=linux-netdev&m=127669810232201&w=2

You *could* offer the PTP clock as a Linux clock source/event device,
and I agree that it would be nicer. However, the problem is, what do
you do with the PHY based clocks?  Just one 16 bit read from a PHY
clock can take 40 usec, and you need four such read operations just to
get the current time value.

Also, I really did not want to add or change any syscalls. I could not
see a practical way to extend the existing syscalls to accommodate PTP
clocks.

Richard
john stultz - Aug. 16, 2010, 7:24 p.m.
On Mon, Aug 16, 2010 at 4:17 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
> This patch adds an infrastructure for hardware clocks that implement
> IEEE 1588, the Precision Time Protocol (PTP). A class driver offers a
> registration method to particular hardware clock drivers. Each clock is
> exposed to user space as a character device with ioctls that allow tuning
> of the PTP clock.
>
> Signed-off-by: Richard Cochran <richard.cochran@omicron.at>

Hey Richard!
   Its very cool to see this work on lkml! I'm excited to see more
work done on ptp.  We had a short private thread discussion earlier (I
got busy and never replied to your last message, my apologies!), but I
wanted to bring up the concerns I have here as well.

A few comments below....

> +** PTP user space API
> +
> +   The class driver creates a character device for each registered PTP
> +   clock. User space programs may control the clock using standardized
> +   ioctls. A program may query, enable, configure, and disable the
> +   ancillary clock features. User space can receive time stamped
> +   events via blocking read() and poll(). One shot and periodic
> +   signals may be configured via an ioctl API with semantics similar
> +   to the POSIX timer_settime() system call.

As I mentioned earlier, I'm not a huge fan of the char device
interface for abstracted PTP clocks.
If it was just the direct hardware access, similar to RTC, which user
apps then use as a timesource, I'd not have much of a problem. But as
I mentioned in an earlier private mail, the abstraction level concerns
me.

1) The driver-like model exposes a char dev for each clock, which
allows for poorly-written userland applications to hit portability
issues  (ie: /dev/hpet vs /dev/rtc). Granted this isn't a huge flaw,
but good APIs should be hard to get wrong.

2) As Arnd already mentioned, the chardev interface seems to duplicate
the clock_gettime/settime() and adjtimex() interfaces.

3) I'm not sure I see the benefit of being able to have multiple
frequency corrected time domains.  In other words, what benefit would
you get from adjusting a PTP clock's frequency instead of just
adjusting the system's time freq? Having the PTP time as a reference
to correct the system time seems reasonable, but I'm not sure I see
why userland would want to adjust the PTP clock's freq.

thanks
-john
john stultz - Aug. 16, 2010, 7:38 p.m.
On Mon, Aug 16, 2010 at 12:24 PM, john stultz <johnstul@us.ibm.com> wrote:
> On Mon, Aug 16, 2010 at 4:17 AM, Richard Cochran
> A few comments below....
>
>> +** PTP user space API
>> +
>> +   The class driver creates a character device for each registered PTP
>> +   clock. User space programs may control the clock using standardized
>> +   ioctls. A program may query, enable, configure, and disable the
>> +   ancillary clock features. User space can receive time stamped
>> +   events via blocking read() and poll(). One shot and periodic
>> +   signals may be configured via an ioctl API with semantics similar
>> +   to the POSIX timer_settime() system call.
>
> As I mentioned earlier, I'm not a huge fan of the char device
> interface for abstracted PTP clocks.
> If it was just the direct hardware access, similar to RTC, which user
> apps then use as a timesource, I'd not have much of a problem. But as
> I mentioned in an earlier private mail, the abstraction level concerns
> me.

[snip]

> 2) As Arnd already mentioned, the chardev interface seems to duplicate
> the clock_gettime/settime() and adjtimex() interfaces.

And maybe just to clarify, as I saw your response to Arnd, I'm not
suggesting using PTP clocks as clocksources for the internal
timekeeping core. Instead I'm trying to understand why PTP clocks need
the equivalent of the existing posix clocks/timer interface. Why would
only having a read-time interface not suffice?

thanks
-john
Arnd Bergmann - Aug. 16, 2010, 7:59 p.m.
On Monday 16 August 2010 21:00:03 Richard Cochran wrote:
> 
> On Mon, Aug 16, 2010 at 04:26:23PM +0200, Arnd Bergmann wrote:
> > Have you considered integrating the subsystem into the Posix clock/timer
> > framework?
> 
> Yes, but see below.
>  
> > I can't really tell from reading the source if this is possible or
> > not, but my feeling is that if it can be done, that would be a much
> > nicer interface. We already have clock_gettime/clock_settime/
> > timer_settime/... system calls, and while you'd need to add another
> > clockid and some syscalls, my feeling is that it will be more
> > usable in the end.
> 
> You are not the first person to ask about this. See this link for
> longer explanation of why I did not go that way:
> 
>   http://marc.info/?l=linux-netdev&m=127669810232201&w=2
> 
> You could offer the PTP clock as a Linux clock source/event device,
> and I agree that it would be nicer. However, the problem is, what do
> you do with the PHY based clocks?  Just one 16 bit read from a PHY
> clock can take 40 usec, and you need four such read operations just to
> get the current time value.

Why does it matter how long it takes to read the clock? I wasn't thinking
of replacing the system clock with this, just exposing the additional
clock as a new clockid_t value that can be accessed using the existing
syscalls.

> Also, I really did not want to add or change any syscalls. I could not
> see a practical way to extend the existing syscalls to accommodate PTP
> clocks.

Why did you not want to add syscalls? Adding ioctls instead of syscalls
does not make the interface better, just less visible.

Out of the ioctl commands you define, we already seem to have half or more:

PTP_CLOCK_APIVERS -> not needed
PTP_CLOCK_ADJFREQ -> new clock_adjfreq
PTP_CLOCK_ADJTIME -> new clock_adjtime
PTP_CLOCK_GETTIME -> clock_gettime
PTP_CLOCK_SETTIME -> clock_settime
PTP_CLOCK_GETCAPS -> new clock_getcaps
PTP_CLOCK_GETTIMER -> timer_gettime
PTP_CLOCK_SETTIMER -> timer_create/timer_settime
PTP_FEATURE_REQUEST -> possibly clock_feature

	Arnd
Richard Cochran - Aug. 17, 2010, 8:32 a.m.
On Mon, Aug 16, 2010 at 09:59:39PM +0200, Arnd Bergmann wrote:
> Why does it matter how long it takes to read the clock? I wasn't thinking
> of replacing the system clock with this, just exposing the additional
> clock as a new clockid_t value that can be accessed using the existing
> syscalls.

Okay, now I see. You are suggesting this:

      clock_gettime(CLOCK_PTP, &ts);
      clock_settime(CLOCK_PTP, &ts);

I like this. If there is agreement about it, I am happy to implement
the PTP stuff that way.

> Why did you not want to add syscalls? Adding ioctls instead of syscalls
> does not make the interface better, just less visible.

I bet that, had I posted patch set with new syscalls, someone would
have said, "You are adding new syscalls. Can't you just use a char
device instead!"

If you add syscalls and introduce CLOCK_PTP, then you add it to
everyone's kernel, even those people who never heard of PTP. A char
device has the advantage that can it be simply ignored. Also, a
syscall has got to have the right form from the very beginning. If the
next generation of PTP hardware looks very different, then it is not
that much of a crime to change an ioctl interface, provided it has
versioning.

Richard
Richard Cochran - Aug. 17, 2010, 8:53 a.m.
On Mon, Aug 16, 2010 at 12:24:48PM -0700, john stultz wrote:
> 3) I'm not sure I see the benefit of being able to have multiple
> frequency corrected time domains.  In other words, what benefit would
> you get from adjusting a PTP clock's frequency instead of just
> adjusting the system's time freq? Having the PTP time as a reference
> to correct the system time seems reasonable, but I'm not sure I see
> why userland would want to adjust the PTP clock's freq.

For PTP enabled hardware, the timestamp on the network packet comes
from from the PTP clock, not from the system time.

Of course, you can always just leave the PTP clock alone, figure the
needed correction, and apply it whenever needed. But this has some
disadvantages. First of all, the (one and only) open source PTPd does
not do it that way. Also, only one program (the PTPd or equivalent)
will know the correct time. Other programs will not be able to ask the
operating system for time services. Instead, they would need to use
IPC to the PTPd.

The PTP protocol (and some PTP hardware) offers a "one step" method,
where the timestamps are inserted by the hardware on the fly. Here you
really do need the PTP clock to be correctly adjusted.

All of the PTP hardware that I am familiar with provides a clock
adjustment method, so it simpler and cleaner just to use this facility
to tune the PTP clock.

Richard
Arnd Bergmann - Aug. 17, 2010, 9:25 a.m.
On Tuesday 17 August 2010, Richard Cochran wrote:
> > Why did you not want to add syscalls? Adding ioctls instead of syscalls
> > does not make the interface better, just less visible.
> 
> I bet that, had I posted patch set with new syscalls, someone would
> have said, "You are adding new syscalls. Can't you just use a char
> device instead!"

Very possible, but after considering both options, I think we would
still end up with the same conclusion.
 
> If you add syscalls and introduce CLOCK_PTP, then you add it to
> everyone's kernel, even those people who never heard of PTP. A char
> device has the advantage that can it be simply ignored.

It's a matter of perspective whether you consider this an advantage
or disadvantage. I would expect that since you are trying to get support
for PTP into the kernel, you'd be happy for people to know about it and
use your code ;-).

> Also, a syscall has got to have the right form from the very beginning.
> If the next generation of PTP hardware looks very different, then it is not
> that much of a crime to change an ioctl interface, provided it has
> versioning.

No, that's just a myth. The rules for ABI stability are pretty much the
same. We try hard to avoid ever changing an existing ABI, for both
syscalls and ioctl. In either case, if you get it wrong, you have to support
the old interface and create a new syscall or ioctl command.
As mentioned, versioning does not solve this, it just adds another
indirection (which we try to avoid).

One difference is that more people take a look at your code when you suggest
a new syscall, so the chances of getting it right in the first upstream
version are higher.
Another difference is that we generally use ioctl for devices that can
be enumerated, while syscalls are for system services that are not tied to
a specific device. This argument works both ways for PTP IMHO: On the one
hand you want to have a reliable clock that you can use without knowing
where it comes from, on the other you might have multiple PTP sources that
you need to differentiate.

	Arnd
Richard Cochran - Aug. 17, 2010, 10:52 a.m.
On Tue, Aug 17, 2010 at 09:25:55AM +0000, Arnd Bergmann wrote:
> Another difference is that we generally use ioctl for devices that can
> be enumerated, while syscalls are for system services that are not tied to
> a specific device. This argument works both ways for PTP IMHO: On the one
> hand you want to have a reliable clock that you can use without knowing
> where it comes from, on the other you might have multiple PTP sources that
> you need to differentiate.

Yes, I agree. In normal use, there will be only one PTP clock in a
system. However, for research purposes, it would be nice to have more
than one.

I've been looking at offering the PTP clock as a posix clock, and it
is not as hard as I first thought. The PTP clock or clocks just have
to be registered as one of the posix_clocks[MAX_CLOCKS] in
posix-timers.c.

My suggestion would be to reserve three clock ids in time.h,
CLOCK_PTP0, CLOCK_PTP1, and CLOCK_PTP2. The first one would be the
same as CLOCK_REALTIME, for SW timestamping, and the other two would
allow two different PTP clocks at the same time, for the research use
case.

Using the clock id will bring another advantage, since it will then be
possible for user space to specify the desired timestamp source for
SO_TIMESTAMPING.

Richard
Arnd Bergmann - Aug. 17, 2010, 11:36 a.m.
On Tuesday 17 August 2010, Richard Cochran wrote:
> On Tue, Aug 17, 2010 at 09:25:55AM +0000, Arnd Bergmann wrote:
> > Another difference is that we generally use ioctl for devices that can
> > be enumerated, while syscalls are for system services that are not tied to
> > a specific device. This argument works both ways for PTP IMHO: On the one
> > hand you want to have a reliable clock that you can use without knowing
> > where it comes from, on the other you might have multiple PTP sources that
> > you need to differentiate.
> 
> Yes, I agree. In normal use, there will be only one PTP clock in a
> system. However, for research purposes, it would be nice to have more
> than one.
> 
> I've been looking at offering the PTP clock as a posix clock, and it
> is not as hard as I first thought. The PTP clock or clocks just have
> to be registered as one of the posix_clocks[MAX_CLOCKS] in
> posix-timers.c.

Ok sounds good.

> My suggestion would be to reserve three clock ids in time.h,
> CLOCK_PTP0, CLOCK_PTP1, and CLOCK_PTP2. The first one would be the
> same as CLOCK_REALTIME, for SW timestamping, and the other two would
> allow two different PTP clocks at the same time, for the research use
> case.

I don't think there is a point in making exactly two independent sources
available. The clockid_t space is not really limited, so we could define
an arbitrary range of ids for ptp sources that could be used simultaneously,
as long as we have space more more ids with a fixed number.

Would it be reasonable to assume that on a machine with a large number
of NICs, you'd want a separate ptp source for each of them for timestamping?
Or would you preferably define just one source in such a setup?

I think both could be done with the use of class device attributes in
sysfs for configuration. Maybe you can have one CLOCK_PTP value for one
global PTP source and use sysfs to configure which device that is.

If you also need simultaneous access to the specific clocks, you could
have run-time configured clockid numbers in a sysfs attribute for each
ptp class device.

> Using the clock id will bring another advantage, since it will then be
> possible for user space to specify the desired timestamp source for
> SO_TIMESTAMPING.

God point.

	Arnd
john stultz - Aug. 18, 2010, 12:22 a.m.
On Tue, 2010-08-17 at 10:53 +0200, Richard Cochran wrote:
> On Mon, Aug 16, 2010 at 12:24:48PM -0700, john stultz wrote:
> > 3) I'm not sure I see the benefit of being able to have multiple
> > frequency corrected time domains.  In other words, what benefit would
> > you get from adjusting a PTP clock's frequency instead of just
> > adjusting the system's time freq? Having the PTP time as a reference
> > to correct the system time seems reasonable, but I'm not sure I see
> > why userland would want to adjust the PTP clock's freq.
> 
> For PTP enabled hardware, the timestamp on the network packet comes
> from from the PTP clock, not from the system time.
> 
> Of course, you can always just leave the PTP clock alone, figure the
> needed correction, and apply it whenever needed. But this has some
> disadvantages. First of all, the (one and only) open source PTPd does
> not do it that way. Also, only one program (the PTPd or equivalent)
> will know the correct time. Other programs will not be able to ask the
> operating system for time services. Instead, they would need to use
> IPC to the PTPd.

Why would system time not be adjusted to the PTP time?

This is my main concern, that we're presenting a fractured API to
userland. Suddenly there isn't just system time, but ptp time as well,
and possibly multiple different ptp times.

Forgive me, as I suspect I'm missing something key here.

> The PTP protocol (and some PTP hardware) offers a "one step" method,
> where the timestamps are inserted by the hardware on the fly. Here you
> really do need the PTP clock to be correctly adjusted.
> 
> All of the PTP hardware that I am familiar with provides a clock
> adjustment method, so it simpler and cleaner just to use this facility
> to tune the PTP clock.

Hmm. So trying to recap here to see if I'm understanding properly:

The PTP clock is a bit of hardware (usually on the NIC) that can put
timestamps on packets (both incoming or outgoing?).

The need to offset/freq correct the PTP clock is important so that the
timestamps (incoming and outgoing) are correct, which makes future
offset calculations simpler.

Hmm. So I'm actually starting to come around to the chardev interface.

In a way, it seems it has some similarities to how the RTC device is
used in NTPd. Granted, NTPd doesn't correct the RTC (the kernel does
when we're synced, but its not a perfect parallel), but it can be used
as an input the steer the clock.

So while to me, it think it would be more ideal (or maybe just less
different) to have a read-only interface (like the RTC), leaving PTPd to
manage offset calculations and use that to steer the system time. I can
acknowledge the need to have some way to correct the freq so the packet
timestamps are corrected.

I still feel a little concerned over the timer/alarm related interfaces.
Could you explain why the alarm interface is necessary? 

So really I think my initial negative gut reaction to this was mostly
out of the fact that you introduced a char dev that provides almost 100%
coverage of the posix-time interface. That is duplication we definitely
don't want. 

Also I think the documentation I've read about PTP (likely just due to
the engineering focus) has an odd inverted sense of priority, focusing
on keeping obscure hardware clocks on NIC cards in sync, rather then the
the more tangible feature of keeping the system time in sync.

This could be comically interpreted as trying to create a shadow-time on
the system that is the "real time" and "yea, maybe we'll let the system
know what time it is, but user-apps who want to know the score can send
a magic ioctl to /dev/something and get the real deal". ;)  I'm sure
that's not the case, but I'd like to keep any confusion in userland
about which time is the best time to a minimum (ie: use the system
time).

thanks
-john
john stultz - Aug. 18, 2010, 12:40 a.m.
On Tue, 2010-08-17 at 13:36 +0200, Arnd Bergmann wrote:
> On Tuesday 17 August 2010, Richard Cochran wrote:
> > On Tue, Aug 17, 2010 at 09:25:55AM +0000, Arnd Bergmann wrote:
> > > Another difference is that we generally use ioctl for devices that can
> > > be enumerated, while syscalls are for system services that are not tied to
> > > a specific device. This argument works both ways for PTP IMHO: On the one
> > > hand you want to have a reliable clock that you can use without knowing
> > > where it comes from, on the other you might have multiple PTP sources that
> > > you need to differentiate.
> > 
> > Yes, I agree. In normal use, there will be only one PTP clock in a
> > system. However, for research purposes, it would be nice to have more
> > than one.
> > 
> > I've been looking at offering the PTP clock as a posix clock, and it
> > is not as hard as I first thought. The PTP clock or clocks just have
> > to be registered as one of the posix_clocks[MAX_CLOCKS] in
> > posix-timers.c.
> 
> Ok sounds good.

Oh no. I'm starting to waffle here. :)

So while I'm not a fan of the duplication of the posix clocks interface
that the proposed chardev interface introduced, I'm not sure if
absorbing the PTP clocks as a clockid is much better.

Mainly due to the fact that userland apps now have to chose between two
clockids that supposedly represent the same thing (the current wall
time).

> > My suggestion would be to reserve three clock ids in time.h,
> > CLOCK_PTP0, CLOCK_PTP1, and CLOCK_PTP2. The first one would be the
> > same as CLOCK_REALTIME, for SW timestamping, and the other two would
> > allow two different PTP clocks at the same time, for the research use
> > case.

Why would you have CLOCK_PTP0 == CLOCK_REALTIME? Whats the point of
that?

> I don't think there is a point in making exactly two independent sources
> available. The clockid_t space is not really limited, so we could define
> an arbitrary range of ids for ptp sources that could be used simultaneously,
> as long as we have space more more ids with a fixed number.
> 
> Would it be reasonable to assume that on a machine with a large number
> of NICs, you'd want a separate ptp source for each of them for timestamping?
> Or would you preferably define just one source in such a setup?
> 
> I think both could be done with the use of class device attributes in
> sysfs for configuration. Maybe you can have one CLOCK_PTP value for one
> global PTP source and use sysfs to configure which device that is.
> 
> If you also need simultaneous access to the specific clocks, you could
> have run-time configured clockid numbers in a sysfs attribute for each
> ptp class device.
> 
> > Using the clock id will bring another advantage, since it will then be
> > possible for user space to specify the desired timestamp source for
> > SO_TIMESTAMPING.

So how exactly would one map CLOCK_PTPx to an eth device?

So this is a little bit further out there, but assuming PTPd can sync
the PTP clock correctly, why could the kernel itself not sync the PTP
clock to system time?

So instead of the sync path looking like:
1) packet from master arrives on NIC, is timestamped by PTP clock
2) PTPd calculates offset between PTP clock and master time
3) PTPd corrects PTP clock freq/offset
4) PTPd corrects system time freq/offset

It would look like:
1) packet from master arrives on NIC, is timestamped by PTP clock
2) PTPd calculates offset between PTP clock and master time
3) PTPd corrects system time freq/offset
4) kernel corrects PTP clock freq/offset

I'm guessing the indirect PTP clock sync would have greater error, but
it avoids the fractured sense of time.


thanks
-john
Richard Cochran - Aug. 18, 2010, 7:19 a.m.
On Tue, Aug 17, 2010 at 05:22:43PM -0700, john stultz wrote:
> Why would system time not be adjusted to the PTP time?
> 
> This is my main concern, that we're presenting a fractured API to
> userland. Suddenly there isn't just system time, but ptp time as well,
> and possibly multiple different ptp times.

John, it is a good thing to make thoughts about the big picture with
PTP clocks and the system time, like you are doing. However, the
situation is not as troubled as you think. Let me try to explain.

> The PTP clock is a bit of hardware (usually on the NIC) that can put
> timestamps on packets (both incoming or outgoing?).

Not only on the NIC. There are bunch of new products doing the
timestamping in the PHY or in a switch fabric attached to the host
like a PHY. The synchronization that one can achieve with PHY
timestamps is better that that with MAC timestamping.

> So while to me, it think it would be more ideal (or maybe just less
> different) to have a read-only interface (like the RTC), leaving PTPd to
> manage offset calculations and use that to steer the system time. I can
> acknowledge the need to have some way to correct the freq so the packet
> timestamps are corrected.

The PTPd need not change the system time at all for PTP clock to be
useful. (see below)

> I still feel a little concerned over the timer/alarm related interfaces.
> Could you explain why the alarm interface is necessary? 

The timer/alarm stuff is "ancillary" and is not at all necessary. It
is just a "nice to have." I will happily remove it, if it is too
troubling for people.
 
> So really I think my initial negative gut reaction to this was mostly
> out of the fact that you introduced a char dev that provides almost 100%
> coverage of the posix-time interface. That is duplication we definitely
> don't want. 

The reason why I modelled the char device on the posix interface was
to make the API more familiar to application programmers. After the
recent discussion (and having reviewed the posix clock implementation
in Linux), I now think it would be even better to simply offer a new
posic clock ID for PTP.

I was emulating the posix interface. Instead I should use it directly.

> Also I think the documentation I've read about PTP (likely just due to
> the engineering focus) has an odd inverted sense of priority, focusing
> on keeping obscure hardware clocks on NIC cards in sync, rather then the
> the more tangible feature of keeping the system time in sync.
> 
> This could be comically interpreted as trying to create a shadow-time on
> the system that is the "real time" and "yea, maybe we'll let the system
> know what time it is, but user-apps who want to know the score can send
> a magic ioctl to /dev/something and get the real deal". ;)  I'm sure
> that's not the case, but I'd like to keep any confusion in userland
> about which time is the best time to a minimum (ie: use the system
> time).

You are right. As John Eidson's excellent book points out, modern
computers and operating systems provide surprisingly little support
for programming based on absolute time. It is not PTP's fault. PTP is
actually a step in the right direction, but it doesn't yet really fit
in to the present computing world.

Okay, here is the Big Picture.

1. Use Case: SW timestamping

   PTP with software timestamping (ie without special hardware) can
   acheive synchronization within a few dozen microseconds, after
   about twenty minutes. This is sufficient for very many people. The
   new API (whether char device or syscall) fully and simply supports
   this use case. When the PTPd adjusts the PTP clock, it is actually
   adjusting the system time, just like NTPd.

2. Use Case: HW timestamping for industrial control

   PTP with hardware timestamping can acheive synchronization within
   100 nanoseconds after one minute. If you want to do something with
   your wonderfully synchronization PTP clock, it must have some kind
   of special hardware, like timestamping external signals or
   generating one-shot or periodic outputs. The new API (whether char
   device or syscall) supports this use case via the ancillary
   commands.

   In this case, the end user has an application that interfaces with
   the outside world via the PTP clock API. Such a specialized
   application (for example, motor control) uses only the PTP API,
   since it knows that the standard posix API cannot help. It is
   irrelevant that the system time is not synchronized, in this case.

   The PTP clock hardware may or may not provide a hardware interface
   (interrupt) to the main CPU. In this case, it does not matter. The
   PTP clock is useful all by itself.

3. Use Case: HW timestamping with PPS to host

   This case is the same as case 2, with the exception that the PTP
   clock can interrupt the main CPU. The PTP clock driver advertises
   the "PPS" capability. When enabled, the PTP layer delivers events
   via the existing Linux PPS subsystem. Programs like NTPd can use
   these events to regulate the system time.

   This means that the system clock and the PTP clock will be at least
   as well synchronized as when using a traditionial radio clock, GPS,
   or IRIG-B method. In my opinion, this will be good enough for any
   practical purpose. For example, let's say you want to run a
   periodic task synchronized to the absolute wall clock time. Your
   scheduling latency will be a dozen microseconds or so. Your PPS
   synchronized system clock should be close enough to the PTP clock
   to support this.

The API that I have suggested, whether offered as a char device or as
syscalls, supports all of the use cases using a single API.

Richard
Richard Cochran - Aug. 18, 2010, 2:04 p.m.
On Tue, Aug 17, 2010 at 01:36:29PM +0200, Arnd Bergmann wrote:
> On Tuesday 17 August 2010, Richard Cochran wrote:
> > I've been looking at offering the PTP clock as a posix clock, and it
> > is not as hard as I first thought. The PTP clock or clocks just have
> > to be registered as one of the posix_clocks[MAX_CLOCKS] in
> > posix-timers.c.
> 
> Ok sounds good.

I've been working turning the PTP stuff into syscalls, but here is a
little issue I ran into. The PTP clock layer wants to call the PPS
code via pps_register_source(), but the PPS can be compiled as a
module. Since the PTP layer is now offering syscalls, it must always
be present in the kernel. So I need to make sure that the PPS is
either absent entirely or staticly linked in.

How can I do this?

Thanks,

Richard
Arnd Bergmann - Aug. 18, 2010, 3:02 p.m.
On Wednesday 18 August 2010, Richard Cochran wrote:
> On Tue, Aug 17, 2010 at 01:36:29PM +0200, Arnd Bergmann wrote:
> > On Tuesday 17 August 2010, Richard Cochran wrote:
> > > I've been looking at offering the PTP clock as a posix clock, and it
> > > is not as hard as I first thought. The PTP clock or clocks just have
> > > to be registered as one of the posix_clocks[MAX_CLOCKS] in
> > > posix-timers.c.
> > 
> > Ok sounds good.
> 
> I've been working turning the PTP stuff into syscalls, but here is a
> little issue I ran into. The PTP clock layer wants to call the PPS
> code via pps_register_source(), but the PPS can be compiled as a
> module. Since the PTP layer is now offering syscalls, it must always
> be present in the kernel. So I need to make sure that the PPS is
> either absent entirely or staticly linked in.
> 
> How can I do this?

You might want to use callbacks for these system calls that you
can register to at module load time, like it is done for the
existing syscalls.

The simpler way (e.g. for testing) is using Kconfig dependencies, like

config PTP
	bool "IEEE 1588 Precision Time Protocol"

config PPS
	tristate "Pulse per Second"
	depends on PTP || !PTP

The depends statement is a way of expressing that when PTP is enabled,
PPS cannot be a module, while it may be a module if PTP is disabled.

	Arnd
john stultz - Aug. 19, 2010, 12:12 a.m.
On Wed, 2010-08-18 at 09:19 +0200, Richard Cochran wrote:
> On Tue, Aug 17, 2010 at 05:22:43PM -0700, john stultz wrote:
>>
> > So while to me, it think it would be more ideal (or maybe just less
> > different) to have a read-only interface (like the RTC), leaving PTPd to
> > manage offset calculations and use that to steer the system time. I can
> > acknowledge the need to have some way to correct the freq so the packet
> > timestamps are corrected.
> 
> The PTPd need not change the system time at all for PTP clock to be
> useful. (see below)

Right, obviously an ok-solution is often more useful then no-solution.
But that doesn't mean we shouldn't shoot for a good or even
great-solution. :)


> > I still feel a little concerned over the timer/alarm related interfaces.
> > Could you explain why the alarm interface is necessary? 
> 
> The timer/alarm stuff is "ancillary" and is not at all necessary. It
> is just a "nice to have." I will happily remove it, if it is too
> troubling for people.

If there's a compelling argument for it, I'm interested to hear. But
again, it seems like just
yet-another-way-to-get-alarm/timer-functionality, so before we add an
extra API (or widen an existing API) I'd like to understand the need.

But maybe it might simplify the discussion to pull it for now, but
keeping it in mind to possibly include later as an extension?

> > So really I think my initial negative gut reaction to this was mostly
> > out of the fact that you introduced a char dev that provides almost 100%
> > coverage of the posix-time interface. That is duplication we definitely
> > don't want. 
> 
> The reason why I modelled the char device on the posix interface was
> to make the API more familiar to application programmers. After the
> recent discussion (and having reviewed the posix clock implementation
> in Linux), I now think it would be even better to simply offer a new
> posic clock ID for PTP.
> 
> I was emulating the posix interface. Instead I should use it directly.

I'm definitely interested to see what you come up with here. I'm still
hesitant with adding a PTP clock_id, but extending the posix-clocks
interface in this way isn't unprecedented (see: CLOCK_SGI_CYCLE) I just
would like to make sure we don't end up with a clock_id namespace
littered with oddball clocks that were not well abstracted (see:
CLOCK_SGI_CYCLE :).

For instance: imagine if instead of keeping the clocksource abstraction
internal to the timekeeping core, we exposed each clocksource to
userland via a clock_id.  Every arch would have different ids, and each
arch might have multiple ids. Programming against that would be a huge
pain.

So in thinking about this, try to focus on what the new clock_id
provides that the other existing clockids do not? Are they at comparable
levels of abstraction? 15 years from now, are folks likely to still be
using it? Will it be maintainable? etc...


> > Also I think the documentation I've read about PTP (likely just due to
> > the engineering focus) has an odd inverted sense of priority, focusing
> > on keeping obscure hardware clocks on NIC cards in sync, rather then the
> > the more tangible feature of keeping the system time in sync.
> > 
> > This could be comically interpreted as trying to create a shadow-time on
> > the system that is the "real time" and "yea, maybe we'll let the system
> > know what time it is, but user-apps who want to know the score can send
> > a magic ioctl to /dev/something and get the real deal". ;)  I'm sure
> > that's not the case, but I'd like to keep any confusion in userland
> > about which time is the best time to a minimum (ie: use the system
> > time).
> 
> You are right. As John Eidson's excellent book points out, modern
> computers and operating systems provide surprisingly little support
> for programming based on absolute time. It is not PTP's fault. PTP is
> actually a step in the right direction, but it doesn't yet really fit
> in to the present computing world.

You'll have to forgive me, as I haven't had the time to check out that
book. What exactly do you mean by operating systems provide little
support for programming based on absolute time?


> Okay, here is the Big Picture.
> 
> 1. Use Case: SW timestamping
> 
>    PTP with software timestamping (ie without special hardware) can
>    acheive synchronization within a few dozen microseconds, after
>    about twenty minutes. This is sufficient for very many people. The
>    new API (whether char device or syscall) fully and simply supports
>    this use case. When the PTPd adjusts the PTP clock, it is actually
>    adjusting the system time, just like NTPd.

Again this illustrates the inversion of focus: system time is merely one
of many possible PTP clocks in the larger PTP framework.

The way I tend to see it: PTP is just one of the many ways to sync
system time.


> 2. Use Case: HW timestamping for industrial control
> 
>    PTP with hardware timestamping can acheive synchronization within
>    100 nanoseconds after one minute. If you want to do something with
>    your wonderfully synchronization PTP clock, it must have some kind
>    of special hardware, like timestamping external signals or
>    generating one-shot or periodic outputs. The new API (whether char
>    device or syscall) supports this use case via the ancillary
>    commands.
> 
>    In this case, the end user has an application that interfaces with
>    the outside world via the PTP clock API. Such a specialized
>    application (for example, motor control) uses only the PTP API,
>    since it knows that the standard posix API cannot help. It is
>    irrelevant that the system time is not synchronized, in this case.
> 
>    The PTP clock hardware may or may not provide a hardware interface
>    (interrupt) to the main CPU. In this case, it does not matter. The
>    PTP clock is useful all by itself.

These specialized applications are part of what concerns me the most. 

For example, I can see some parallels between things like audio
processing, where you have a buffer consumed by the card at a certain
rate. Now, the card has its own crystal it uses to time its consumption,
so it has its own time domain, and could drift from system time. Thus
you want to trigger buffer-refill interrupts off of the audio card's
clock, not the system time which might run the risk of being late.

But again, we don't expose the audio hardware clock to userland in the
same way we expose system time.

So, instead of a chardev or a posix clock id, would it make more sense
for the PTP adjustment interface to be more closely to the
nic/phy/whatever interface?  Much like how the audio apis do it?

Again, my knowledge in the networking stack is pretty limited. But it
would seem that having an interface that does something to the effect of
"adjust the timestamp clock on the hardware that generated it from this
packet by Xppb" would feel like the right level of abstraction. Its
closely related to SO_TIMESTAMP, option right? Would something like
using the setsockopt/getsockopt interface with
SO_TIMESTAMP_ADJUST/OFFSET/SET/etc be reasonable?


> 3. Use Case: HW timestamping with PPS to host
> 
>    This case is the same as case 2, with the exception that the PTP
>    clock can interrupt the main CPU. The PTP clock driver advertises
>    the "PPS" capability. When enabled, the PTP layer delivers events
>    via the existing Linux PPS subsystem. Programs like NTPd can use
>    these events to regulate the system time.
> 
>    This means that the system clock and the PTP clock will be at least
>    as well synchronized as when using a traditionial radio clock, GPS,
>    or IRIG-B method. In my opinion, this will be good enough for any
>    practical purpose. For example, let's say you want to run a
>    periodic task synchronized to the absolute wall clock time. Your
>    scheduling latency will be a dozen microseconds or so. Your PPS
>    synchronized system clock should be close enough to the PTP clock
>    to support this.

And yes, this seems perfectly reasonable feature to add. Its not
controversial to me, because its likely to work within the existing
interfaces and won't expose anything new to userland.


Again, sorry to be such a pain about all of this. I know its frustrating
to bring your hard work to lkml and then get a lot of non-specific
push-back to "do it differently". You're earlier comments about being
wary of adding syscalls because it requires lots of review and debate
were spot on, but *any* user-visible API really requires the same level
of care (which, of course, they don't always get). So please don't take
my comments personally. Keep pushing and I will either come around or
maybe we can find a better middle-ground.

thanks
-john
Richard Cochran - Aug. 19, 2010, 5:55 a.m.
On Wed, Aug 18, 2010 at 05:12:56PM -0700, john stultz wrote:
> On Wed, 2010-08-18 at 09:19 +0200, Richard Cochran wrote:
> > The timer/alarm stuff is "ancillary" and is not at all necessary. It
> > is just a "nice to have." I will happily remove it, if it is too
> > troubling for people.
> 
> If there's a compelling argument for it, I'm interested to hear. But
> again, it seems like just
> yet-another-way-to-get-alarm/timer-functionality, so before we add an
> extra API (or widen an existing API) I'd like to understand the need.

We don't really need it, IMHO.

But if we offer clockid_t CLOCK_PTP, then we get timer_settime()
without any extra effort.

> > I was emulating the posix interface. Instead I should use it directly.
> 
> I'm definitely interested to see what you come up with here. I'm still
> hesitant with adding a PTP clock_id, but extending the posix-clocks
> interface in this way isn't unprecedented (see: CLOCK_SGI_CYCLE) I just
> would like to make sure we don't end up with a clock_id namespace
> littered with oddball clocks that were not well abstracted (see:
> CLOCK_SGI_CYCLE :).
> 
> For instance: imagine if instead of keeping the clocksource abstraction
> internal to the timekeeping core, we exposed each clocksource to
> userland via a clock_id.  Every arch would have different ids, and each
> arch might have multiple ids. Programming against that would be a huge
> pain.

The clockid_t CLOCK_PTP will be arch-neutral.

> So in thinking about this, try to focus on what the new clock_id
> provides that the other existing clockids do not? Are they at comparable
> levels of abstraction? 15 years from now, are folks likely to still be
> using it? Will it be maintainable? etc...

Arnd convinced me that clockid_t=CLOCK_PTP is a good fit. My plan
would be to introduce just one additional syscall:

SYSCALL_DEFINE3(clock_adjtime, const clockid_t, clkid,
		int, ppb, struct timespec __user *, ts)

ppb - desired frequency adjustment in parts per billion
ts  - desired time step (or jump) in <sec,nsec> to correct
      a measured offset

Arguably, this syscall might be useful for other clocks, too.

I think the ancillary features from PTP hardware clocks should be made
available throught the sysfs. A syscall for these would end up very
ugly, looking like an ioctl. Also, it is hard to see how these
features relate to the more general idea of the clockid.

In contrast, sysfs attributes will fit the need nicely:

1. enable or disable pps
2. enable or disable external timestamps
3. read out external timestamp
4. configure period for periodic output

> > 1. Use Case: SW timestamping
> The way I tend to see it: PTP is just one of the many ways to sync
> system time.

> > 2. Use Case: HW timestamping for industrial control
> These specialized applications are part of what concerns me the most. 

PTP was not invented to just to get a computer's system time in the
ball park. For that, NTP is good enough. Rather, some people want to
use their computers for tasks that require close synchronization, like
industrial control, audio/video streaming, and many others.

Are you saying that we should not support such applications?

> For example, I can see some parallels between things like audio
> processing, where you have a buffer consumed by the card at a certain
> rate. Now, the card has its own crystal it uses to time its consumption,
> so it has its own time domain, and could drift from system time. Thus
> you want to trigger buffer-refill interrupts off of the audio card's
> clock, not the system time which might run the risk of being late.
> 
> But again, we don't expose the audio hardware clock to userland in the
> same way we expose system time.

This is a good example of the poverty (in regards to time
synchronization) of our current systems.

Lets say I want to build a surround sound audio system, using a set of
distributed computers, each host connected to one speaker. How I can
be sure that the samples in one channel (ie one host) pass through the
DA converter at exactly the same time?

> Again, my knowledge in the networking stack is pretty limited. But it
> would seem that having an interface that does something to the effect of
> "adjust the timestamp clock on the hardware that generated it from this
> packet by Xppb" would feel like the right level of abstraction. Its
> closely related to SO_TIMESTAMP, option right? Would something like
> using the setsockopt/getsockopt interface with
> SO_TIMESTAMP_ADJUST/OFFSET/SET/etc be reasonable?

The clock and its adjustment have nothing to do with a network
socket. The current PTP hacks floating around all add private ioctls
to the MAC driver. That is the *wrong* way to do it.

> > 3. Use Case: HW timestamping with PPS to host
...
> And yes, this seems perfectly reasonable feature to add. Its not
> controversial to me, because its likely to work within the existing
> interfaces and won't expose anything new to userland.

Okay, then will you support an elegant solution for case 3, that also
supports cases 1 and 2 without any additional work?

> Again, sorry to be such a pain about all of this. I know its frustrating...

Thanks for your comments!

Richard
Richard Cochran - Aug. 19, 2010, 9:22 a.m.
On Wed, Aug 18, 2010 at 05:02:03PM +0200, Arnd Bergmann wrote:
> You might want to use callbacks for these system calls that you
> can register to at module load time, like it is done for the
> existing syscalls.

Can you point me to a specific example?

> The simpler way (e.g. for testing) is using Kconfig dependencies, like
> 
> config PTP
> 	bool "IEEE 1588 Precision Time Protocol"
> 
> config PPS
> 	tristate "Pulse per Second"
> 	depends on PTP || !PTP
> 
> The depends statement is a way of expressing that when PTP is enabled,
> PPS cannot be a module, while it may be a module if PTP is disabled.

THis did not work for me.

What I got was, in effect, two independent booleans.


Thanks,

Richard
Arnd Bergmann - Aug. 19, 2010, 12:28 p.m.
On Thursday 19 August 2010, Richard Cochran wrote:
> On Wed, Aug 18, 2010 at 05:12:56PM -0700, john stultz wrote:
> > On Wed, 2010-08-18 at 09:19 +0200, Richard Cochran wrote:
> > > The timer/alarm stuff is "ancillary" and is not at all necessary. It
> > > is just a "nice to have." I will happily remove it, if it is too
> > > troubling for people.
> > 
> > If there's a compelling argument for it, I'm interested to hear. But
> > again, it seems like just
> > yet-another-way-to-get-alarm/timer-functionality, so before we add an
> > extra API (or widen an existing API) I'd like to understand the need.
> 
> We don't really need it, IMHO.
> 
> But if we offer clockid_t CLOCK_PTP, then we get timer_settime()
> without any extra effort.

Makes sense.

> > So in thinking about this, try to focus on what the new clock_id
> > provides that the other existing clockids do not? Are they at comparable
> > levels of abstraction? 15 years from now, are folks likely to still be
> > using it? Will it be maintainable? etc...
> 
> Arnd convinced me that clockid_t=CLOCK_PTP is a good fit.

My point was that a syscall is better than an ioctl based interface here,
which I definitely still think. Given that John knows much more about
clocks than I do, we still need to get agreement on the question that
he raised, which is whether we actually need to expose this clock to the
user or not.

If we can find a way to sync system time accurate enough with PTP and
PPS, user applications may not need to see two separate clocks at all.

> My plan would be to introduce just one additional syscall:
> 
> SYSCALL_DEFINE3(clock_adjtime, const clockid_t, clkid,
> 		int, ppb, struct timespec __user *, ts)
> 
> ppb - desired frequency adjustment in parts per billion
> ts  - desired time step (or jump) in <sec,nsec> to correct
>       a measured offset
> 
> Arguably, this syscall might be useful for other clocks, too.

This is a mix of adjtime and adjtimex with the addition of
the clkid parameter, right?

Have you considered passing a struct timex instead of ppb and ts?

Is using ppb instead of the timex ppm required to get the accuracy
you want?

	Arnd
Arnd Bergmann - Aug. 19, 2010, 12:29 p.m.
On Thursday 19 August 2010, Richard Cochran wrote:
> 
> On Wed, Aug 18, 2010 at 05:02:03PM +0200, Arnd Bergmann wrote:
> > You might want to use callbacks for these system calls that you
> > can register to at module load time, like it is done for the
> > existing syscalls.
> 
> Can you point me to a specific example?

The struct k_clock contains callback functions to clock source
specific implementations of the syscalls and other functions.
Adding a new clock source that is (potentially) implemented
as a module means you have to define a new k_clock in that module
that you register using register_posix_clock.

If you need additional syscalls to be implemented by the same
module, you can put more callback functions into struct k_clock
that would then only be implemented by the new module.

> > The simpler way (e.g. for testing) is using Kconfig dependencies, like
> > 
> > config PTP
> >       bool "IEEE 1588 Precision Time Protocol"
> > 
> > config PPS
> >       tristate "Pulse per Second"
> >       depends on PTP || !PTP
> > 
> > The depends statement is a way of expressing that when PTP is enabled,
> > PPS cannot be a module, while it may be a module if PTP is disabled.
> 
> THis did not work for me.
> 
> What I got was, in effect, two independent booleans.

Hmm, right. I wonder what I was thinking of then.

You can certainly do

config PTP
	bool "IEEE 1588 Precision Time Protocol"
	depends on PPS != m

but that will hide the PTP option if PPS is set to 'm', which is normally
not what you want.

	Arnd
Ira Snyder - Aug. 19, 2010, 3:23 p.m.
On Thu, Aug 19, 2010 at 02:29:49PM +0200, Arnd Bergmann wrote:
> On Thursday 19 August 2010, Richard Cochran wrote:
> > 
> > On Wed, Aug 18, 2010 at 05:02:03PM +0200, Arnd Bergmann wrote:
> > > You might want to use callbacks for these system calls that you
> > > can register to at module load time, like it is done for the
> > > existing syscalls.
> > 
> > Can you point me to a specific example?
> 
> The struct k_clock contains callback functions to clock source
> specific implementations of the syscalls and other functions.
> Adding a new clock source that is (potentially) implemented
> as a module means you have to define a new k_clock in that module
> that you register using register_posix_clock.
> 
> If you need additional syscalls to be implemented by the same
> module, you can put more callback functions into struct k_clock
> that would then only be implemented by the new module.
> 
> > > The simpler way (e.g. for testing) is using Kconfig dependencies, like
> > > 
> > > config PTP
> > >       bool "IEEE 1588 Precision Time Protocol"
> > > 
> > > config PPS
> > >       tristate "Pulse per Second"
> > >       depends on PTP || !PTP
> > > 
> > > The depends statement is a way of expressing that when PTP is enabled,
> > > PPS cannot be a module, while it may be a module if PTP is disabled.
> > 
> > THis did not work for me.
> > 
> > What I got was, in effect, two independent booleans.
> 
> Hmm, right. I wonder what I was thinking of then.
> 
> You can certainly do
> 
> config PTP
> 	bool "IEEE 1588 Precision Time Protocol"
> 	depends on PPS != m
> 
> but that will hide the PTP option if PPS is set to 'm', which is normally
> not what you want.
> 

Arnd,

Perhaps you were thinking of the vhost example (taken from
drivers/vhost/Kconfig):

config VHOST_NET
	tristate "Host kernel accelerator for virtio net (EXPERIMENTAL)"
	depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP) && EXPERIMENTAL

They have a similar construct with both TUN and MACVTAP there. Perhaps
the parens are a necessary part of the "X || !X" syntax? Just a random
guess.

Ira
Richard Cochran - Aug. 19, 2010, 3:38 p.m.
On Thu, Aug 19, 2010 at 02:28:04PM +0200, Arnd Bergmann wrote:
> My point was that a syscall is better than an ioctl based interface here,
> which I definitely still think. Given that John knows much more about
> clocks than I do, we still need to get agreement on the question that
> he raised, which is whether we actually need to expose this clock to the
> user or not.
> 
> If we can find a way to sync system time accurate enough with PTP and
> PPS, user applications may not need to see two separate clocks at all.

At the very least, one user application (the PTPd) needs to see the
PTP clock.

> > SYSCALL_DEFINE3(clock_adjtime, const clockid_t, clkid,
> > 		int, ppb, struct timespec __user *, ts)
> > 
> > ppb - desired frequency adjustment in parts per billion
> > ts  - desired time step (or jump) in <sec,nsec> to correct
> >       a measured offset
> > 
> > Arguably, this syscall might be useful for other clocks, too.
> 
> This is a mix of adjtime and adjtimex with the addition of
> the clkid parameter, right?

Sort of, but not really. ADJTIME(3) takes an offset and slowly
corrects the clock using a servo in the kernel, over hours.

For this function, the offset passed in the 'ts' parameter will be
immediately corrected, by jumping to the new time. This reflects the
way that PTP works. After the first few samples, the PTPd has an
estimate of the offset to the master and the rate difference. The PTPd
can proceed in one of two ways.

1. If resetting the clock is not desired, then the clock is set to the
   maximum adjustment (in the right direction) until the clock time is
   close to the master's time.

2. The estimated offset is added to the current time, resulting in a
   jump in time.

We need clock_adjtime(id, 0, ts) for the second case.

> Have you considered passing a struct timex instead of ppb and ts?

Yes, but the timex is not suitable, IMHO.

> Is using ppb instead of the timex ppm required to get the accuracy
> you want?

That is one very good reason.

Another is this: can you explain what the 20+ fields mean?

Consider the field, freq. The comment says "frequency offset (scaled
ppm)."  To what is it scaled? The only way I know of to find out is to
read the NTP code (which is fairly complex) and see what the unit
really is meant to be. Ditto for the other fields.

The timex structure reveals, AFAICT, the inner workings of the kernel
clock servo. For PTP, we don't need or want the kernel servo. The PTPd
has its own clock servo in user space.

Richard
Arnd Bergmann - Aug. 19, 2010, 3:48 p.m.
On Thursday 19 August 2010, Ira W. Snyder wrote:
> Perhaps you were thinking of the vhost example (taken from
> drivers/vhost/Kconfig):
> 
> config VHOST_NET
>         tristate "Host kernel accelerator for virtio net (EXPERIMENTAL)"
>         depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP) && EXPERIMENTAL
> 
> They have a similar construct with both TUN and MACVTAP there. Perhaps
> the parens are a necessary part of the "X || !X" syntax? Just a random
> guess.

Yes, that's the one I was thinking of.

My mistake was that the effect is slightly different here. VHOST and TUN are
both tristate. What we guarantee here is that if TUN is "m", VHOST cannot be "y",
because its dependency cannot be fulfilled for y.

	Arnd
john stultz - Aug. 23, 2010, 8:08 p.m.
Sorry for the slow response here, I got busy with other work at the end
of last week.

On Thu, 2010-08-19 at 07:55 +0200, Richard Cochran wrote:
> On Wed, Aug 18, 2010 at 05:12:56PM -0700, john stultz wrote:
> > On Wed, 2010-08-18 at 09:19 +0200, Richard Cochran wrote:
> > > The timer/alarm stuff is "ancillary" and is not at all necessary. It
> > > is just a "nice to have." I will happily remove it, if it is too
> > > troubling for people.
> > 
> > If there's a compelling argument for it, I'm interested to hear. But
> > again, it seems like just
> > yet-another-way-to-get-alarm/timer-functionality, so before we add an
> > extra API (or widen an existing API) I'd like to understand the need.
> 
> We don't really need it, IMHO.
> 
> But if we offer clockid_t CLOCK_PTP, then we get timer_settime()
> without any extra effort.

Sure. There are some clear parallels and the API seems to match, but
what I'm asking is: does it make sense from an overall API view that
application developers have to understand?

> > > I was emulating the posix interface. Instead I should use it directly.
> > 
> > I'm definitely interested to see what you come up with here. I'm still
> > hesitant with adding a PTP clock_id, but extending the posix-clocks
> > interface in this way isn't unprecedented (see: CLOCK_SGI_CYCLE) I just
> > would like to make sure we don't end up with a clock_id namespace
> > littered with oddball clocks that were not well abstracted (see:
> > CLOCK_SGI_CYCLE :).
> > 
> > For instance: imagine if instead of keeping the clocksource abstraction
> > internal to the timekeeping core, we exposed each clocksource to
> > userland via a clock_id.  Every arch would have different ids, and each
> > arch might have multiple ids. Programming against that would be a huge
> > pain.
> 
> The clockid_t CLOCK_PTP will be arch-neutral.

Sure, but are they conceptually neutral? There are other clock
synchronization algorithms out there. Will they need their own
similar-but-different clock_ids?

Look at the other clock ids and what the represent:

CLOCK_REALTIME : Wall time (possibly freq/offset corrected)
CLOCK_MONOTONIC: Monotonic time (possibly freq corrected).
CLOCK_PROCESS_CPUTIME_ID: Process cpu time.
CLOCK_THREAD_CPUTIME_ID: Thread cpu time.
CLOCK_MONOTONIC_RAW: Non freq corrected monotonic time.
CLOCK_REALTIME_COARSE: Tick granular wall time (filesystem timestamp)
CLOCK_MONOTONIC_COARSE: Tick granular monotonic time.

CLOCK_PTP that you're proposing doesn't seem to be at the same level of
abstraction. I'm not saying that this isn't the right place for it, but
can we take a step back from PTP and consider what your exposing in more
generic terms. In other words, could someone use the same
packet-timestamping hardware to implement a different non-PTP time
synchronization algorithm?

Further, if we're using PTP to synchoronize the system time, then there
shouldn't be any measurable difference between CLOCK_PTP and
CLOCK_REALTIME, no?

> > So in thinking about this, try to focus on what the new clock_id
> > provides that the other existing clockids do not? Are they at comparable
> > levels of abstraction? 15 years from now, are folks likely to still be
> > using it? Will it be maintainable? etc...
> 
> Arnd convinced me that clockid_t=CLOCK_PTP is a good fit. My plan
> would be to introduce just one additional syscall:
> 
> SYSCALL_DEFINE3(clock_adjtime, const clockid_t, clkid,
> 		int, ppb, struct timespec __user *, ts)
> 
> ppb - desired frequency adjustment in parts per billion
> ts  - desired time step (or jump) in <sec,nsec> to correct
>       a measured offset
> 
> Arguably, this syscall might be useful for other clocks, too.

So yea, obviously the syscall should not be CLOCK_PTP specific, so we
would want it to be usable against CLOCK_REALTIME.

That said, the clock_adjtime your proposing does not seem to be
sufficient for usage by NTPd. So this suggests that it is not generic
enough.

> I think the ancillary features from PTP hardware clocks should be made
> available throught the sysfs. A syscall for these would end up very
> ugly, looking like an ioctl. Also, it is hard to see how these
> features relate to the more general idea of the clockid.

This may be a good approach, but be aware that adding stuff to sysfs
requires similar scrutiny as adding a syscall.  

> In contrast, sysfs attributes will fit the need nicely:
> 
> 1. enable or disable pps
> 2. enable or disable external timestamps
> 3. read out external timestamp
> 4. configure period for periodic output

Things to consider here:
Do having these options really make sense? 

Why would we want pps disabled?  And if that does make sense, would it
be better to do so via the existing pps interface instead of adding a
new ptp specific one? 

Same for the timestamps and periodic output (ie: and how do they differ
from reading or setting a timer on CLOCK_PTP?)


> > > 1. Use Case: SW timestamping
> > The way I tend to see it: PTP is just one of the many ways to sync
> > system time.
> 
> > > 2. Use Case: HW timestamping for industrial control
> > These specialized applications are part of what concerns me the most. 
> 
> PTP was not invented to just to get a computer's system time in the
> ball park. For that, NTP is good enough. Rather, some people want to
> use their computers for tasks that require close synchronization, like
> industrial control, audio/video streaming, and many others.
> 
> Are you saying that we should not support such applications?

Of course not! Just because I'm reviewing and critiquing your work does
not mean our goals are incompatible.


> > For example, I can see some parallels between things like audio
> > processing, where you have a buffer consumed by the card at a certain
> > rate. Now, the card has its own crystal it uses to time its consumption,
> > so it has its own time domain, and could drift from system time. Thus
> > you want to trigger buffer-refill interrupts off of the audio card's
> > clock, not the system time which might run the risk of being late.
> > 
> > But again, we don't expose the audio hardware clock to userland in the
> > same way we expose system time.
> 
> This is a good example of the poverty (in regards to time
> synchronization) of our current systems.
> 
> Lets say I want to build a surround sound audio system, using a set of
> distributed computers, each host connected to one speaker. How I can
> be sure that the samples in one channel (ie one host) pass through the
> DA converter at exactly the same time?

They won't be exactly the same, but to minimize any noticeable
difference we'd need each speaker/client-system that have their system
time closely synced. Then the server-system would need to send the
channel stream and frame times to each client. The clients would then
feed the audio frames to the audio card at the designated times.

This is a little high level and generic and of course, the devil's in
the details:

1) How is the system time synchronized across systems?

2) How is the error between the system time freq and the audio cards
rate addressed?

These are things that need to be addressed, but the high-level design is
what the applications should target, because it doesn't limit them to
the specifics of the details.

By suggesting the application be designed to use CLOCK_PTP, it limits
itself to systems with CLOCK_PTP hardware, and should the application be
ported to a different distributed system that's using RADclocks or some
other synchronization method, it won't function.

What the kernel needs to provide are ways to address #1 and #2 above,
but what the kernel needs to expose to userland should be minimal and
generic.


> > Again, my knowledge in the networking stack is pretty limited. But it
> > would seem that having an interface that does something to the effect of
> > "adjust the timestamp clock on the hardware that generated it from this
> > packet by Xppb" would feel like the right level of abstraction. Its
> > closely related to SO_TIMESTAMP, option right? Would something like
> > using the setsockopt/getsockopt interface with
> > SO_TIMESTAMP_ADJUST/OFFSET/SET/etc be reasonable?
> 
> The clock and its adjustment have nothing to do with a network
> socket. The current PTP hacks floating around all add private ioctls
> to the MAC driver. That is the *wrong* way to do it.

Could you clarify on *why* that is the wrong approach?

Maybe this is where some of the confusion is coming from? The subtleties
of the more generic PTP algorithm and how the existence of PTP hardware
clocks change things are not clear to me. My understanding of ptp and
the networking details around it is limited, so your expertise is
appreciated.  Might you consider covering some of this via a
Documentation/ptp/overview.txt file in a future version of your patch?

Here's a summary of what I understand:
So from:
http://en.wikipedia.org/wiki/Precision_Time_Protocol#Synchronization

We see the message exchange of Sync/Delay_Req/Delay_Resp, and the
calculation of the local offset from the server (and then a frequency
adjustment over time as offsets values are accumulated).

Without the hardware clock, this all of these messages and their
corresponding timestamps are likely created by PTPd, using clock_gettime
and then adjtimex() to correct for the calculated offset or freq
adjustment. No extra interfaces are necessary, and PTPd is syncing the
system time as accurately as it can. This is how the existing ptpd
projects on the web seem to function.

Now, with PTP hardware on the system, my understanding of what you're
trying to enable with your patches is that the PTP hardware does the
timestamping on both incoming and outgoing messages. PTPd then reads the
pre-timestamped messages, calculates the offset and freq correction, and
then feeds that back into the PTP hardware via your interface. No time
correction is done at all by PTPd.

Instead, you're proposing then to have a PPS signal emitted by the PTP
hardware (via the timer interface on that hardware, if I'm understanding
correctly). This PPS signal would then be picked up by something like
NTPd which would use it to correct the system time.


Questions:
1) When the PTP hardware is doing the timestamping, what API/interface
does PTPd use to get and send the Sync/Delay_req/Delay_Resp messages?

SO_TIMESTAMPed packets from a network device seems the obvious answer,
but your comments above about with regards to my SO_TIMESTAMP_ADJ idea
suggest there's something more subtle here.

2) You've mentioned multiple PTP hardware clocks are possible, but maybe
not practically useful. How does PTPd enumerate the existing clocks, and
know which devices to listen to for Sync/Delay_Resp messages?

The issue I'm trying to address here is the interface inconsistency
between the message timestamping interface (ie: likely from a packet,
possibly multiple sources) and the proposed CLOCK_PTP interface (with
only a single clock being exposed at a time, and that being controlled
by a sysfs interface).



My concerns: 
1) Again, I'm not totally comfortable exposing the PTP hardware via the
posix-clocks/timers interface. I'm not dead set against it, but it just
doesn't seem right as a top-level abstraction.

I'm curious if its possible to do the PTP hardware offset/adjustment
calculation in a module internally to the kernel? That would allow the
PPS interface to still be used to sync the system time, and not expose
additional interfaces.

2) If this is a top-level interface, I'd prefer the inconsistency
between how the timestamped messages are received and the proposed
posix_clocks/timer interface be clarified. 

For example: does the networking stack need to have the source clock_id
to use for SO_TIMESTAMPing be specified?

3) I still prefer the idea of keeping the PTP hardware adjustment API
close to the existing API to enable the SO_TIMESTAMPing. I realize you
disagree, but would like to understand better why.


thanks again,
-john
john stultz - Aug. 23, 2010, 8:21 p.m.
On Thu, 2010-08-19 at 17:38 +0200, Richard Cochran wrote:
> On Thu, Aug 19, 2010 at 02:28:04PM +0200, Arnd Bergmann wrote:
> > My point was that a syscall is better than an ioctl based interface here,
> > which I definitely still think. Given that John knows much more about
> > clocks than I do, we still need to get agreement on the question that
> > he raised, which is whether we actually need to expose this clock to the
> > user or not.
> > 
> > If we can find a way to sync system time accurate enough with PTP and
> > PPS, user applications may not need to see two separate clocks at all.
> 
> At the very least, one user application (the PTPd) needs to see the
> PTP clock.
> 
> > > SYSCALL_DEFINE3(clock_adjtime, const clockid_t, clkid,
> > > 		int, ppb, struct timespec __user *, ts)
> > > 
> > > ppb - desired frequency adjustment in parts per billion
> > > ts  - desired time step (or jump) in <sec,nsec> to correct
> > >       a measured offset
> > > 
> > > Arguably, this syscall might be useful for other clocks, too.
> > 
> > This is a mix of adjtime and adjtimex with the addition of
> > the clkid parameter, right?
> 
> Sort of, but not really. ADJTIME(3) takes an offset and slowly
> corrects the clock using a servo in the kernel, over hours.
> 
> For this function, the offset passed in the 'ts' parameter will be
> immediately corrected, by jumping to the new time. This reflects the
> way that PTP works. After the first few samples, the PTPd has an
> estimate of the offset to the master and the rate difference. The PTPd
> can proceed in one of two ways.
> 
> 1. If resetting the clock is not desired, then the clock is set to the
>    maximum adjustment (in the right direction) until the clock time is
>    close to the master's time.
> 
> 2. The estimated offset is added to the current time, resulting in a
>    jump in time.
> 
> We need clock_adjtime(id, 0, ts) for the second case.
>
> > Have you considered passing a struct timex instead of ppb and ts?
> 
> Yes, but the timex is not suitable, IMHO.

Could you expand on this?

Could we not add a adjustment mode ADJ_SETOFFSET or something that would
provide the instantaneous offset correction?

> > Is using ppb instead of the timex ppm required to get the accuracy
> > you want?
> 
> That is one very good reason.
> 
> Another is this: can you explain what the 20+ fields mean?
> 
> Consider the field, freq. The comment says "frequency offset (scaled
> ppm)."  To what is it scaled? The only way I know of to find out is to
> read the NTP code (which is fairly complex) and see what the unit
> really is meant to be. Ditto for the other fields.
> 
> The timex structure reveals, AFAICT, the inner workings of the kernel
> clock servo. For PTP, we don't need or want the kernel servo. The PTPd
> has its own clock servo in user space.

You're right that the timex is a little crufty. But its legacy that we
will support indefinitely. So following the established interface helps
maintainability.

So if the clock_adjtime interface is needed, it would seem best for it
to be generic enough to support not only PTP, but also the NTP kernel
PLL. 

In effect, this would make clock_adjtime(or clock_adjtimex) identical to
adjtimex, but only applicable to different CLOCK_ids. Additionally, it
would simplify the code for the CLOCK_REALTIME case as you could just
call directly into do_adjtimex(). 

Of course, extending adjtimex for ADJ_SETOFFSET would be needed first,
but that seems like a reasonable expansion of the interface that could
be used by more then just PTP.

thanks
-john
Stephan Gatzka - Aug. 24, 2010, 6:30 p.m.
Hello!

> I'm curious if its possible to do the PTP hardware offset/adjustment
> calculation in a module internally to the kernel? That would allow the
> PPS interface to still be used to sync the system time, and not expose
> additional interfaces.
Just my 2 cents on this issue. PTP is very often used in embedded 
systems, where the PTP timestamps will go into some dedicated hardware, 
for instance an FPGA.

I'm currently working on a project where it is not necessary to adjust 
the Linux system time via PTP (although it would be a nice to have), but 
we only need the timestamps from the PHY to put them into our FPGA 
device. So we need some kind of API to access the PTP timestamp registers.

Kind regards,

Stephan
Christian Riesch - Aug. 25, 2010, 9:40 a.m.
On Mon, Aug 23, 2010 at 10:08 PM, john stultz <johnstul@us.ibm.com> wrote:
> On Thu, 2010-08-19 at 07:55 +0200, Richard Cochran wrote:
>> On Wed, Aug 18, 2010 at 05:12:56PM -0700, john stultz wrote:
>> > Again, my knowledge in the networking stack is pretty limited. But it
>> > would seem that having an interface that does something to the effect of
>> > "adjust the timestamp clock on the hardware that generated it from this
>> > packet by Xppb" would feel like the right level of abstraction. Its
>> > closely related to SO_TIMESTAMP, option right? Would something like
>> > using the setsockopt/getsockopt interface with
>> > SO_TIMESTAMP_ADJUST/OFFSET/SET/etc be reasonable?
>>
>> The clock and its adjustment have nothing to do with a network
>> socket. The current PTP hacks floating around all add private ioctls
>> to the MAC driver. That is the *wrong* way to do it.
>
> Could you clarify on *why* that is the wrong approach?
>
> Maybe this is where some of the confusion is coming from? The subtleties
> of the more generic PTP algorithm and how the existence of PTP hardware
> clocks change things are not clear to me. My understanding of ptp and
> the networking details around it is limited, so your expertise is
> appreciated.  Might you consider covering some of this via a
> Documentation/ptp/overview.txt file in a future version of your patch?
>
> Here's a summary of what I understand:
> So from:
> http://en.wikipedia.org/wiki/Precision_Time_Protocol#Synchronization
>
> We see the message exchange of Sync/Delay_Req/Delay_Resp, and the
> calculation of the local offset from the server (and then a frequency
> adjustment over time as offsets values are accumulated).
>
> Without the hardware clock, this all of these messages and their
> corresponding timestamps are likely created by PTPd, using clock_gettime
> and then adjtimex() to correct for the calculated offset or freq
> adjustment. No extra interfaces are necessary, and PTPd is syncing the
> system time as accurately as it can. This is how the existing ptpd
> projects on the web seem to function.
>
> Now, with PTP hardware on the system, my understanding of what you're
> trying to enable with your patches is that the PTP hardware does the
> timestamping on both incoming and outgoing messages. PTPd then reads the
> pre-timestamped messages, calculates the offset and freq correction, and
> then feeds that back into the PTP hardware via your interface. No time
> correction is done at all by PTPd.

John,
What you describe here is only one of the use cases. If the hardware
has a single network port and operates as a PTP slave, it timestamps
the PTP packets that are sent and received and subsequently uses these
timestamps and the information it received from the master in the
packets to steer its clock to align it with the master clock. In such
a case the timestamping hardware and the clock hardware work together
closely and it seems to be okay to use the same interface to control
both the timestamping and the PTP clock.

But we have to consider other use cases, e.g.,

1) Boundary clocks:
We have more than one network port. One port operates as a slave
clock, our system gets time information via this port and steers its
PTP clock to align with the master clock. The other network ports of
our system operate as master clocks and redistribute the time
information we got from the master to other clocks on these networks.
In such a case we do timestamping on each of the network ports, but we
only have a single PTP clock. Each network port's timestamping
hardware uses the same hardware clock to generate time stamps.

2) Master clock:
We have one or more network ports. Our system has a really good clock
(ovenized quartz crystal, an atomic clock, a GPS timing receiver...)
and it distributes this time on the network. In such a case we do not
steer our clock based on the (packet) timestamps we get from our
timestamping unit. Instead, we directly drive our clock hardware with
a very stable frequency that we get from the OCXO or the atomic
clock... or we use one of the ancillary features of the PTP clock that
Richard mentioned to timestamp not network packets but a 1pps signal
and use these timestamps to steer the clock. Packet time stamping is
used to distribute the time to the slaves, but it is not part of the
control loop in this case.

So in the first case we have one PTP clock but several network packet
timestamping units, whereas in the second case the packet timestamping
is done but it is not part of the control loop that steers the clock.
Of course in most hardware implementations both the PTP clock and the
timestamping unit sit on the same chip and often use the same means of
communication to the cpu, e.g., the MDIO bus, but I think we need some
logical separation here.

Christian
john stultz - Aug. 27, 2010, 1:57 a.m.
On Wed, 2010-08-25 at 11:40 +0200, Christian Riesch wrote:
> What you describe here is only one of the use cases. If the hardware
> has a single network port and operates as a PTP slave, it timestamps
> the PTP packets that are sent and received and subsequently uses these
> timestamps and the information it received from the master in the
> packets to steer its clock to align it with the master clock. In such
> a case the timestamping hardware and the clock hardware work together
> closely and it seems to be okay to use the same interface to control
> both the timestamping and the PTP clock.
> 
> But we have to consider other use cases, e.g.,
> 
> 1) Boundary clocks:
> We have more than one network port. One port operates as a slave
> clock, our system gets time information via this port and steers its
> PTP clock to align with the master clock. The other network ports of
> our system operate as master clocks and redistribute the time
> information we got from the master to other clocks on these networks.
> In such a case we do timestamping on each of the network ports, but we
> only have a single PTP clock. Each network port's timestamping
> hardware uses the same hardware clock to generate time stamps.
> 
> 2) Master clock:
> We have one or more network ports. Our system has a really good clock
> (ovenized quartz crystal, an atomic clock, a GPS timing receiver...)
> and it distributes this time on the network. In such a case we do not
> steer our clock based on the (packet) timestamps we get from our
> timestamping unit. Instead, we directly drive our clock hardware with
> a very stable frequency that we get from the OCXO or the atomic
> clock... 

Ok. Following you here...

> or we use one of the ancillary features of the PTP clock that
> Richard mentioned to timestamp not network packets but a 1pps signal
> and use these timestamps to steer the clock. 

Wait.. I thought we weren't using PTP to steer the clock? But now we're
using the pps signal from it to do so? Do I misunderstand you? Or did
you just not mean this?

> Packet time stamping is
> used to distribute the time to the slaves, but it is not part of the
> control loop in this case.

I assume here you mean PTPd is steering the PTP clock according to the
system time (which is NTP/GPS/whatever sourced)? And then the PTP clock
distributes that time through the network?

> So in the first case we have one PTP clock but several network packet
> timestamping units, whereas in the second case the packet timestamping
> is done but it is not part of the control loop that steers the clock.
> Of course in most hardware implementations both the PTP clock and the
> timestamping unit sit on the same chip and often use the same means of
> communication to the cpu, e.g., the MDIO bus, but I think we need some
> logical separation here.


So first of all, thanks for the extra explanation and context here! I
really appreciate it, as I'm not familiar with all the hardware details
and possible use cases, but I'm trying to learn.

So in the two cases you mention, the time "flow" is something like:

#1) [Master Clock on Network1] => [PTP Clock] => [PTPd] =>
	[PTP Clock] => [PTP Clients on Network2]

#2) [GPS] => [NTPd] => [System Time] => [PTPd] => [PTP clock] =>
	[PTP clients on Network]

And the original case:
#3) [Master Clock on Network] => [PTP clock] => [PTPd] => [PTP clock]

With a secondary control flow:
	[PPS signal from PTP clock] => [NTPd] => [System Time]


Right?


So, just brainstorming here, I guess the question I'm trying to figure
out here, is can the "System Time" and "PTP clock" be merged/globbed
into a single "Time" interface from the userspace point of view?

In other words, if internal to the kernel, the PTP clock was always
synced to the system time, couldn't the flow look something like:

#3') [Master clock on network] => [PTP clock] => [PTPd] =>
	 [System Time] => [in-kernel sync thread] => [PTP clock]

So PTPd sees the offset adjustment from the PTP clock, and then feeds
that offset correction right into (a possibly enhanced) adjtimex. The
kernel would then immediately steer the PTP clock by the same amount to
keep it in sync with system time (along with a periodic offset/freq
correction step to deal with crystal drift).

Similarly:

#2') [GPS] => [NTPd] => [System Time] => [in-kernel sync thread] => 
		[PTP clock] => [PTP clients on Network]

and 

#1') [Master Clock on Network1] => [PTP Clock] => [PTPd] =>
	[System Time] => [in-kernel sync thread] => [PTP Clock] => 
	[PTP Clients on Network2]

Now, I realize PTP purists probably won't like this, because it
effectively makes the in-kernel sync thread similar to a PTP boundary
clock (or worse, since the control loop isn't exactly direct).

But considering that the kernel (internally) allows for *very*
fine-grained adjustments (we keep our long-term offset error in
(nanoseconds << 32)  ie: ~quarter-*billion*ths of a nanosecond - I think
that's sub-attosecond, if I recall the unit). And even the existing
external adjtimex interface allows for adjustments of 1ppm<<16 which is
a granularity of ~15 parts-per-trillion (assuming i'm doing the math
right).

These are all much greater then the parts-per-billion adjustment
granularity proposed for the direct PTP clock steering, so I suspect any
error caused by the indirection in the control flow could be minimized
significantly.

Additionally my suggestion here has the benefit of:
A: Avoiding the fragmented time domains (ie CLOCK_REALTIME vs CLOCK_PTP)
caused by adding a new clock_id.

B: Avoiding the indirect system time sync through the PPS interface,
which isn't completely terrible, but just feels a little ugly
configuration wise from a users-perspective.

I'm sure I still have lots to learn about PTP, so please let me know
where I'm off-base.

thanks
-john
Richard Cochran - Aug. 27, 2010, 7:57 a.m.
On Thu, Aug 26, 2010 at 06:57:49PM -0700, john stultz wrote:
> On Wed, 2010-08-25 at 11:40 +0200, Christian Riesch wrote:
> > 2) Master clock:
> > We have one or more network ports. Our system has a really good clock
> > (ovenized quartz crystal, an atomic clock, a GPS timing receiver...)
> > and it distributes this time on the network. In such a case we do not
> > steer our clock based on the (packet) timestamps we get from our
> > timestamping unit. Instead, we directly drive our clock hardware with
> > a very stable frequency that we get from the OCXO or the atomic
> > clock...
>
> Ok. Following you here...
>
> > or we use one of the ancillary features of the PTP clock that
> > Richard mentioned to timestamp not network packets but a 1pps signal
> > and use these timestamps to steer the clock.
>
> Wait.. I thought we weren't using PTP to steer the clock? But now we're
> using the pps signal from it to do so? Do I misunderstand you? Or did
> you just not mean this?

The master node in a PTP network probably takes its time from a
precise external time source, like GPS. The GPS provides a 1 PPS
directly to the PTP clock hardware, which latches the PTP hardware
clock time on the PPS edge. This provides one sample as input to a
clock servo (in the PTPd) that, in turn, regulates the PTP clock
hardware.

> > Packet time stamping is
> > used to distribute the time to the slaves, but it is not part of the
> > control loop in this case.
>
> I assume here you mean PTPd is steering the PTP clock according to the
> system time (which is NTP/GPS/whatever sourced)? And then the PTP clock
> distributes that time through the network?

Yes, but in this case, "system time" has nothing to do with the Linux
system time. For a PTP master clock, it really doesn't matter whether
the Linux time is correct, or not. It doesn't hurt either (but see
below about chaining servos).

> So first of all, thanks for the extra explanation and context here! I
> really appreciate it, as I'm not familiar with all the hardware details
> and possible use cases, but I'm trying to learn.
>
> So in the two cases you mention, the time "flow" is something like:
>
> #1) [Master Clock on Network1] => [PTP Clock] => [PTPd] =>
> 	[PTP Clock] => [PTP Clients on Network2]

I would only draw the PTP clock once, perhaps like this:

                                +------------------------------+
                                |                              ^
                                V                              |
[Master Clock on NW 1]--->[HW timestamp]--->[PTPd]--adj-->[PTP Clock]
[Slaves on NW 2,3,...]<---[HW timestamp]<---[    ]

>
> #2) [GPS] => [NTPd] => [System Time] => [PTPd] => [PTP clock] =>
> 	[PTP clients on Network]

Nope. More like this:

                                      +------------------------------+
                                      |                              ^
                                      V                              |
[GPS]----------PPS--------->[Latch  timestamp]--->[PTPd]--adj-->[PTP Clock]
[Slaves on NW 1,2,3,...]<---[HW pkt timestamp]<---[    ]

> And the original case:
> #3) [Master Clock on Network] => [PTP clock] => [PTPd] => [PTP clock]

More like:

[Master Clock on NW 1]--->[HW timestamp]--->[PTPd]--adj-->[PTP Clock]

> With a secondary control flow:
> 	[PPS signal from PTP clock] => [NTPd] => [System Time]

Yes.


> So, just brainstorming here, I guess the question I'm trying to figure
> out here, is can the "System Time" and "PTP clock" be merged/globbed
> into a single "Time" interface from the userspace point of view?

This is the core issue and source of misunderstanding, in my view. The
fact of the matter is, the current generation of computers has
multiple clocks, and these are usually unsynchronized. I think we
should not try too hard to cover up or work around this. It is a fact
of life.

It would be nice if there were only one clock, and that clock could do
everything that we need. Indeed, the next generation of SoC computers may
all have PTP build in to the main CPU. Well, one can always wish.

If we can make it appear that multiple clocks are just one clock, then
I agree that we should do it. But I would not want to sacrifice
synchronization accuracy or application features just to keep that
illusion.

> In other words, if internal to the kernel, the PTP clock was always
> synced to the system time, couldn't the flow look something like:
>
> #3') [Master clock on network] => [PTP clock] => [PTPd] =>
> 	 [System Time] => [in-kernel sync thread] => [PTP clock]
>
> So PTPd sees the offset adjustment from the PTP clock, and then feeds
> that offset correction right into (a possibly enhanced) adjtimex. The
> kernel would then immediately steer the PTP clock by the same amount to
> keep it in sync with system time (along with a periodic offset/freq
> correction step to deal with crystal drift).
>
> Similarly:
>
> #2') [GPS] => [NTPd] => [System Time] => [in-kernel sync thread] =>
> 		[PTP clock] => [PTP clients on Network]
>
> and
>
> #1') [Master Clock on Network1] => [PTP Clock] => [PTPd] =>
> 	[System Time] => [in-kernel sync thread] => [PTP Clock] =>
> 	[PTP Clients on Network2]
>
> Now, I realize PTP purists probably won't like this, because it
> effectively makes the in-kernel sync thread similar to a PTP boundary
> clock (or worse, since the control loop isn't exactly direct).

I don't like it. The experience with PTP boundary clocks already shows
that the errors in servo loops cascade. It worsens the PTP performance.

However, I think it would fine just to synch the Linux system time to
the PTP clock using the PPS interface, which is what my patch set
implements. Linux user applications would not be able to detect the
difference.

Richard
Richard Cochran - Aug. 27, 2010, 11:08 a.m.
On Mon, Aug 23, 2010 at 01:21:39PM -0700, john stultz wrote:
> On Thu, 2010-08-19 at 17:38 +0200, Richard Cochran wrote:
> > On Thu, Aug 19, 2010 at 02:28:04PM +0200, Arnd Bergmann wrote:
> > > My point was that a syscall is better than an ioctl based interface here,
> > > which I definitely still think. Given that John knows much more about
> > > clocks than I do, we still need to get agreement on the question that
> > > he raised, which is whether we actually need to expose this clock to the
> > > user or not.
> > > 
> > > If we can find a way to sync system time accurate enough with PTP and
> > > PPS, user applications may not need to see two separate clocks at all.
> > 
> > At the very least, one user application (the PTPd) needs to see the
> > PTP clock.
> > 
> > > > SYSCALL_DEFINE3(clock_adjtime, const clockid_t, clkid,
> > > > 		int, ppb, struct timespec __user *, ts)
> > > > 
> > > > ppb - desired frequency adjustment in parts per billion
> > > > ts  - desired time step (or jump) in <sec,nsec> to correct
> > > >       a measured offset
> > > > 
> > > > Arguably, this syscall might be useful for other clocks, too.
> > > 
> > > This is a mix of adjtime and adjtimex with the addition of
> > > the clkid parameter, right?
> > 
> > Sort of, but not really. ADJTIME(3) takes an offset and slowly
> > corrects the clock using a servo in the kernel, over hours.
> > 
> > For this function, the offset passed in the 'ts' parameter will be
> > immediately corrected, by jumping to the new time. This reflects the
> > way that PTP works. After the first few samples, the PTPd has an
> > estimate of the offset to the master and the rate difference. The PTPd
> > can proceed in one of two ways.
> > 
> > 1. If resetting the clock is not desired, then the clock is set to the
> >    maximum adjustment (in the right direction) until the clock time is
> >    close to the master's time.
> > 
> > 2. The estimated offset is added to the current time, resulting in a
> >    jump in time.
> > 
> > We need clock_adjtime(id, 0, ts) for the second case.
> >
> > > Have you considered passing a struct timex instead of ppb and ts?
> > 
> > Yes, but the timex is not suitable, IMHO.
> 
> Could you expand on this?

We need to able to specify that the call is for a PTP clock. We could
add that to the modes flag, like this:

/*timex.h*/
#define ADJ_PTP_0 0x10000
#define ADJ_PTP_1 0x20000
#define ADJ_PTP_2 0x30000
#define ADJ_PTP_3 0x40000

I can live with this, if everyone else can, too.
 
> Could we not add a adjustment mode ADJ_SETOFFSET or something that would
> provide the instantaneous offset correction?

Yes, but we would also need to add a struct timespec to the struct
timex, in order to get nanosecond resolution. I think it would be
possible to do in the padding at the end?

> You're right that the timex is a little crufty. But its legacy that we
> will support indefinitely. So following the established interface helps
> maintainability.

We can use it for PTP, with the modifications suggested above. Or we
can just introduce the clock_adjtime method, instead.
 
> So if the clock_adjtime interface is needed, it would seem best for it
> to be generic enough to support not only PTP, but also the NTP kernel
> PLL.

For the proposed clock_adjime, what else is needed to support clock
adjustment in general?

I don't mind making the interface generic enough to support any
(realistic) conceivable clock adjustment scheme, but beyond the
present PTP hardware clocks, I don't know what else might be needed.

Richard
Arnd Bergmann - Aug. 27, 2010, 12:03 p.m.
On Friday 27 August 2010, Richard Cochran wrote:
> On Mon, Aug 23, 2010 at 01:21:39PM -0700, john stultz wrote:
> > On Thu, 2010-08-19 at 17:38 +0200, Richard Cochran wrote:
> > > On Thu, Aug 19, 2010 at 02:28:04PM +0200, Arnd Bergmann wrote:
> > > > Have you considered passing a struct timex instead of ppb and ts?
> > > 
> > > Yes, but the timex is not suitable, IMHO.
> > 
> > Could you expand on this?
> 
> We need to able to specify that the call is for a PTP clock. We could
> add that to the modes flag, like this:
> 
> /*timex.h*/
> #define ADJ_PTP_0 0x10000
> #define ADJ_PTP_1 0x20000
> #define ADJ_PTP_2 0x30000
> #define ADJ_PTP_3 0x40000
>
> I can live with this, if everyone else can, too.

My suggestion was actually to have a new syscall with the existing
structure, and pass a clockid_t value to it, similar to your
sys_clock_adjtime(), not change the actual sys_adjtime syscall.
  
> > Could we not add a adjustment mode ADJ_SETOFFSET or something that would
> > provide the instantaneous offset correction?
> 
> Yes, but we would also need to add a struct timespec to the struct
> timex, in order to get nanosecond resolution. I think it would be
> possible to do in the padding at the end?

Yes, that's exactly what the padding is for. Instead of timespec, you can
probably have a extra values for replacing the existing ppm values with
ppb values.

	Arnd
Richard Cochran - Aug. 27, 2010, 12:38 p.m.
On Mon, Aug 23, 2010 at 01:08:45PM -0700, john stultz wrote:
> On Thu, 2010-08-19 at 07:55 +0200, Richard Cochran wrote:
> > The clockid_t CLOCK_PTP will be arch-neutral.
> 
> Sure, but are they conceptually neutral? There are other clock
> synchronization algorithms out there. Will they need their own
> similar-but-different clock_ids?
> 
> Look at the other clock ids and what the represent:

IMHO, the presently offered clock ids are a mixed bag...
 
> CLOCK_REALTIME : Wall time (possibly freq/offset corrected)
> CLOCK_MONOTONIC: Monotonic time (possibly freq corrected).
> CLOCK_PROCESS_CPUTIME_ID: Process cpu time.
> CLOCK_THREAD_CPUTIME_ID: Thread cpu time.

The amount of time a thread has been granted by the kernel is really
not connected to the real passage of time, at least not in a direct
way.

> CLOCK_MONOTONIC_RAW: Non freq corrected monotonic time.

This one comes from commit 2d42244ae71d6c7b0884b5664cf2eda30fb2ae68
and is surely a special case, unrelated to the other clock ids. The
commit message mentions that this was added to help the btime.sf.net
project. That project does not seem to have had any activity since
2007. If we can justify adding a clock id in this case, surely we can
add one for PTP as well!

> CLOCK_REALTIME_COARSE: Tick granular wall time (filesystem timestamp)
> CLOCK_MONOTONIC_COARSE: Tick granular monotonic time.

These were added in commit da15cfdae03351c689736f8d142618592e3cebc3
in order to fulfill needs of special applications.

> CLOCK_PTP that you're proposing doesn't seem to be at the same level of
> abstraction. I'm not saying that this isn't the right place for it, but
> can we take a step back from PTP and consider what your exposing in more
> generic terms. In other words, could someone use the same
> packet-timestamping hardware to implement a different non-PTP time
> synchronization algorithm?

Yes, of course. There is nothing at all in the patch set about the PTP
protocol itself. It just lets you access the hardware. In short:

1. SO_TIMESTAMPING delivers timestamped packets
2. the PTP API lets you tune the clock.

"That's all, folks."

> Further, if we're using PTP to synchoronize the system time, then there
> shouldn't be any measurable difference between CLOCK_PTP and
> CLOCK_REALTIME, no?

When using software timestamping, then the clocks are one in the same.

When using PTP, with the PPS hook to synchoronize the Linux system
time, the clocks will be a close as the servo algorithm provides. I
have not measured this yet, but it cannot be much different than using
any other PPS source.

> > SYSCALL_DEFINE3(clock_adjtime, const clockid_t, clkid,
> > 		int, ppb, struct timespec __user *, ts)
> > 
> > ppb - desired frequency adjustment in parts per billion
> > ts  - desired time step (or jump) in <sec,nsec> to correct
> >       a measured offset
> > 
> > Arguably, this syscall might be useful for other clocks, too.
> 
> So yea, obviously the syscall should not be CLOCK_PTP specific, so we
> would want it to be usable against CLOCK_REALTIME.
> 
> That said, the clock_adjtime your proposing does not seem to be
> sufficient for usage by NTPd. So this suggests that it is not generic
> enough.

I don't think we need to support ntpd. It already has adjtimex, and it
won't get any better by using another interface.

> > I think the ancillary features from PTP hardware clocks should be made
> > available throught the sysfs. A syscall for these would end up very
> > ugly, looking like an ioctl. Also, it is hard to see how these
> > features relate to the more general idea of the clockid.
> 
> This may be a good approach, but be aware that adding stuff to sysfs
> requires similar scrutiny as adding a syscall.  

Yes, it will be properly documented and maintained. I have already
implemented the ancillary stuff in two ways, via sysfs and with a
character device. The next patch set will include them both, and you
all can just choose which one to delete (or leave them both).

> > In contrast, sysfs attributes will fit the need nicely:
> > 
> > 1. enable or disable pps
> > 2. enable or disable external timestamps
> > 3. read out external timestamp
> > 4. configure period for periodic output
> 
> Things to consider here:
> Do having these options really make sense? 

Yes, since they represent the PTP clock's hardware features. As I
explained previously, if you don't have any hardware interfaces, then
having your clocks synchoronized to under 100 nanoseconds does not
help you more than having them to within 1 microsecond.

> Why would we want pps disabled?

If you are a master clock, then you want to take your PPS from an
external time source, like GPS.  If you leave the PTP PPS events on,
then they will occur close in time to the GPS PPS events and may add
unwanted latency to the interrupt handler.

> And if that does make sense, would it
> be better to do so via the existing pps interface instead of adding a
> new ptp specific one? 

We have not introduced new PPS interface. We use existing PPS subsystem.

> Same for the timestamps and periodic output (ie: and how do they differ
> from reading or setting a timer on CLOCK_PTP?)

The posix timer calls won't work:

I have a PTP hardware clocks with multiple external timestamp
channels. Using timer_gettime, how can I specify (or decode) the
channel of interest to me?

> > This is a good example of the poverty (in regards to time
> > synchronization) of our current systems.
> > 
> > Lets say I want to build a surround sound audio system, using a set of
> > distributed computers, each host connected to one speaker. How I can
> > be sure that the samples in one channel (ie one host) pass through the
> > DA converter at exactly the same time?
> 
> They won't be exactly the same, but to minimize any noticeable
> difference we'd need each speaker/client-system that have their system
> time closely synced. Then the server-system would need to send the
> channel stream and frame times to each client. The clients would then
> feed the audio frames to the audio card at the designated times.
> 
> This is a little high level and generic and of course, the devil's in
> the details:
> 
> 1) How is the system time synchronized across systems?
> 
> 2) How is the error between the system time freq and the audio cards
> rate addressed?
> 
> These are things that need to be addressed, but the high-level design is
> what the applications should target, because it doesn't limit them to
> the specifics of the details.
> 
> By suggesting the application be designed to use CLOCK_PTP, it limits
> itself to systems with CLOCK_PTP hardware, and should the application be
> ported to a different distributed system that's using RADclocks or some
> other synchronization method, it won't function.
> 
> What the kernel needs to provide are ways to address #1 and #2 above,
> but what the kernel needs to expose to userland should be minimal and
> generic.

My point was this:

The application requires that the soundcard DA clocks (*not* the CPU
clocks) be synchronized. Currently the Linux kernel offers no way at
all to do this at all.

> > The clock and its adjustment have nothing to do with a network
> > socket. The current PTP hacks floating around all add private ioctls
> > to the MAC driver. That is the *wrong* way to do it.
> 
> Could you clarify on *why* that is the wrong approach?

Christian explained this pretty well.

> Questions:
> 1) When the PTP hardware is doing the timestamping, what API/interface
> does PTPd use to get and send the Sync/Delay_req/Delay_Resp messages?
>
> SO_TIMESTAMPed packets from a network device seems the obvious answer,
> but your comments above about with regards to my SO_TIMESTAMP_ADJ idea
> suggest there's something more subtle here.

Nope, no magic here, just a plain old UDP socket.

> 2) You've mentioned multiple PTP hardware clocks are possible, but maybe
> not practically useful. How does PTPd enumerate the existing clocks, and
> know which devices to listen to for Sync/Delay_Resp messages?
> 
> The issue I'm trying to address here is the interface inconsistency
> between the message timestamping interface (ie: likely from a packet,
> possibly multiple sources) and the proposed CLOCK_PTP interface (with
> only a single clock being exposed at a time, and that being controlled
> by a sysfs interface).

The sysfs will include one class device for each PTP clock. Each clock
has a sysfs attribute with the corresponding clock id.

For the network timestamps, they need to be enabled using the
SIOCSHWTSTAMP ioctl. We can easily extend that ioctl to include the
desired clock id.

> My concerns: 
> 1) Again, I'm not totally comfortable exposing the PTP hardware via the
> posix-clocks/timers interface. I'm not dead set against it, but it just
> doesn't seem right as a top-level abstraction.

I would also be happy with the character device idea already
posted. Just pick one of the two, and I'll resubmit the patch set...

> I'm curious if its possible to do the PTP hardware offset/adjustment
> calculation in a module internally to the kernel? That would allow the
> PPS interface to still be used to sync the system time, and not expose
> additional interfaces.

It would be possible, but not too nice, IMHO. In contrast to NTP,
there is no real need to place the servo in the kernel. Having the
protocol code and servo in user space makes life much easier.

> 2) If this is a top-level interface, I'd prefer the inconsistency
> between how the timestamped messages are received and the proposed
> posix_clocks/timer interface be clarified. 
> 
> For example: does the networking stack need to have the source clock_id
> to use for SO_TIMESTAMPing be specified?

We could do it that way. I never liked the <SW,SYS,RAW> tuple in the
SO_TIMESTAMPING control message in the first place. Sending three
fields with two blank seems wasteful. Instead, <clockid,timestamp>
would be sufficient. Maybe that it is too late to change this.

Even without altering the timestamp, we can simply augment
SIOCSHWTSTAMP with a clock id.

At this point I would just like to go forward with one of the two
proposed APIs. I had modelled the character device on the posix clock
calls in order to make it immediately familar, and I think it is a
viable approach. After the lkml discussion, I think it is even cleaner
and nicer to just offer a new clock id.

Thanks for all the feedback and comments,

Richard
Alan Cox - Aug. 27, 2010, 12:41 p.m.
> The master node in a PTP network probably takes its time from a
> precise external time source, like GPS. The GPS provides a 1 PPS
> directly to the PTP clock hardware, which latches the PTP hardware
> clock time on the PPS edge. This provides one sample as input to a
> clock servo (in the PTPd) that, in turn, regulates the PTP clock
> hardware.

A PTP clock is TAI, Unix time is UTC.

> This is the core issue and source of misunderstanding, in my view. The
> fact of the matter is, the current generation of computers has
> multiple clocks, and these are usually unsynchronized. I think we
> should not try too hard to cover up or work around this. It is a fact
> of life.

In this case I don't think you can. Their divergence is rather difficult
to handle unless you have a GPS to hand.

But all this talk of "PTP this" and "PTP that" is not helpful. Any
interface for additional time sources should be generic with PTP being
one use case.

Alan
Alan Cox - Aug. 27, 2010, 12:45 p.m.
> > So if the clock_adjtime interface is needed, it would seem best for it
> > to be generic enough to support not only PTP, but also the NTP kernel
> > PLL.
> 
> For the proposed clock_adjime, what else is needed to support clock
> adjustment in general?

Multiple PLLs, at least with containers and certain classes of system you
want different containers in different timespaces, especially when doing
high precision stuff where you need your system tracking say a local
master clock for syncing musical instruments and sound events while
tracking other clocks like NTP for general system time.

> I don't mind making the interface generic enough to support any
> (realistic) conceivable clock adjustment scheme, but beyond the
> present PTP hardware clocks, I don't know what else might be needed.

Put the clock type in the new fields. It becomes

	u16 clock_type;
	[clock type specific data]

saves having to guess.

Alan
Alan Cox - Aug. 27, 2010, 1:38 p.m.
> 2007. If we can justify adding a clock id in this case, surely we can
> add one for PTP as well!

But PTP isn't really a clock - its a time sync protocol. You can (and may
need to) have multiple clocks of this form on the same host because it's
master based and you may have to deal with multiple masters who disagree.

> > Further, if we're using PTP to synchoronize the system time, then there
> > shouldn't be any measurable difference between CLOCK_PTP and
> > CLOCK_REALTIME, no?
> 
> When using software timestamping, then the clocks are one in the same.

Technically the POSIX clock is UTC, IEEE1588v2 is TAI.

> It would be possible, but not too nice, IMHO. In contrast to NTP,
> there is no real need to place the servo in the kernel. Having the
> protocol code and servo in user space makes life much easier.

We can't currently put it in the kernel anyway for other good reasons.

> viable approach. After the lkml discussion, I think it is even cleaner
> and nicer to just offer a new clock id.

PTP is not a clock, it's many clocks so a clock id doesn't really work.
You could assume a single time domain and add a CLOCK_TAI plus then use
PTP to track it I guess ?

The question then is who would consume it and how ?

Generic applications want POSIX time, which is managed by NTP but could
in userspace also be slewed via the existing API to track a PTP source if
someone wanted and if there is a GPS source around they can compute UTC
from it.

Specialist applications will presumably need to know which time source or
sources they are tracking and synchronizing too out of multiple potential
PTP sources

Kernel stuff is more of a problem.

I'm not sure shoehorning a source of many clocks and time sync bases into
a jump up and down and make it fit single time assumption is wise. Making
system time bimble track a source makes sense just as with NTP but making
it a new clock seems the wrong model extending a non-too-bright API when
you can just put the time sources in a file tree.

Alan
Richard Cochran - Aug. 27, 2010, 2:02 p.m.
On Fri, Aug 27, 2010 at 01:41:54PM +0100, Alan Cox wrote:
> > The master node in a PTP network probably takes its time from a
> > precise external time source, like GPS. The GPS provides a 1 PPS
> > directly to the PTP clock hardware, which latches the PTP hardware
> > clock time on the PPS edge. This provides one sample as input to a
> > clock servo (in the PTPd) that, in turn, regulates the PTP clock
> > hardware.
> 
> A PTP clock is TAI, Unix time is UTC.

But TAI and UTC progress at the same rate, and UTC differs from TAI by
a constant offset. In fact, the needed conversion is provided by the
protocol, so it is not hard to take a 1 PPS from GPS and set the PTP
clock to TAI.

> > This is the core issue and source of misunderstanding, in my view. The
> > fact of the matter is, the current generation of computers has
> > multiple clocks, and these are usually unsynchronized. I think we
> > should not try too hard to cover up or work around this. It is a fact
> > of life.
> 
> In this case I don't think you can. Their divergence is rather difficult
> to handle unless you have a GPS to hand.
> 
> But all this talk of "PTP this" and "PTP that" is not helpful. Any
> interface for additional time sources should be generic with PTP being
> one use case.

To tell the truth, my original motivation for the patch set was to
support PTP clocks and applications. I don't think that is such a bad
idea. After all, the adjtimex interface was added just to support NTP.

At the same time, I can understand the desire to have a generic
hardware clock adjustment API. Let me see if I can understand and
summarize what people are asking for:

	  clock_adjtime(clockid_t id, struct timex *t);

and struct timex gets some new fields at the end.

Using the call, NTPd can call clock_adjtime(CLOCK_REALTIME) and PTPd
can call clock_realtime(CLOCK_PTP) and everyone is happy, no?

Thanks,
Richard
Richard Cochran - Aug. 27, 2010, 2:34 p.m.
On Fri, Aug 27, 2010 at 02:38:44PM +0100, Alan Cox wrote:
> > 2007. If we can justify adding a clock id in this case, surely we can
> > add one for PTP as well!
> 
> But PTP isn't really a clock - its a time sync protocol. You can (and may
> need to) have multiple clocks of this form on the same host because it's
> master based and you may have to deal with multiple masters who disagree.

Okay, I really meant "for PTP hardware clocks". In general the
discussion is about supporting a kind of hardware and not about the
PTP network protocol. In fact, the hardware clocks and clock servo
loops are not at all part of the IEEE 1588 standard.

Sorry for causing confusion, but please understand "a hardware clock
with timestamping capabilities than can be used for PTP support"
whenever I wrote "PTP" or "PTP clock."

Well, what I just said is not entirely true.

In fact, most of the current crop of PTP hardware clocks operate by
recognizing PTP network frames and timestamping them. One clock I know
of can timestamp every frame, but that seems to be the exception
rather than the rule. So, in theory they are just clocks, but actually
most are bound to the PTP protocol. That may change in the future...

> > viable approach. After the lkml discussion, I think it is even cleaner
> > and nicer to just offer a new clock id.
> 
> PTP is not a clock, it's many clocks so a clock id doesn't really work.
> You could assume a single time domain and add a CLOCK_TAI plus then use
> PTP to track it I guess ?
> 
> The question then is who would consume it and how ?
> 
> Generic applications want POSIX time, which is managed by NTP but could
> in userspace also be slewed via the existing API to track a PTP source if
> someone wanted and if there is a GPS source around they can compute UTC
> from it.

Yes, and even without a GPS, the PTP protocol (this time I really do
mean the protocol!) does provide the UTC offset whenever it is known.

> Specialist applications will presumably need to know which time source or
> sources they are tracking and synchronizing too out of multiple potential
> PTP sources

Yes, the PTPd will have some special knowledge.

> Kernel stuff is more of a problem.
> 
> I'm not sure shoehorning a source of many clocks and time sync bases into
> a jump up and down and make it fit single time assumption is wise. Making
> system time bimble track a source makes sense just as with NTP but making
> it a new clock seems the wrong model extending a non-too-bright API when
> you can just put the time sources in a file tree.

Don't get your meaning here, what did you mean by "file tree?"

Thanks,
Richard
Alan Cox - Aug. 27, 2010, 2:50 p.m.
> To tell the truth, my original motivation for the patch set was to
> support PTP clocks and applications. I don't think that is such a bad

ptp *clocks*

> idea. After all, the adjtimex interface was added just to support NTP.
> 
> At the same time, I can understand the desire to have a generic
> hardware clock adjustment API. Let me see if I can understand and
> summarize what people are asking for:
> 
> 	  clock_adjtime(clockid_t id, struct timex *t);
> 
> and struct timex gets some new fields at the end.

For a new syscall you could equally make it

	(clockid_t id, void *args)

> Using the call, NTPd can call clock_adjtime(CLOCK_REALTIME) and PTPd
> can call clock_realtime(CLOCK_PTP) and everyone is happy, no?

If you only have one clock that you are calling 'the PTP clock' - but is
that a good assumption ?

I agree with your fundamental arguments as I understand them

- That it's another clock or clocks possibly not synchronized with the
  system clock

- That there should be a sensible API for doing slews and steps on other
  clocks but the systen clock.

I'm concerned about the assumption that there is a single magic PTP
clock, and calling it a PTP clock for two reasons

- There can be more than one

- PTP is just a protocol, in five years time it might be TICTOC or
  something newer and more wonderous, in some environments it'll be a
  synchronous distributed clock generation not PTP etc. Wiring PTP or
  IEE1588v2 into the clock name doesn't make sense.

I'd be happier with a model which says we have some arbitary number of
synchronization sources which may or may not have a connection to system
time, and may be using all sorts of synchronization methods. Clock in
fact seems almost a misnomer.

Alan
Alan Cox - Aug. 27, 2010, 3:06 p.m.
> Sorry for causing confusion, but please understand "a hardware clock
> with timestamping capabilities than can be used for PTP support"
> whenever I wrote "PTP" or "PTP clock."

Ok that makes sense.
> 
> Well, what I just said is not entirely true.
> most are bound to the PTP protocol. That may change in the future...

Which seems fine to me too - its an implementation detail of that time
source.

> > Specialist applications will presumably need to know which time source or
> > sources they are tracking and synchronizing too out of multiple potential
> > PTP sources
> 
> Yes, the PTPd will have some special knowledge.

Not only that but consumers of different time synchronizations will need
to be able to describe which time source they want to talk about from a
selection of PTP or similar things.

> > system time bimble track a source makes sense just as with NTP but making
> > it a new clock seems the wrong model extending a non-too-bright API when
> > you can just put the time sources in a file tree.
> 
> Don't get your meaning here, what did you mean by "file tree?"

Something like

	/sys/class/timesource/<name>/...

at which point we don't have to enumerate them all, add special system
calls and then fret about the fact you can't access them from things like
shell scripts.

The fact SYS5.4 Unix and SuS got obsessed with numbering things
rather than using names unlike Unix doesn't mean it's the right model to
number them - usually the reverse is true, a heirarchy of file names is
rather more future proof.

Alan
Patrick Loschmidt - Aug. 27, 2010, 3:21 p.m.
Hi!

I'd like to add my two cents about the discussion. Just to shortly
introduce myself: I'm working with PTP since version 2002 (now 2008 or
PTPv2) and I'm developing matching network cards, drivers, and also
sometimes a bit of the stack.

I always had the problem of different HW implementations (even my own)
and how to access the clocks there. So reading this thread, I strongly
support the idea to provide a driver class, which allows the userspace
to run certain standard operations (settime, gettime, adjtime, ...) as
proposed and leave the detailed implementation to a specific PTP clock
driver. I always made my own char device, but I don't want to open this
discussion again as for me it doesn't matter.

Concerning the long discussion about PTP clock, system time, etc. I'd
like to say, that from a point of view of PTP every node/host has only
one clock. That means, that if you have a NIC with integrated clock
(required for HW timestamping) it is counted as a node. As soon as you
want to use multiple NICs this is actually outside the PTP protocol
definition, unless you have only one clock available to both network
interfaces (which is hard to achieve).

So the solution to treat a PTP enabled NIC as a time source for the
system time is in general sufficient, unless for applications that
require precise timestamps in applications. I know of use cases where
code gets instrumented to measure time delays, processing time, and
sequence in SW, e.g. distributed databases for trading systems at the
stock exchange.

Summarizing I think it would be a huge step forward, if all the
different HW implementations could be controlled via a standardised
interface through the kernel. I need timestamps from my network card
with ps resolution and to steer the onboard clock from user space. Then
I would be happy already. :-)

Thanks,
Patrick
M. Warner Losh - Aug. 27, 2010, 3:35 p.m.
In message: <20100827140205.GA3293@riccoc20.at.omicron.at>
            Richard Cochran <richardcochran@gmail.com> writes:
: On Fri, Aug 27, 2010 at 01:41:54PM +0100, Alan Cox wrote:
: > > The master node in a PTP network probably takes its time from a
: > > precise external time source, like GPS. The GPS provides a 1 PPS
: > > directly to the PTP clock hardware, which latches the PTP hardware
: > > clock time on the PPS edge. This provides one sample as input to a
: > > clock servo (in the PTPd) that, in turn, regulates the PTP clock
: > > hardware.
: > 
: > A PTP clock is TAI, Unix time is UTC.
: 
: But TAI and UTC progress at the same rate, and UTC differs from TAI by
: a constant offset. In fact, the needed conversion is provided by the
: protocol, so it is not hard to take a 1 PPS from GPS and set the PTP
: clock to TAI.

Except for leap seconds, this is true.   However, Unix time isn't UTC
either.  Unix time is UTC that pretends leap seconds just don't
exist.  POSIX enshrined this long ago, and nobody is going to change
that any time soon.

I don't believe IEEEv2 propagates leap seconds, does it?

Warner
Jacob Keller - Aug. 27, 2010, 4:17 p.m.
On Fri, Aug 27, 2010 at 8:06 AM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote

> > > system time bimble track a source makes sense just as with NTP but
> making
> > > it a new clock seems the wrong model extending a non-too-bright API
> when
> > > you can just put the time sources in a file tree.
> >
> > Don't get your meaning here, what did you mean by "file tree?"
>
> Something like
>
>        /sys/class/timesource/<name>/...
>
> at which point we don't have to enumerate them all, add special system
> calls and then fret about the fact you can't access them from things like
> shell scripts.
>

I agree that this interface is the best way to be future proof. This appears
to allow for multiple "clocks" or nodes, and doesn't get messed up when we
add more than one NIC with separate clocks to a machine.
john stultz - Aug. 27, 2010, 8:14 p.m.
On Fri, 2010-08-27 at 13:08 +0200, Richard Cochran wrote:
> On Mon, Aug 23, 2010 at 01:21:39PM -0700, john stultz wrote:
> > On Thu, 2010-08-19 at 17:38 +0200, Richard Cochran wrote:
> > > On Thu, Aug 19, 2010 at 02:28:04PM +0200, Arnd Bergmann wrote:
> > > > My point was that a syscall is better than an ioctl based interface here,
> > > > which I definitely still think. Given that John knows much more about
> > > > clocks than I do, we still need to get agreement on the question that
> > > > he raised, which is whether we actually need to expose this clock to the
> > > > user or not.
> > > > 
> > > > If we can find a way to sync system time accurate enough with PTP and
> > > > PPS, user applications may not need to see two separate clocks at all.
> > > 
> > > At the very least, one user application (the PTPd) needs to see the
> > > PTP clock.
> > > 
> > > > > SYSCALL_DEFINE3(clock_adjtime, const clockid_t, clkid,
> > > > > 		int, ppb, struct timespec __user *, ts)
> > > > > 
> > > > > ppb - desired frequency adjustment in parts per billion
> > > > > ts  - desired time step (or jump) in <sec,nsec> to correct
> > > > >       a measured offset
> > > > > 
> > > > > Arguably, this syscall might be useful for other clocks, too.
> > > > 
> > > > This is a mix of adjtime and adjtimex with the addition of
> > > > the clkid parameter, right?
> > > 
> > > Sort of, but not really. ADJTIME(3) takes an offset and slowly
> > > corrects the clock using a servo in the kernel, over hours.
> > > 
> > > For this function, the offset passed in the 'ts' parameter will be
> > > immediately corrected, by jumping to the new time. This reflects the
> > > way that PTP works. After the first few samples, the PTPd has an
> > > estimate of the offset to the master and the rate difference. The PTPd
> > > can proceed in one of two ways.
> > > 
> > > 1. If resetting the clock is not desired, then the clock is set to the
> > >    maximum adjustment (in the right direction) until the clock time is
> > >    close to the master's time.
> > > 
> > > 2. The estimated offset is added to the current time, resulting in a
> > >    jump in time.
> > > 
> > > We need clock_adjtime(id, 0, ts) for the second case.
> > >
> > > > Have you considered passing a struct timex instead of ppb and ts?
> > > 
> > > Yes, but the timex is not suitable, IMHO.
> > 
> > Could you expand on this?
> 
> We need to able to specify that the call is for a PTP clock. We could
> add that to the modes flag, like this:
> 
> /*timex.h*/
> #define ADJ_PTP_0 0x10000
> #define ADJ_PTP_1 0x20000
> #define ADJ_PTP_2 0x30000
> #define ADJ_PTP_3 0x40000
> 
> I can live with this, if everyone else can, too.

I wasn't suggesting adding the clock multiplexing to the timex, just
using the timex to specify the adjustments in the clock_adjtime call. 

So I was asking why a timex was not suitable instead of using just the
ppb and timespec.


> > Could we not add a adjustment mode ADJ_SETOFFSET or something that would
> > provide the instantaneous offset correction?
> 
> Yes, but we would also need to add a struct timespec to the struct
> timex, in order to get nanosecond resolution. I think it would be
> possible to do in the padding at the end?


The existing struct timeval in the timex can be also used as a timespec.
NTPv4 uses it that way specifying the ADJ_NANO flag.


> > You're right that the timex is a little crufty. But its legacy that we
> > will support indefinitely. So following the established interface helps
> > maintainability.
> 
> We can use it for PTP, with the modifications suggested above. Or we
> can just introduce the clock_adjtime method, instead.

Again, I think you misunderstood my suggestion. I was suggesting
something like clock_adjtime(clock_id, struct timex*).


> > So if the clock_adjtime interface is needed, it would seem best for it
> > to be generic enough to support not only PTP, but also the NTP kernel
> > PLL.
> 
> For the proposed clock_adjime, what else is needed to support clock
> adjustment in general?
> 
> I don't mind making the interface generic enough to support any
> (realistic) conceivable clock adjustment scheme, but beyond the
> present PTP hardware clocks, I don't know what else might be needed.

I think using the timex struct covers most of the existing knowledge of
what is needed. 

thanks
-john
john stultz - Aug. 27, 2010, 8:56 p.m.
On Fri, 2010-08-27 at 14:03 +0200, Arnd Bergmann wrote:
> On Friday 27 August 2010, Richard Cochran wrote:
> > On Mon, Aug 23, 2010 at 01:21:39PM -0700, john stultz wrote:
> > > On Thu, 2010-08-19 at 17:38 +0200, Richard Cochran wrote:
> > > > On Thu, Aug 19, 2010 at 02:28:04PM +0200, Arnd Bergmann wrote:
> > > > > Have you considered passing a struct timex instead of ppb and ts?
> > > > 
> > > > Yes, but the timex is not suitable, IMHO.
> > > 
> > > Could you expand on this?
> > 
> > We need to able to specify that the call is for a PTP clock. We could
> > add that to the modes flag, like this:
> > 
> > /*timex.h*/
> > #define ADJ_PTP_0 0x10000
> > #define ADJ_PTP_1 0x20000
> > #define ADJ_PTP_2 0x30000
> > #define ADJ_PTP_3 0x40000
> >
> > I can live with this, if everyone else can, too.
> 
> My suggestion was actually to have a new syscall with the existing
> structure, and pass a clockid_t value to it, similar to your
> sys_clock_adjtime(), not change the actual sys_adjtime syscall.
>   
> > > Could we not add a adjustment mode ADJ_SETOFFSET or something that would
> > > provide the instantaneous offset correction?
> > 
> > Yes, but we would also need to add a struct timespec to the struct
> > timex, in order to get nanosecond resolution. I think it would be
> > possible to do in the padding at the end?
> 
> Yes, that's exactly what the padding is for. Instead of timespec, you can
> probably have a extra values for replacing the existing ppm values with
> ppb values.

Right, although the ppm/ppb issue shouldn't be a problem as the timex
allows for much finer then ppb resolution changes.

The only adjustment to the adjtimex/timex interface that may be needed
is the ability to set the time by an offset (ie: ADJ_SETOFFSET), rather
then slewing the offset in (ADJ_OFFSET, or ADJ_OFFSET_SINGLESHOT).

This avoids the calc offset, gettime(&now), settime(now+offset) method
where any latency between the gettime and settime adds to the error.

thanks
-john




thanks
-john
john stultz - Aug. 27, 2010, 10:30 p.m.
On Fri, 2010-08-27 at 14:38 +0200, Richard Cochran wrote:
> On Mon, Aug 23, 2010 at 01:08:45PM -0700, john stultz wrote:
> > On Thu, 2010-08-19 at 07:55 +0200, Richard Cochran wrote:
> > > The clockid_t CLOCK_PTP will be arch-neutral.
> > 
> > Sure, but are they conceptually neutral? There are other clock
> > synchronization algorithms out there. Will they need their own
> > similar-but-different clock_ids?
> > 
> > Look at the other clock ids and what the represent:
> 
> IMHO, the presently offered clock ids are a mixed bag...
> 
> > CLOCK_REALTIME : Wall time (possibly freq/offset corrected)
> > CLOCK_MONOTONIC: Monotonic time (possibly freq corrected).
> > CLOCK_PROCESS_CPUTIME_ID: Process cpu time.
> > CLOCK_THREAD_CPUTIME_ID: Thread cpu time.
> 
> The amount of time a thread has been granted by the kernel is really
> not connected to the real passage of time, at least not in a direct
> way.
> 
> > CLOCK_MONOTONIC_RAW: Non freq corrected monotonic time.
> 
> This one comes from commit 2d42244ae71d6c7b0884b5664cf2eda30fb2ae68
> and is surely a special case, unrelated to the other clock ids. The
> commit message mentions that this was added to help the btime.sf.net
> project. That project does not seem to have had any activity since
> 2007. If we can justify adding a clock id in this case, surely we can
> add one for PTP as well!

I'm not opposed to adding a clock id, I just want it to be generic
enough to make sense. 

So while MONOTONIC_RAW it was spurred into being from btime, it isn't a
CLOCK_BTIME interface. 

I've also been pushing the RADClock folks to use it since it avoids the
ugly raw hardware access they want to get access to a constant freq
counter. In addition, I've used it to monitor freq adjustments done by
NTP.


> > CLOCK_REALTIME_COARSE: Tick granular wall time (filesystem timestamp)
> > CLOCK_MONOTONIC_COARSE: Tick granular monotonic time.
> 
> These were added in commit da15cfdae03351c689736f8d142618592e3cebc3
> in order to fulfill needs of special applications.

Right, but what they represent and how its different from CLOCK_REALTIME
is easy to describe in a generic way.


> > CLOCK_PTP that you're proposing doesn't seem to be at the same level of
> > abstraction. I'm not saying that this isn't the right place for it, but
> > can we take a step back from PTP and consider what your exposing in more
> > generic terms. In other words, could someone use the same
> > packet-timestamping hardware to implement a different non-PTP time
> > synchronization algorithm?
> 
> Yes, of course. There is nothing at all in the patch set about the PTP
> protocol itself. It just lets you access the hardware. In short:
> 
> 1. SO_TIMESTAMPING delivers timestamped packets
> 2. the PTP API lets you tune the clock.
> 
> "That's all, folks."

Right. So CLOCK_SO_TIMESTAMP is maybe a better description? And isn't it
just an adjustment interface, not a PTP API to tune the clock?


> > Further, if we're using PTP to synchoronize the system time, then there
> > shouldn't be any measurable difference between CLOCK_PTP and
> > CLOCK_REALTIME, no?
> 
> When using software timestamping, then the clocks are one in the same.

So this is the duplication I don't like. If the API represents CLOCK_PTP
or whatever as something different from CLOCK_REALTIME, there shouldn't
be a mode where they're the same. That would just cause confusion.
Instead CLOCK_PTP calls just return -EINVAL and the PTPd application
should use CLOCK_REALTIME.


> When using PTP, with the PPS hook to synchoronize the Linux system
> time, the clocks will be a close as the servo algorithm provides. I
> have not measured this yet, but it cannot be much different than using
> any other PPS source.
> 
> > > SYSCALL_DEFINE3(clock_adjtime, const clockid_t, clkid,
> > > 		int, ppb, struct timespec __user *, ts)
> > > 
> > > ppb - desired frequency adjustment in parts per billion
> > > ts  - desired time step (or jump) in <sec,nsec> to correct
> > >       a measured offset
> > > 
> > > Arguably, this syscall might be useful for other clocks, too.
> > 
> > So yea, obviously the syscall should not be CLOCK_PTP specific, so we
> > would want it to be usable against CLOCK_REALTIME.
> > 
> > That said, the clock_adjtime your proposing does not seem to be
> > sufficient for usage by NTPd. So this suggests that it is not generic
> > enough.
> 
> I don't think we need to support ntpd. It already has adjtimex, and it
> won't get any better by using another interface.

I strongly disagree. If the new interface isn't generic enough that NTPd
couldn't use it to adjust CLOCK_REALTIME, then its too specific to only
your needs.

There will be other clock sync algorithms down the road, so if we have
to expose this sort of timestamping hardware to userland, we need to
allow those other sync methods to use the API you're providing, so if
you're API isn't a superset of the existing adjustment needs, then its
already missing something.

I'm not saying we have to implement the kernel PLL adjustment and every
other adjustment mode for every CLOCK_ID right now, but we should be
able to extend things in the future.


> > > In contrast, sysfs attributes will fit the need nicely:
> > > 
> > > 1. enable or disable pps
> > > 2. enable or disable external timestamps
> > > 3. read out external timestamp
> > > 4. configure period for periodic output
> > 
> > Things to consider here:
> > Do having these options really make sense? 
> 
> Yes, since they represent the PTP clock's hardware features. As I
> explained previously, if you don't have any hardware interfaces, then
> having your clocks synchoronized to under 100 nanoseconds does not
> help you more than having them to within 1 microsecond.
> 
> > Why would we want pps disabled?
> 
> If you are a master clock, then you want to take your PPS from an
> external time source, like GPS.  If you leave the PTP PPS events on,
> then they will occur close in time to the GPS PPS events and may add
> unwanted latency to the interrupt handler.
> 
> > And if that does make sense, would it
> > be better to do so via the existing pps interface instead of adding a
> > new ptp specific one? 
> 
> We have not introduced new PPS interface. We use existing PPS subsystem.

Doesn't the pps subsystem have its own way to control the pps signal
interrupt? I'm not totally sure here, but given your point above that
having multiple pps events it seems like they should be selectable. It
seems something that we'd want to control via the global pps interface,
rather then having a pps-enable flag on every random bit of hardware
that can support it.


> > Same for the timestamps and periodic output (ie: and how do they differ
> > from reading or setting a timer on CLOCK_PTP?)
> 
> The posix timer calls won't work:
> 
> I have a PTP hardware clocks with multiple external timestamp
> channels. Using timer_gettime, how can I specify (or decode) the
> channel of interest to me?

I guess I'm not following you here. Again, I'm not super familiar with
the hardware involved. Could you clarify a bit more? What do you mean by
external timestamp? Is this what is used on the packet timestamping? 

> > What the kernel needs to provide are ways to address #1 and #2 above,
> > but what the kernel needs to expose to userland should be minimal and
> > generic.
> 
> My point was this:
> 
> The application requires that the soundcard DA clocks (*not* the CPU
> clocks) be synchronized. Currently the Linux kernel offers no way at
> all to do this at all.


So yea.. I think this gets to the main point of contention.

On one side, there is the view that the kernel should abstract and hide
the hardware details to increase app portability. And the other is that
the raw hardware details should be exposed, so the OS stays out of the
way.

Neither of these views are always right. Ideally we can come up with a
abstracted way to provide the hardware data needed that's generic enough
that applications don't have to be changed when moving between hardware
or configurations.

The posix clock id interface is frustrating because the flat static
enumeration is really limiting.

I wonder if a dynamic enumeration for posix clocks would be possibly a
way to go?

In other words, a driver registers a clock with the system, and the
system reserves a clock_id for it from the designated available pool and
assigns it back. Then userland could query the driver via something like
sysfs to get the clockid for the hardware. 

Would that maybe work? 


> > 2) You've mentioned multiple PTP hardware clocks are possible, but maybe
> > not practically useful. How does PTPd enumerate the existing clocks, and
> > know which devices to listen to for Sync/Delay_Resp messages?
> > 
> > The issue I'm trying to address here is the interface inconsistency
> > between the message timestamping interface (ie: likely from a packet,
> > possibly multiple sources) and the proposed CLOCK_PTP interface (with
> > only a single clock being exposed at a time, and that being controlled
> > by a sysfs interface).
> 
> The sysfs will include one class device for each PTP clock. Each clock
> has a sysfs attribute with the corresponding clock id.

Do you mean the clock_id # in the posix clocks namespace?


> For the network timestamps, they need to be enabled using the
> SIOCSHWTSTAMP ioctl. We can easily extend that ioctl to include the
> desired clock id.
> 
> > My concerns: 
> > 1) Again, I'm not totally comfortable exposing the PTP hardware via the
> > posix-clocks/timers interface. I'm not dead set against it, but it just
> > doesn't seem right as a top-level abstraction.
> 
> I would also be happy with the character device idea already
> posted. Just pick one of the two, and I'll resubmit the patch set...

Personally, with regard to exposing the ptp clock to userland I'm more
comfortable via the chardev. However, the posix clocks/timer api is
really close to what you need, so I'm not totally set against it. And
maybe the dynamic enumeration would resolve my main problems with it?

That said, with the chardev method, I don't like the idea of duplicating
the existing time apis via a systemtime device. Dropping that from your
earlier patch would ease the majority of my objections.

> > I'm curious if its possible to do the PTP hardware offset/adjustment
> > calculation in a module internally to the kernel? That would allow the
> > PPS interface to still be used to sync the system time, and not expose
> > additional interfaces.
> 
> It would be possible, but not too nice, IMHO. In contrast to NTP,
> there is no real need to place the servo in the kernel. Having the
> protocol code and servo in user space makes life much easier.

For you, yes, but from an API maintenance view its just adding to the
list of interfaces we have to preserve forever. :)


> > 2) If this is a top-level interface, I'd prefer the inconsistency
> > between how the timestamped messages are received and the proposed
> > posix_clocks/timer interface be clarified. 
> > 
> > For example: does the networking stack need to have the source clock_id
> > to use for SO_TIMESTAMPing be specified?
> 
> We could do it that way. I never liked the <SW,SYS,RAW> tuple in the
> SO_TIMESTAMPING control message in the first place. Sending three
> fields with two blank seems wasteful. Instead, <clockid,timestamp>
> would be sufficient. Maybe that it is too late to change this.
> 
> Even without altering the timestamp, we can simply augment
> SIOCSHWTSTAMP with a clock id.
> 
> At this point I would just like to go forward with one of the two
> proposed APIs. I had modelled the character device on the posix clock
> calls in order to make it immediately familar, and I think it is a
> viable approach. After the lkml discussion, I think it is even cleaner
> and nicer to just offer a new clock id.
> 
> Thanks for all the feedback and comments,

Thanks for the clarifications and putting up with my questions!
-john
Christian Riesch - Aug. 29, 2010, 1:32 p.m.
Alan Cox wrote:
>> The master node in a PTP network probably takes its time from a
>> precise external time source, like GPS. The GPS provides a 1 PPS
>> directly to the PTP clock hardware, which latches the PTP hardware
>> clock time on the PPS edge. This provides one sample as input to a
>> clock servo (in the PTPd) that, in turn, regulates the PTP clock
>> hardware.
> 
> A PTP clock is TAI, Unix time is UTC.

Not necessarily. AFAIK, the time distributed by IEEE1588v2 can either be 
based on the "PTP epoch" (timePropertiesDS.ptpTimescale=TRUE) and thus 
represent TAI or be based on an implementation specific arbitrary epoch 
(timePropertiesDS.ptpTimescale=FALSE) and represent time on some 
arbitrary time scale.

Christian
Richard Cochran - Sept. 6, 2010, 6:33 a.m.
On Fri, Aug 27, 2010 at 03:30:39PM -0700, John Stultz wrote:
> On Fri, 2010-08-27 at 14:38 +0200, Richard Cochran wrote:
> > We have not introduced new PPS interface. We use existing PPS subsystem.
> 
> Doesn't the pps subsystem have its own way to control the pps signal
> interrupt? I'm not totally sure here, but given your point above that
> having multiple pps events it seems like they should be selectable. It
> seems something that we'd want to control via the global pps interface,
> rather then having a pps-enable flag on every random bit of hardware
> that can support it.

The PPS subsystem offers no way to disable PPS interrupts. 

> > > Same for the timestamps and periodic output (ie: and how do they differ
> > > from reading or setting a timer on CLOCK_PTP?)
> > 
> > The posix timer calls won't work:
> > 
> > I have a PTP hardware clocks with multiple external timestamp
> > channels. Using timer_gettime, how can I specify (or decode) the
> > channel of interest to me?
> 
> I guess I'm not following you here. Again, I'm not super familiar with
> the hardware involved. Could you clarify a bit more? What do you mean by
> external timestamp? Is this what is used on the packet timestamping? 

No, the packet timestamp occurs in the PHY, MAC, or on the MII bus and
is an essential feature to support the PTP.

An external timestamp is just a wire going into the clock and is an
optional feature to make the clock more useful. The clock can latch
the current time value when an edge is dectected on the wire. Using
external timestamps, you correlate real world events with the absolute
time in the clock.

Typically, a clock offers two or more such input channels (wires), but
timer_gettime does not offer a way to differentiate between them, and
thus is not suitable.

> The posix clock id interface is frustrating because the flat static
> enumeration is really limiting.
> 
> I wonder if a dynamic enumeration for posix clocks would be possibly a
> way to go?

I am perfectly happy with this.

> In other words, a driver registers a clock with the system, and the
> system reserves a clock_id for it from the designated available pool and
> assigns it back. Then userland could query the driver via something like
> sysfs to get the clockid for the hardware. 
> 
> Would that maybe work? 

I have now posted a sample implementation of this idea. Do you like it?

> > The sysfs will include one class device for each PTP clock. Each clock
> > has a sysfs attribute with the corresponding clock id.
> 
> Do you mean the clock_id # in the posix clocks namespace?

Yes.

> > I would also be happy with the character device idea already
> > posted. Just pick one of the two, and I'll resubmit the patch set...
> 
> Personally, with regard to exposing the ptp clock to userland I'm more
> comfortable via the chardev. However, the posix clocks/timer api is
> really close to what you need, so I'm not totally set against it. And
> maybe the dynamic enumeration would resolve my main problems with it?

Okay, I have posted a draft of the dynamic idea. Can you support it?

> That said, with the chardev method, I don't like the idea of duplicating
> the existing time apis via a systemtime device. Dropping that from your
> earlier patch would ease the majority of my objections.

Well, the clock interface needs to offer basic services:

1. Set time
2. Get time
3. Jump offset
4. Adjust frequency

This is similar to what the posix clock and ntp API offer. Using a
chardev, should I make the ioctls really different, just for the
purpose of being different?

To me, it makes more sense to offer a familiar interface.

I was perfectly happy with the chardev idea. In fact, that is the way
I first implemented it. Now, I have also gone ahead and implemented
the dynamic posix clock idea, too.

> > At this point I would just like to go forward with one of the two
> > proposed APIs. I had modelled the character device on the posix clock
> > calls in order to make it immediately familar, and I think it is a
> > viable approach. After the lkml discussion, I think it is even cleaner
> > and nicer to just offer a new clock id.

I would like to repeat the sentiment in this last paragraph! I already
implemented and would be content with either form for the new clock
control API:

1. Character device
2. POSIX clock with dynamic ids

Please, just take your pick ;^)

Thanks,
Richard
Stephan Gatzka - Sept. 21, 2010, 4:54 p.m.
Hi Richard,

> 1. Character device
> 2. POSIX clock with dynamic ids
>
> Please, just take your pick ;^)

Just to reopen this nearly dead but very interesting thread: I'm happy 
with every solution that has no impact on the accuracy of the PTP clock.

If I can choose, I would prefer the POSIX clock with dynamic ids solution.

Kind regards and thanks for your effort,

Stephan Gatzka
Kyle Moffett - Sept. 21, 2010, 8:47 p.m.
On Fri, Aug 27, 2010 at 18:30, John Stultz <johnstul@us.ibm.com> wrote:
> On one side, there is the view that the kernel should abstract and hide
> the hardware details to increase app portability. And the other is that
> the raw hardware details should be exposed, so the OS stays out of the
> way.
>
> Neither of these views are always right. Ideally we can come up with a
> abstracted way to provide the hardware data needed that's generic enough
> that applications don't have to be changed when moving between hardware
> or configurations.
>
> The posix clock id interface is frustrating because the flat static
> enumeration is really limiting.
>
> I wonder if a dynamic enumeration for posix clocks would be possibly a
> way to go?

Well how about something much more straightforward:

#define CLOCK_FD 0x80000000
fd = open("/dev/clocks/some_clock_name", O_RDONLY);
clock_gettime(CLOCK_FD | fd, &ts);

This would provide all of the standard character-device semantics that
everyone is accustomed to, and furthermore you could allow non-root
users to manage specific clocks using just DAC permission bits.

You'd still probably need to add a clock_adjtimex() syscall, but it
could easily fail with EINVAL for software or per-thread clocks, but
that seems like it would nicely abstract the hardware without forcing
a numeric address space.

Cheers,
Kyle Moffett
Richard Cochran - Sept. 22, 2010, 10:14 a.m.
On Tue, Sep 21, 2010 at 04:47:45PM -0400, Kyle Moffett wrote:
> Well how about something much more straightforward:

I am about to post another patch set for discussion, so please comment
on it when it appears.

Thanks,
Richard

Patch

diff --git a/Documentation/ptp/ptp.txt b/Documentation/ptp/ptp.txt
new file mode 100644
index 0000000..46858b3
--- /dev/null
+++ b/Documentation/ptp/ptp.txt
@@ -0,0 +1,95 @@ 
+
+* PTP infrastructure for Linux
+
+  This patch set introduces support for IEEE 1588 PTP clocks in
+  Linux. Together with the SO_TIMESTAMPING socket options, this
+  presents a standardized method for developing PTP user space
+  programs, synchronizing Linux with external clocks, and using the
+  ancillary features of PTP hardware clocks.
+
+  A new class driver exports a kernel interface for specific clock
+  drivers and a user space interface. The infrastructure supports a
+  complete set of PTP functionality.
+
+  + Basic clock operations
+    - Set time
+    - Get time
+    - Shift the clock by a given offset atomically
+    - Adjust clock frequency
+
+  + Ancillary clock features
+    - One short or periodic alarms, with signal delivery to user program
+    - Time stamp external events
+    - Period output signals configurable from user space
+    - Synchronization of the Linux system time via the PPS subsystem
+
+** PTP kernel API
+
+   A PTP clock driver registers itself with the class driver. The
+   class driver handles all of the dealings with user space. The
+   author of a clock driver need only implement the details of
+   programming the clock hardware. The clock driver notifies the class
+   driver of asynchronous events (alarms and external time stamps) via
+   a simple message passing interface.
+
+   The class driver supports multiple PTP clock drivers. In normal use
+   cases, only one PTP clock is needed. However, for testing and
+   development, it can be useful to have more than one clock in a
+   single system, in order to allow performance comparisons.
+
+** PTP user space API
+
+   The class driver creates a character device for each registered PTP
+   clock. User space programs may control the clock using standardized
+   ioctls. A program may query, enable, configure, and disable the
+   ancillary clock features. User space can receive time stamped
+   events via blocking read() and poll(). One shot and periodic
+   signals may be configured via an ioctl API with semantics similar
+   to the POSIX timer_settime() system call.
+
+   As an real life example, the following two patches for ptpd version
+   1.0.0 demonstrate how the API works.
+
+   https://sourceforge.net/tracker/?func=detail&aid=2992845&group_id=139814&atid=744634
+
+   https://sourceforge.net/tracker/?func=detail&aid=2992847&group_id=139814&atid=744634
+
+** Writing clock drivers
+
+   Clock drivers include include/linux/ptp_clock_kernel.h and register
+   themselves by presenting a 'struct ptp_clock_info' to the
+   registration method. Clock drivers must implement all of the
+   functions in the interface. If a clock does not offer a particular
+   ancillary feature, then the driver should just return -EOPNOTSUPP
+   from those functions.
+
+   Drivers must ensure that all of the methods in interface are
+   reentrant. Since most hardware implementations treat the time value
+   as a 64 bit integer accessed as two 32 bit registers, drivers
+   should use spin_lock_irqsave/spin_unlock_irqrestore to protect
+   against concurrent access. This locking cannot be accomplished in
+   class driver, since the lock may also be needed by the clock
+   driver's interrupt service routine.
+
+** Supported hardware
+
+   + Standard Linux system timer
+     - No special PTP features
+     - For use with software time stamping
+
+   + Freescale eTSEC gianfar
+     - 2 Time stamp external triggers, programmable polarity (opt. interrupt)
+     - 2 Alarm registers (optional interrupt)
+     - 3 Periodic signals (optional interrupt)
+
+   + National DP83640
+     - 6 GPIOs programmable as inputs or outputs
+     - 6 GPIOs with dedicated functions (LED/JTAG/clock) can also be
+       used as general inputs or outputs
+     - GPIO inputs can time stamp external triggers
+     - GPIO outputs can produce periodic signals
+     - 1 interrupt pin
+
+   + Intel IXP465
+     - Auxiliary Slave/Master Mode Snapshot (optional interrupt)
+     - Target Time (optional interrupt)
diff --git a/Documentation/ptp/testptp.c b/Documentation/ptp/testptp.c
new file mode 100644
index 0000000..41a9839
--- /dev/null
+++ b/Documentation/ptp/testptp.c
@@ -0,0 +1,306 @@ 
+/*
+ * PTP 1588 clock support - User space test program
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+#include <errno.h>
+#include <fcntl.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <linux/ptp_clock.h>
+
+static void handle_alarm(int s)
+{
+	printf("received signal %d\n", s);
+}
+
+static int install_handler(int signum, void (*handler)(int))
+{
+	struct sigaction action;
+	sigset_t mask;
+
+	/* Unblock the signal. */
+	sigemptyset(&mask);
+	sigaddset(&mask, signum);
+	sigprocmask(SIG_UNBLOCK, &mask, NULL);
+
+	/* Install the signal handler. */
+	action.sa_handler = handler;
+	action.sa_flags = 0;
+	sigemptyset(&action.sa_mask);
+	sigaction(signum, &action, NULL);
+
+	return 0;
+}
+
+static void usage(char *progname)
+{
+	fprintf(stderr,
+		"usage: %s [options]\n"
+		" -a val     request a one-shot alarm after 'val' seconds\n"
+		" -A val     request a periodic alarm every 'val' seconds\n"
+		" -c         query the ptp clock's capabilities\n"
+		" -d name    device to open\n"
+		" -e val     read 'val' external time stamp events\n"
+		" -f val     adjust the ptp clock frequency by 'val' PPB\n"
+		" -g         get the ptp clock time\n"
+		" -h         prints this message\n"
+		" -p val     enable output with a period of 'val' nanoseconds\n"
+		" -P val     enable or disable (val=1|0) the system clock PPS\n"
+		" -s         set the ptp clock time from the system time\n"
+		" -t val     shift the ptp clock time by 'val' seconds\n"
+		" -v         query the ptp clock api version\n",
+		progname);
+}
+
+int main(int argc, char *argv[])
+{
+	struct ptp_clock_caps caps;
+	struct ptp_clock_timer timer;
+	struct ptp_extts_event event;
+	struct ptp_clock_request request;
+	struct timespec ts;
+	char *progname;
+	int c, cnt, fd, val = 0;
+
+	char *device = "/dev/ptp_clock_0";
+	int adjfreq = 0x7fffffff;
+	int adjtime = 0;
+	int capabilities = 0;
+	int extts = 0;
+	int gettime = 0;
+	int oneshot = 0;
+	int periodic = 0;
+	int perout = -1;
+	int pps = -1;
+	int settime = 0;
+	int version = 0;
+
+	progname = strrchr(argv[0], '/');
+	progname = progname ? 1+progname : argv[0];
+	while (EOF != (c = getopt(argc, argv, "a:A:cd:e:f:ghp:P:st:v"))) {
+		switch (c) {
+		case 'a':
+			oneshot = atoi(optarg);
+			break;
+		case 'A':
+			periodic = atoi(optarg);
+			break;
+		case 'c':
+			capabilities = 1;
+			break;
+		case 'd':
+			device = optarg;
+			break;
+		case 'e':
+			extts = atoi(optarg);
+			break;
+		case 'f':
+			adjfreq = atoi(optarg);
+			break;
+		case 'g':
+			gettime = 1;
+			break;
+		case 'p':
+			perout = atoi(optarg);
+			break;
+		case 'P':
+			pps = atoi(optarg);
+			break;
+		case 's':
+			settime = 1;
+			break;
+		case 't':
+			adjtime = atoi(optarg);
+			break;
+		case 'v':
+			version = 1;
+			break;
+		case 'h':
+			usage(progname);
+			return 0;
+		case '?':
+		default:
+			usage(progname);
+			return -1;
+		}
+	}
+
+	fd = open(device, O_RDWR);
+	if (fd < 0) {
+		fprintf(stderr, "cannot open %s: %s", device, strerror(errno));
+		return -1;
+	}
+
+	if (version) {
+		if (ioctl(fd, PTP_CLOCK_APIVERS, &val)) {
+			perror("PTP_CLOCK_APIVERS");
+		} else {
+			printf("version = 0x%08x\n", val);
+		}
+	}
+
+	if (capabilities) {
+		if (ioctl(fd, PTP_CLOCK_GETCAPS, &caps)) {
+			perror("PTP_CLOCK_GETCAPS");
+		} else {
+			printf("capabilities:\n"
+			       "  %d maximum frequency adjustment (PPB)\n"
+			       "  %d programmable alarms\n"
+			       "  %d external time stamp channels\n"
+			       "  %d programmable periodic signals\n"
+			       "  %d pulse per second\n",
+			       caps.max_adj,
+			       caps.n_alarm,
+			       caps.n_ext_ts,
+			       caps.n_per_out,
+			       caps.pps);
+		}
+	}
+
+	if (0x7fffffff != adjfreq) {
+		if (ioctl(fd, PTP_CLOCK_ADJFREQ, adjfreq)) {
+			perror("PTP_CLOCK_ADJFREQ");
+		} else {
+			puts("frequency adjustment okay");
+		}
+	}
+
+	if (adjtime) {
+		ts.tv_sec = adjtime;
+		ts.tv_nsec = 0;
+		if (ioctl(fd, PTP_CLOCK_ADJTIME, &ts)) {
+			perror("PTP_CLOCK_ADJTIME");
+		} else {
+			puts("time shift okay");
+		}
+	}
+
+	if (gettime) {
+		if (ioctl(fd, PTP_CLOCK_GETTIME, &ts)) {
+			perror("PTP_CLOCK_GETTIME");
+		} else {
+			printf("clock time: %ld.%09ld or %s",
+			       ts.tv_sec, ts.tv_nsec, ctime(&ts.tv_sec));
+		}
+	}
+
+	if (settime) {
+		clock_gettime(CLOCK_REALTIME, &ts);
+		if (ioctl(fd, PTP_CLOCK_SETTIME, &ts)) {
+			perror("PTP_CLOCK_SETTIME");
+		} else {
+			puts("set time okay");
+		}
+	}
+
+	if (extts) {
+		memset(&request, 0, sizeof(request));
+		request.type = PTP_REQUEST_EXTTS;
+		request.index = 0;
+		request.flags = PTP_ENABLE_FEATURE;
+		if (ioctl(fd, PTP_FEATURE_REQUEST, &request)) {
+			perror("PTP_FEATURE_REQUEST");
+			extts = 0;
+		} else {
+			puts("external time stamp request okay");
+		}
+		for (; extts; extts--) {
+			cnt = read(fd, &event, sizeof(event));
+			if (cnt != sizeof(event)) {
+				perror("read");
+				break;
+			}
+			printf("event index %d at %ld.%09ld\n", event.index,
+			       event.ts.tv_sec, event.ts.tv_nsec);
+			fflush(stdout);
+		}
+		/* Disable the feature again. */
+		request.flags = 0;
+		if (ioctl(fd, PTP_FEATURE_REQUEST, &request)) {
+			perror("PTP_FEATURE_REQUEST");
+		}
+	}
+
+	if (oneshot) {
+		install_handler(SIGALRM, handle_alarm);
+		memset(&timer, 0, sizeof(timer));
+		timer.signum = SIGALRM;
+		timer.tsp.it_value.tv_sec = oneshot;
+		if (ioctl(fd, PTP_CLOCK_SETTIMER, &timer)) {
+			perror("PTP_CLOCK_SETTIMER");
+		} else {
+			puts("set timer okay");
+		}
+		pause();
+	}
+
+	if (periodic) {
+		install_handler(SIGALRM, handle_alarm);
+		memset(&timer, 0, sizeof(timer));
+		timer.signum = SIGALRM;
+		timer.tsp.it_value.tv_sec = periodic;
+		timer.tsp.it_interval.tv_sec = periodic;
+		if (ioctl(fd, PTP_CLOCK_SETTIMER, &timer)) {
+			perror("PTP_CLOCK_SETTIMER");
+		} else {
+			puts("set timer okay");
+		}
+		while (1) {
+			pause();
+		}
+	}
+
+	if (perout >= 0) {
+		memset(&request, 0, sizeof(request));
+		request.type = PTP_REQUEST_PEROUT;
+		request.index = 0;
+		request.ts.tv_sec = 0;
+		request.ts.tv_nsec = perout;
+		request.flags = perout ? PTP_ENABLE_FEATURE : 0;
+		if (ioctl(fd, PTP_FEATURE_REQUEST, &request)) {
+			perror("PTP_FEATURE_REQUEST");
+			extts = 0;
+		} else {
+			puts("periodic output request okay");
+		}
+	}
+
+	if (pps != -1) {
+		memset(&request, 0, sizeof(request));
+		request.type = PTP_REQUEST_PPS;
+		request.flags = perout ? PTP_ENABLE_FEATURE : 0;
+		if (ioctl(fd, PTP_FEATURE_REQUEST, &request)) {
+			perror("PTP_FEATURE_REQUEST");
+		} else {
+			puts("pps for system time request okay");
+		}
+	}
+
+	close(fd);
+	return 0;
+}
diff --git a/Documentation/ptp/testptp.mk b/Documentation/ptp/testptp.mk
new file mode 100644
index 0000000..4ef2d97
--- /dev/null
+++ b/Documentation/ptp/testptp.mk
@@ -0,0 +1,33 @@ 
+# PTP 1588 clock support - User space test program
+#
+# Copyright (C) 2010 OMICRON electronics GmbH
+#
+#  This program is free software; you can redistribute it and/or modify
+#  it under the terms of the GNU General Public License as published by
+#  the Free Software Foundation; either version 2 of the License, or
+#  (at your option) any later version.
+#
+#  This program is distributed in the hope that it will be useful,
+#  but WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#  GNU General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with this program; if not, write to the Free Software
+#  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+CC        = $(CROSS_COMPILE)gcc
+INC       = -I$(KBUILD_OUTPUT)/usr/include
+CFLAGS    = -Wall $(INC)
+LDLIBS    = -lrt
+PROGS     = testptp
+
+all: $(PROGS)
+
+testptp: testptp.o
+
+clean:
+	rm -f testptp.o
+
+distclean: clean
+	rm -f $(PROGS)
diff --git a/drivers/Kconfig b/drivers/Kconfig
index a2b902f..774fbd7 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -52,6 +52,8 @@  source "drivers/spi/Kconfig"
 
 source "drivers/pps/Kconfig"
 
+source "drivers/ptp/Kconfig"
+
 source "drivers/gpio/Kconfig"
 
 source "drivers/w1/Kconfig"
diff --git a/drivers/Makefile b/drivers/Makefile
index 91874e0..6d12b48 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -76,6 +76,7 @@  obj-$(CONFIG_I2O)		+= message/
 obj-$(CONFIG_RTC_LIB)		+= rtc/
 obj-y				+= i2c/ media/
 obj-$(CONFIG_PPS)		+= pps/
+obj-$(CONFIG_PTP_1588_CLOCK)	+= ptp/
 obj-$(CONFIG_W1)		+= w1/
 obj-$(CONFIG_POWER_SUPPLY)	+= power/
 obj-$(CONFIG_HWMON)		+= hwmon/
diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
new file mode 100644
index 0000000..cd7becb
--- /dev/null
+++ b/drivers/ptp/Kconfig
@@ -0,0 +1,27 @@ 
+#
+# PTP clock support configuration
+#
+
+menu "PTP clock support"
+
+config PTP_1588_CLOCK
+	tristate "PTP clock support"
+	depends on EXPERIMENTAL
+	depends on PPS
+	help
+	  The IEEE 1588 standard defines a method to precisely
+	  synchronize distributed clocks over Ethernet networks. The
+	  standard defines a Precision Time Protocol (PTP), which can
+	  be used to achieve synchronization within a few dozen
+	  microseconds. In addition, with the help of special hardware
+	  time stamping units, it can be possible to achieve
+	  synchronization to within a few hundred nanoseconds.
+
+	  This driver adds support for PTP clocks as character
+	  devices. If you want to use a PTP clock, then you should
+	  also enable at least one clock driver as well.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called ptp_clock.
+
+endmenu
diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
new file mode 100644
index 0000000..b86695c
--- /dev/null
+++ b/drivers/ptp/Makefile
@@ -0,0 +1,5 @@ 
+#
+# Makefile for PTP 1588 clock support.
+#
+
+obj-$(CONFIG_PTP_1588_CLOCK)		+= ptp_clock.o
diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
new file mode 100644
index 0000000..1ebb4b0
--- /dev/null
+++ b/drivers/ptp/ptp_clock.c
@@ -0,0 +1,514 @@ 
+/*
+ * PTP 1588 clock support
+ *
+ * Partially adapted from the Linux PPS driver.
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+#include <linux/bitops.h>
+#include <linux/cdev.h>
+#include <linux/device.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/poll.h>
+#include <linux/pps_kernel.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include <linux/ptp_clock_kernel.h>
+#include <linux/ptp_clock.h>
+
+#define PTP_MAX_ALARMS 4
+#define PTP_MAX_CLOCKS BITS_PER_LONG
+#define PTP_MAX_TIMESTAMPS 128
+#define PTP_PPS_DEFAULTS (PPS_CAPTUREASSERT | PPS_OFFSETASSERT)
+#define PTP_PPS_EVENT PPS_CAPTUREASSERT
+#define PTP_PPS_MODE (PTP_PPS_DEFAULTS | PPS_CANWAIT | PPS_TSFMT_TSPEC)
+
+struct alarm {
+	struct pid *pid;
+	int sig;
+};
+
+struct timestamp_event_queue {
+	struct ptp_extts_event buf[PTP_MAX_TIMESTAMPS];
+	int head;
+	int tail;
+	int overflow;
+};
+
+struct ptp_clock {
+	struct list_head list;
+	struct cdev cdev;
+	struct device *dev;
+	struct ptp_clock_info *info;
+	dev_t devid;
+	int index; /* index into clocks.map, also the minor number */
+	int pps_source;
+
+	struct alarm alarm[PTP_MAX_ALARMS];
+	struct mutex alarm_mux; /* one process at a time setting an alarm */
+
+	struct timestamp_event_queue tsevq; /* simple fifo for time stamps */
+	struct mutex tsevq_mux; /* one process at a time reading the fifo */
+	wait_queue_head_t tsev_wq;
+};
+
+/* private globals */
+
+static const struct file_operations ptp_fops;
+static dev_t ptp_devt;
+static struct class *ptp_class;
+
+static struct {
+	struct list_head list;
+	DECLARE_BITMAP(map, PTP_MAX_CLOCKS);
+} clocks;
+static DEFINE_MUTEX(clocks_mux); /* protects 'clocks' */
+
+/* time stamp event queue operations */
+
+static inline int queue_cnt(struct timestamp_event_queue *q)
+{
+	int cnt = q->tail - q->head;
+	return cnt < 0 ? PTP_MAX_TIMESTAMPS + cnt : cnt;
+}
+
+static inline int queue_free(struct timestamp_event_queue *q)
+{
+	return PTP_MAX_TIMESTAMPS - queue_cnt(q) - 1;
+}
+
+static void enqueue_external_timestamp(struct timestamp_event_queue *queue,
+				       struct ptp_clock_event *src)
+{
+	struct ptp_extts_event *dst;
+	u32 remainder;
+
+	dst = &queue->buf[queue->tail];
+
+	dst->index = src->index;
+	dst->ts.tv_sec = div_u64_rem(src->timestamp, 1000000000, &remainder);
+	dst->ts.tv_nsec = remainder;
+
+	if (!queue_free(queue))
+		queue->overflow++;
+
+	queue->tail = (queue->tail + 1) % PTP_MAX_TIMESTAMPS;
+}
+
+/* public interface */
+
+struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info)
+{
+	struct ptp_clock *ptp;
+	int err = 0, index, major = MAJOR(ptp_devt);
+
+	if (info->n_alarm > PTP_MAX_ALARMS)
+		return ERR_PTR(-EINVAL);
+
+	/* Find a free clock slot and reserve it. */
+	err = -EBUSY;
+	mutex_lock(&clocks_mux);
+	index = find_first_zero_bit(clocks.map, PTP_MAX_CLOCKS);
+	if (index < PTP_MAX_CLOCKS)
+		set_bit(index, clocks.map);
+	else
+		goto no_clock;
+
+	/* Initialize a clock structure. */
+	err = -ENOMEM;
+	ptp = kzalloc(sizeof(struct ptp_clock), GFP_KERNEL);
+	if (ptp == NULL)
+		goto no_memory;
+
+	ptp->info = info;
+	ptp->devid = MKDEV(major, index);
+	ptp->index = index;
+	mutex_init(&ptp->alarm_mux);
+	mutex_init(&ptp->tsevq_mux);
+	init_waitqueue_head(&ptp->tsev_wq);
+
+	/* Create a new device in our class. */
+	ptp->dev = device_create(ptp_class, NULL, ptp->devid, ptp,
+				 "ptp_clock_%d", ptp->index);
+	if (IS_ERR(ptp->dev))
+		goto no_device;
+
+	dev_set_drvdata(ptp->dev, ptp);
+
+	/* Register a character device. */
+	cdev_init(&ptp->cdev, &ptp_fops);
+	ptp->cdev.owner = info->owner;
+	err = cdev_add(&ptp->cdev, ptp->devid, 1);
+	if (err)
+		goto no_cdev;
+
+	/* Register a new PPS source. */
+	if (info->pps) {
+		struct pps_source_info pps;
+		memset(&pps, 0, sizeof(pps));
+		snprintf(pps.name, PPS_MAX_NAME_LEN, "ptp%d", index);
+		pps.mode = PTP_PPS_MODE;
+		pps.owner = info->owner;
+		pps.dev = ptp->dev;
+		err = pps_register_source(&pps, PTP_PPS_DEFAULTS);
+		if (err < 0) {
+			pr_err("failed to register pps source\n");
+			goto no_pps;
+		} else
+			ptp->pps_source = err;
+	}
+
+	/* Clock is ready, add it into the list. */
+	list_add(&ptp->list, &clocks.list);
+
+	mutex_unlock(&clocks_mux);
+	return ptp;
+
+no_pps:
+no_cdev:
+	device_destroy(ptp_class, ptp->devid);
+no_device:
+	mutex_destroy(&ptp->alarm_mux);
+	mutex_destroy(&ptp->tsevq_mux);
+	kfree(ptp);
+no_memory:
+	clear_bit(index, clocks.map);
+no_clock:
+	mutex_unlock(&clocks_mux);
+	return ERR_PTR(err);
+}
+EXPORT_SYMBOL(ptp_clock_register);
+
+int ptp_clock_unregister(struct ptp_clock *ptp)
+{
+	/* Release the clock's resources. */
+	if (ptp->info->pps)
+		pps_unregister_source(ptp->pps_source);
+	cdev_del(&ptp->cdev);
+	device_destroy(ptp_class, ptp->devid);
+	mutex_destroy(&ptp->alarm_mux);
+	mutex_destroy(&ptp->tsevq_mux);
+
+	/* Remove the clock from the list. */
+	mutex_lock(&clocks_mux);
+	list_del(&ptp->list);
+	clear_bit(ptp->index, clocks.map);
+	mutex_unlock(&clocks_mux);
+
+	kfree(ptp);
+
+	return 0;
+}
+EXPORT_SYMBOL(ptp_clock_unregister);
+
+void ptp_clock_event(struct ptp_clock *ptp, struct ptp_clock_event *event)
+{
+	struct timespec ts;
+	struct pps_ktime pps_ts;
+
+	switch (event->type) {
+
+	case PTP_CLOCK_ALARM:
+		kill_pid(ptp->alarm[event->index].pid,
+			 ptp->alarm[event->index].sig, 1);
+		break;
+
+	case PTP_CLOCK_EXTTS:
+		enqueue_external_timestamp(&ptp->tsevq, event);
+		wake_up_interruptible(&ptp->tsev_wq);
+		break;
+
+	case PTP_CLOCK_PPS:
+		getnstimeofday(&ts);
+		pps_ts.sec = ts.tv_sec;
+		pps_ts.nsec = ts.tv_nsec;
+		pps_event(ptp->pps_source, &pps_ts, PTP_PPS_EVENT, NULL);
+		break;
+	}
+}
+EXPORT_SYMBOL(ptp_clock_event);
+
+/* character device operations */
+
+static int ptp_ioctl(struct inode *node, struct file *fp,
+		      unsigned int cmd, unsigned long arg)
+{
+	struct ptp_clock_caps caps;
+	struct ptp_clock_request req;
+	struct ptp_clock_timer timer;
+	struct ptp_clock *ptp = fp->private_data;
+	struct ptp_clock_info *ops = ptp->info;
+	void *priv = ops->priv;
+	struct timespec ts;
+	int flags, index;
+	int err = 0;
+	s32 ppb;
+
+	switch (cmd) {
+
+	case PTP_CLOCK_APIVERS:
+		err = put_user(PTP_CLOCK_VERSION, (u32 __user *)arg);
+		break;
+
+	case PTP_CLOCK_ADJFREQ:
+		if (!capable(CAP_SYS_TIME))
+			return -EPERM;
+		ppb = arg;
+		if (ppb > ops->max_adj || ppb < -ops->max_adj)
+			return -EINVAL;
+		err = ops->adjfreq(priv, ppb);
+		break;
+
+	case PTP_CLOCK_ADJTIME:
+		if (!capable(CAP_SYS_TIME))
+			return -EPERM;
+		if (copy_from_user(&ts, (void __user *)arg, sizeof(ts)))
+			err = -EFAULT;
+		else
+			err = ops->adjtime(priv, &ts);
+		break;
+
+	case PTP_CLOCK_GETTIME:
+		err = ops->gettime(priv, &ts);
+		if (err)
+			break;
+		err = copy_to_user((void __user *)arg, &ts, sizeof(ts));
+		break;
+
+	case PTP_CLOCK_SETTIME:
+		if (!capable(CAP_SYS_TIME))
+			return -EPERM;
+		if (copy_from_user(&ts, (void __user *)arg, sizeof(ts)))
+			err = -EFAULT;
+		else
+			err = ops->settime(priv, &ts);
+		break;
+
+	case PTP_CLOCK_GETCAPS:
+		memset(&caps, 0, sizeof(caps));
+		caps.max_adj = ptp->info->max_adj;
+		caps.n_alarm = ptp->info->n_alarm;
+		caps.n_ext_ts = ptp->info->n_ext_ts;
+		caps.n_per_out = ptp->info->n_per_out;
+		caps.pps = ptp->info->pps;
+		err = copy_to_user((void __user *)arg, &caps, sizeof(caps));
+		break;
+
+	case PTP_CLOCK_GETTIMER:
+		if (copy_from_user(&timer, (void __user *)arg, sizeof(timer))) {
+			err = -EFAULT;
+			break;
+		}
+		index = timer.alarm_index;
+		if (index < 0 || index >= ptp->info->n_alarm) {
+			err = -EINVAL;
+			break;
+		}
+		err = ops->gettimer(priv, index, &timer.tsp);
+		if (err)
+			break;
+		err = copy_to_user((void __user *)arg, &timer, sizeof(timer));
+		break;
+
+	case PTP_CLOCK_SETTIMER:
+		if (copy_from_user(&timer, (void __user *)arg, sizeof(timer))) {
+			err = -EFAULT;
+			break;
+		}
+		index = timer.alarm_index;
+		if (index < 0 || index >= ptp->info->n_alarm) {
+			err = -EINVAL;
+			break;
+		}
+		if (!valid_signal(timer.signum))
+			return -EINVAL;
+		flags = timer.flags;
+		if (flags & (flags != TIMER_ABSTIME)) {
+			err = -EINVAL;
+			break;
+		}
+		if (mutex_lock_interruptible(&ptp->alarm_mux))
+			return -ERESTARTSYS;
+
+		if (ptp->alarm[index].pid)
+			put_pid(ptp->alarm[index].pid);
+
+		ptp->alarm[index].pid = get_pid(task_pid(current));
+		ptp->alarm[index].sig = timer.signum;
+		err = ops->settimer(priv, index, flags, &timer.tsp);
+
+		mutex_unlock(&ptp->alarm_mux);
+		break;
+
+	case PTP_FEATURE_REQUEST:
+		if (copy_from_user(&req, (void __user *)arg, sizeof(req))) {
+			err = -EFAULT;
+			break;
+		}
+		switch (req.type) {
+		case PTP_REQUEST_EXTTS:
+		case PTP_REQUEST_PEROUT:
+			break;
+		case PTP_REQUEST_PPS:
+			if (!capable(CAP_SYS_TIME))
+				return -EPERM;
+			break;
+		default:
+			err = -EINVAL;
+			break;
+		}
+		if (err)
+			break;
+		err = ops->enable(priv, &req,
+				  req.flags & PTP_ENABLE_FEATURE ? 1 : 0);
+		break;
+
+	default:
+		err = -ENOTTY;
+		break;
+	}
+	return err;
+}
+
+static int ptp_open(struct inode *inode, struct file *fp)
+{
+	struct ptp_clock *ptp;
+	ptp = container_of(inode->i_cdev, struct ptp_clock, cdev);
+
+	fp->private_data = ptp;
+
+	return 0;
+}
+
+static unsigned int ptp_poll(struct file *fp, poll_table *wait)
+{
+	struct ptp_clock *ptp = fp->private_data;
+
+	poll_wait(fp, &ptp->tsev_wq, wait);
+
+	return queue_cnt(&ptp->tsevq) ? POLLIN : 0;
+}
+
+static ssize_t ptp_read(struct file *fp, char __user *buf,
+			size_t cnt, loff_t *off)
+{
+	struct ptp_clock *ptp = fp->private_data;
+	struct timestamp_event_queue *queue = &ptp->tsevq;
+	struct ptp_extts_event *event;
+	size_t qcnt;
+
+	if (mutex_lock_interruptible(&ptp->tsevq_mux))
+		return -ERESTARTSYS;
+
+	cnt = cnt / sizeof(struct ptp_extts_event);
+
+	if (wait_event_interruptible(ptp->tsev_wq,
+				     (qcnt = queue_cnt(&ptp->tsevq)))) {
+		mutex_unlock(&ptp->tsevq_mux);
+		return -ERESTARTSYS;
+	}
+
+	if (cnt > qcnt)
+		cnt = qcnt;
+
+	event = &queue->buf[queue->head];
+
+	if (copy_to_user(buf, event, cnt * sizeof(struct ptp_extts_event))) {
+		mutex_unlock(&ptp->tsevq_mux);
+		return -EFAULT;
+	}
+	queue->head = (queue->head + cnt) % PTP_MAX_TIMESTAMPS;
+
+	mutex_unlock(&ptp->tsevq_mux);
+
+	return cnt * sizeof(struct ptp_extts_event);
+}
+
+static int ptp_release(struct inode *inode, struct file *fp)
+{
+	struct ptp_clock *ptp;
+	struct itimerspec ts = {
+		{0, 0}, {0, 0}
+	};
+	int i;
+
+	ptp = container_of(inode->i_cdev, struct ptp_clock, cdev);
+
+	for (i = 0; i < ptp->info->n_alarm; i++) {
+		if (ptp->alarm[i].pid) {
+			ptp->info->settimer(ptp->info->priv, i, 0, &ts);
+			put_pid(ptp->alarm[i].pid);
+			ptp->alarm[i].pid = NULL;
+		}
+	}
+	return 0;
+}
+
+static const struct file_operations ptp_fops = {
+	.owner		= THIS_MODULE,
+	.ioctl		= ptp_ioctl,
+	.open		= ptp_open,
+	.poll		= ptp_poll,
+	.read		= ptp_read,
+	.release	= ptp_release,
+};
+
+/* module operations */
+
+static void __exit ptp_exit(void)
+{
+	class_destroy(ptp_class);
+	unregister_chrdev_region(ptp_devt, PTP_MAX_CLOCKS);
+}
+
+static int __init ptp_init(void)
+{
+	int err;
+
+	INIT_LIST_HEAD(&clocks.list);
+
+	ptp_class = class_create(THIS_MODULE, "ptp");
+	if (!ptp_class) {
+		printk(KERN_ERR "ptp: failed to allocate class\n");
+		return -ENOMEM;
+	}
+
+	err = alloc_chrdev_region(&ptp_devt, 0, PTP_MAX_CLOCKS, "ptp");
+	if (err < 0) {
+		printk(KERN_ERR "ptp: failed to allocate char device region\n");
+		goto no_region;
+	}
+
+	pr_info("PTP clock support registered\n");
+	return 0;
+
+no_region:
+	class_destroy(ptp_class);
+	return err;
+}
+
+subsys_initcall(ptp_init);
+module_exit(ptp_exit);
+
+MODULE_AUTHOR("Richard Cochran <richard.cochran@omicron.at>");
+MODULE_DESCRIPTION("PTP clocks support");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/Kbuild b/include/linux/Kbuild
index 9aa9bca..471205a 100644
--- a/include/linux/Kbuild
+++ b/include/linux/Kbuild
@@ -140,6 +140,7 @@  header-y += pkt_sched.h
 header-y += posix_types.h
 header-y += ppdev.h
 header-y += prctl.h
+header-y += ptp_clock.h
 header-y += qnxtypes.h
 header-y += qnx4_fs.h
 header-y += radeonfb.h
diff --git a/include/linux/ptp_clock.h b/include/linux/ptp_clock.h
new file mode 100644
index 0000000..5a509c5
--- /dev/null
+++ b/include/linux/ptp_clock.h
@@ -0,0 +1,79 @@ 
+/*
+ * PTP 1588 clock support - user space interface
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef _PTP_CLOCK_H_
+#define _PTP_CLOCK_H_
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+#define PTP_ENABLE_FEATURE (1<<0)
+#define PTP_RISING_EDGE    (1<<1)
+#define PTP_FALLING_EDGE   (1<<2)
+
+enum ptp_request_types {
+	PTP_REQUEST_EXTTS,
+	PTP_REQUEST_PEROUT,
+	PTP_REQUEST_PPS,
+};
+
+struct ptp_clock_caps {
+	__s32 max_adj; /* Maximum frequency adjustment, parts per billon. */
+	int n_alarm;   /* Number of programmable alarms. */
+	int n_ext_ts;  /* Number of external time stamp channels. */
+	int n_per_out; /* Number of programmable periodic signals. */
+	int pps;       /* Whether the clock supports a PPS callback. */
+};
+
+struct ptp_clock_timer {
+	int alarm_index;       /* Which alarm to query or configure. */
+	int signum;            /* Requested signal. */
+	int flags;             /* Zero or TIMER_ABSTIME, see TIMER_SETTIME(2) */
+	struct itimerspec tsp; /* See TIMER_SETTIME(2) */
+};
+
+struct ptp_clock_request {
+	int type;  /* One of the ptp_request_types enumeration values. */
+	int index; /* Which channel to configure. */
+	struct timespec ts; /* For period signals, the desired period. */
+	int flags; /* Bit field for PTP_ENABLE_FEATURE or other flags. */
+};
+
+struct ptp_extts_event {
+	int index;
+	struct timespec ts;
+};
+
+#define PTP_CLOCK_VERSION 0x00000001
+
+#define PTP_CLK_MAGIC '='
+
+#define PTP_CLOCK_APIVERS _IOR (PTP_CLK_MAGIC, 1, __u32)
+#define PTP_CLOCK_ADJFREQ _IO  (PTP_CLK_MAGIC, 2)
+#define PTP_CLOCK_ADJTIME _IOW (PTP_CLK_MAGIC, 3, struct timespec)
+#define PTP_CLOCK_GETTIME _IOR (PTP_CLK_MAGIC, 4, struct timespec)
+#define PTP_CLOCK_SETTIME _IOW (PTP_CLK_MAGIC, 5, struct timespec)
+
+#define PTP_CLOCK_GETCAPS   _IOR  (PTP_CLK_MAGIC, 6, struct ptp_clock_caps)
+#define PTP_CLOCK_GETTIMER  _IOWR (PTP_CLK_MAGIC, 7, struct ptp_clock_timer)
+#define PTP_CLOCK_SETTIMER  _IOW  (PTP_CLK_MAGIC, 8, struct ptp_clock_timer)
+#define PTP_FEATURE_REQUEST _IOW  (PTP_CLK_MAGIC, 9, struct ptp_clock_request)
+
+#endif
diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h
new file mode 100644
index 0000000..d6cc158
--- /dev/null
+++ b/include/linux/ptp_clock_kernel.h
@@ -0,0 +1,137 @@ 
+/*
+ * PTP 1588 clock support
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef _PTP_CLOCK_KERNEL_H_
+#define _PTP_CLOCK_KERNEL_H_
+
+#include <linux/ptp_clock.h>
+
+/**
+ * struct ptp_clock_info - decribes a PTP hardware clock
+ *
+ * @owner:     The clock driver should set to THIS_MODULE.
+ * @name:      A short name to identify the clock.
+ * @max_adj:   The maximum possible frequency adjustment, in parts per billon.
+ * @n_alarm:   The number of programmable alarms.
+ * @n_ext_ts:  The number of external time stamp channels.
+ * @n_per_out: The number of programmable periodic signals.
+ * @pps:       Indicates whether the clock supports a PPS callback.
+ * @priv:      Passed to the clock operations, for the driver's private use.
+ *
+ * clock operations
+ *
+ * @adjfreq:  Adjusts the frequency of the hardware clock.
+ *            parameter delta: Desired period change in parts per billion.
+ *
+ * @adjtime:  Shifts the time of the hardware clock.
+ *            parameter ts: Desired change in seconds and nanoseconds.
+ *
+ * @gettime:  Reads the current time from the hardware clock.
+ *            parameter ts: Holds the result.
+ *
+ * @settime:  Set the current time on the hardware clock.
+ *            parameter ts: Time value to set.
+ *
+ * @gettimer: Reads the time remaining from the given timer.
+ *            parameter index: Which alarm to query.
+ *            parameter ts: Holds the result.
+ *
+ * @settimer: Arms the given timer for periodic or one shot operation.
+ *            parameter index: Which alarm to set.
+ *            parameter abs: TIMER_ABSTIME, or zero for relative timer.
+ *            parameter ts: Alarm time and period to set.
+ *
+ * @enable:   Request driver to enable or disable an ancillary feature.
+ *            parameter request: Desired resource to enable or disable.
+ *            parameter on: Caller passes one to enable or zero to disable.
+ *
+ * The callbacks must all return zero on success, non-zero otherwise.
+ */
+
+struct ptp_clock_info {
+	struct module *owner;
+	char name[16];
+	s32 max_adj;
+	int n_alarm;
+	int n_ext_ts;
+	int n_per_out;
+	int pps;
+	void *priv;
+	int (*adjfreq)(void *priv, s32 delta);
+	int (*adjtime)(void *priv, struct timespec *ts);
+	int (*gettime)(void *priv, struct timespec *ts);
+	int (*settime)(void *priv, struct timespec *ts);
+	int (*gettimer)(void *priv, int index, struct itimerspec *ts);
+	int (*settimer)(void *priv, int index, int abs, struct itimerspec *ts);
+	int (*enable)(void *priv, struct ptp_clock_request *request, int on);
+};
+
+struct ptp_clock;
+
+/**
+ * ptp_clock_register() - register a PTP hardware clock driver
+ *
+ * @info:  Structure describing the new clock.
+ */
+
+extern struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info);
+
+/**
+ * ptp_clock_unregister() - unregister a PTP hardware clock driver
+ *
+ * @ptp:  The clock to remove from service.
+ */
+
+extern int ptp_clock_unregister(struct ptp_clock *ptp);
+
+
+enum ptp_clock_events {
+	PTP_CLOCK_ALARM,
+	PTP_CLOCK_EXTTS,
+	PTP_CLOCK_PPS,
+};
+
+/**
+ * struct ptp_clock_event - decribes a PTP hardware clock event
+ *
+ * @type:  One of the ptp_clock_events enumeration values.
+ * @index: Identifies the source of the event.
+ * @timestamp: When the event occured.
+ */
+
+struct ptp_clock_event {
+	int type;
+	int index;
+	u64 timestamp;
+};
+
+/**
+ * ptp_clock_event() - notify the PTP layer about an event
+ *
+ * This function should only be called from interrupt context.
+ *
+ * @ptp:    The clock obtained from ptp_clock_register().
+ * @event:  Message structure describing the event.
+ */
+
+extern void ptp_clock_event(struct ptp_clock *ptp,
+			    struct ptp_clock_event *event);
+
+#endif