<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="GENERATOR" CONTENT="Mozilla/4.01 [en] (Win95; I) [Netscape]">
<TITLE>Mitigation Rules and the "prefer" Keyword</TITLE>
</HEAD>
<BODY>

<H3>
Mitigation Rules and the <TT>prefer</TT> Keyword</H3>

<HR>
<H4>
Introduction</H4>
The mechanics of the NTP algorithms that select the best data sample from
each available peer and the best subset of the peer population have been
finely crafted to resist network jitter and faults in the network or peer
operations, and to deliver the best possible accuracy. Most of the time
these algorithms do a good job without requiring explicit manual tailoring
of the configuration file. However, there are times when the accuracy can
be improved by some careful tailoring. The following sections explain how
to do this using explicit configuration items and, when available, special
signals generated by some radio clocks.

<P>In order to provide robust backup sources, primary (stratum-1) servers
are usually operated in a diversity configuration, in which the server
operates with a number of remote peers in addition to one or more radio
or modem clocks operating as local peers. In these configurations the suite
of algorithms that NTP uses to refine the data from each peer separately,
and to select and weight the data from a number of peers, is applied to
the entire ensemble of remote and local peers. As the result of these
algorithms, a set of <I>survivors</I> is identified that can presumably
provide the most reliable and accurate time. Ordinarily, the individual
clock offsets of the survivors are combined on a weighted-average basis
to produce an offset used to control the system clock.

<P>However, because of small but significant systematic time offsets between
the survivors, it is in general not possible to achieve the lowest jitter
and highest stability in these configurations. This happens because the
selection algorithm tends to <I>clockhop</I> between survivors of substantially
the same quality that show small systematic offsets between them. In
addition, there are a number of configurations involving pulse-per-second
(PPS) signals, modem backup services and other special cases, so a set of
mitigation rules becomes necessary to select a single peer from among the
survivors. These rules are based on a set of special characteristics of
the various peers and reference clock drivers specified in the configuration
file.
<H4>
The <TT>prefer</TT> Peer</H4>
The mitigation rules are designed to provide an intelligent selection between
various peers of substantially the same statistical quality. They are designed
to provide the best quality time without compromising the normal operation
of the NTP algorithms. The mitigation scheme in its present form is not
an integral component of the NTP Version 3 specification (RFC-1305), but
is to be included in the Version 4 specification when it is published.
The scheme is based on the concept of the <I>prefer peer</I>, which is specified
by including the <TT>prefer</TT> keyword with the associated <TT>server</TT>
or <TT>peer</TT> command in the configuration file. This keyword can be
used with any peer or server, but is most commonly used with a radio clock.
While the scheme does not forbid it, it does not seem useful to designate
more than one peer as preferred, since on-air experience does not seem to
justify the additional complexity of mitigating among them.

<P>The prefer scheme works on the set of peers that have survived the sanity
checks and intersection algorithms of the clock selection procedures. Ordinarily,
the members of this set can be considered <I>truechimers</I>, and any one
of them could in principle provide correct time; however, due to various
error contributions, not all can provide the most accurate and stable time.
The job of the clustering algorithm, which is invoked at this point, is
to select the best subset of the survivors, providing the least variance
in the combined ensemble average compared to the variance in each member
of the subset separately. The detailed operation of the clustering algorithm,
which is given in the specification, is not important here, other than
to point out that it operates in rounds, where a survivor, presumably the
worst of the lot, is discarded in each round until one of several termination
conditions is met.

<P>In the prefer scheme the clustering algorithm is modified so that the
prefer peer is never discarded; on the contrary, its potential removal
becomes a termination condition. If the original algorithm were about to
toss out the prefer peer, the algorithm terminates at that point. The prefer
peer can still be discarded by the sanity checks and intersection algorithms,
of course, but it will always survive the clustering algorithm. If it is
discarded by those earlier checks, or for some reason fails to provide updates
and eventually becomes unreachable, the clock selection will remitigate to
select the next best source.

<P>Along with this behavior, the clock selection procedures are modified
so that the combining algorithm is not used when a prefer peer is present.
Instead, the offset of the prefer peer is used exclusively as the synchronization
source. In the usual case involving a radio clock and a flock of remote
stratum-1 peers, with the radio clock designated the prefer peer, the
result is that the high-quality radio time disciplines the server clock
as long as the radio itself remains operational and shows valid time, as
determined by the remote peers, the sanity checks and the intersection algorithm.
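<P>The difference between the combining algorithm and the prefer-peer shortcut can be sketched in C. This is a minimal illustration, not the NTP implementation: the <TT>struct survivor</TT> record and the <TT>system_offset()</TT> function are hypothetical, and the combining weights are assumed here to be the reciprocal of each survivor's synchronization distance.

```c
#include <stddef.h>

/* Hypothetical survivor record; the field names are illustrative only. */
struct survivor {
    double offset;    /* clock offset, seconds */
    double distance;  /* synchronization distance, seconds (assumed > 0) */
    int    prefer;    /* nonzero if configured with the prefer keyword */
};

/*
 * Return the system clock offset computed from n > 0 survivors.
 * If a prefer peer survived, its offset is used exclusively and the
 * combining step is skipped. Otherwise the offsets are combined as a
 * weighted average, with weights assumed to be 1/distance.
 */
double system_offset(const struct survivor *s, size_t n)
{
    double wsum = 0.0, osum = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        if (s[i].prefer)
            return s[i].offset;

    for (i = 0; i < n; i++) {
        double w = 1.0 / s[i].distance;
        wsum += w;
        osum += w * s[i].offset;
    }
    return osum / wsum;
}
```

<P>For example, two survivors at 1 ms and 3 ms with equal distances combine to 2 ms, while marking either one as prefer returns that survivor's offset unchanged.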
<H4>
Peer Classification</H4>
In order to understand the effects of the various intricate schemes involved,
it is necessary to understand some arcane details of how the algorithms
decide on a synchronization source when more than one source is available.
This is done on the basis of a set of explicit mitigation rules, which
define special classes of remote and local peers as a function of configuration
declarations and reference clock driver type:
<OL>
<LI>
The prefer peer is designated using the <TT>prefer</TT> keyword with the
<TT>server</TT> or <TT>peer</TT> commands. All other things being equal,
this peer will be selected for synchronization over all other survivors
of the clock selection procedures.</LI>

<BR>
<LI>
When a PPS signal is connected via the PPS Clock Discipline driver (type
22), this is called the <I>PPS peer</I>. This driver provides precision
clock corrections only within one second, so it is always operated in conjunction
with another peer or reference clock driver, which provides the seconds
numbering. The PPS peer is active only under conditions explained below.</LI>

<BR>
<LI>
When the Undisciplined Local Clock driver (type 1) is configured, this
is called the <I>local clock peer</I>. This is used either as a backup
reference source (stratum greater than zero), should all other synchronization
sources fail, or as the primary reference source (stratum zero) in cases
where the kernel time is disciplined by some other means of synchronization,
such as the NIST <TT>lockclock</TT> scheme, or another synchronization
protocol, such as the Digital Time Synchronization Service (DTSS).</LI>

<BR>
<LI>
When a modem driver such as the Automated Computer Time Service driver
(type 18) is configured, this is called the <I>modem peer</I>. This is
used either as a backup reference source, should all other primary sources
fail, or as the (only) primary reference source.</LI>

<BR>
<LI>
Where support is available, the PPS signal may be processed directly by
the kernel, as described in the <A HREF="kern.htm">A Kernel Model for Precision
Timekeeping</A> page. This is called the <I>kernel discipline</I>. The
PPS signal can discipline the kernel in both frequency and time. The frequency
discipline is active as long as the PPS interface device and the signal itself
are operating correctly, as determined by the kernel algorithms. The time
discipline is active only under conditions explained below.</LI>
</OL>
Reference clock drivers operate in the manner described in the <A HREF="refclock.htm">Reference
Clock Drivers</A> page and the pages it references. The drivers are ordinarily
operated at stratum zero, so that as the result of ordinary NTP operations
the server itself operates at stratum one, as required by the NTP specification.
In some cases described below, the driver is intentionally operated at
an elevated stratum, so that it will be selected only if no other survivor
with a lower stratum is present. In the case of the PPS peer or kernel
time discipline, these sources appear active only if the prefer peer has
survived the intersection and clustering algorithms, as described below,
and its clock offset relative to the current local clock is less than a
specified value, currently 128 ms.
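<P>The peer classes in the list above follow from the configuration. As a minimal sketch, assuming the usual reference clock pseudo-address convention 127.127.<I>t</I>.<I>u</I> (driver type <I>t</I>, unit <I>u</I>) used in the configuration file, a configured address could be classified as shown below. The <TT>classify()</TT> function and the enumeration names are hypothetical, and only the ACTS driver (type 18) is treated as a modem peer here.

```c
#include <stdio.h>

enum peer_class {
    CLASS_ORDINARY,  /* remote peer or any other reference clock */
    CLASS_PPS,       /* PPS Clock Discipline driver, type 22 */
    CLASS_LOCAL,     /* Undisciplined Local Clock driver, type 1 */
    CLASS_MODEM      /* modem driver; only ACTS (type 18) is recognized here */
};

enum peer_class classify(const char *addr)
{
    unsigned int a, b, type, unit;
    char extra;

    /* Reference clocks use pseudo-addresses 127.127.t.u; anything that
     * does not parse as exactly that pattern is treated as a remote peer. */
    if (sscanf(addr, "%u.%u.%u.%u%c", &a, &b, &type, &unit, &extra) != 4
        || a != 127 || b != 127)
        return CLASS_ORDINARY;

    switch (type) {
    case 22: return CLASS_PPS;
    case 1:  return CLASS_LOCAL;
    case 18: return CLASS_MODEM;
    default: return CLASS_ORDINARY;
    }
}
```

<P>For example, a configuration line such as <TT>server 127.127.18.1 prefer</TT> would yield a modem peer that is also the prefer peer.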

<P>The modem clock drivers are a special case. Ordinarily, the update interval
between modem calls to synchronize the system clock is many times longer
than the interval between polls of either the remote or local peers. In
order to provide the best stability, the operation of the clock discipline
algorithm changes gradually from a phase-lock mode at the shorter update
intervals to a frequency-lock mode at the longer update intervals. If remote
or local peers are operated together with a modem peer in the same
configuration, the clock selection algorithm may first select one or more
remote or local peers, so that the clock discipline algorithm optimizes
for the shorter update intervals, and later select the modem peer, which
requires a much different optimization.
The intent of the design is to let the modem peer control the system
clock either when no other source is available or, if it is marked as
the prefer peer, at all times, as long as it passes the sanity checks
and intersection algorithm. There is still room for suboptimal operation
in this scheme, since a noise spike can still cause a clockhop either way.
Nevertheless, the optimization function is slow to adapt, so a clockhop
or two does not cause much harm.

<P>The local clock driver is another special case. Normally, this driver
is eligible for selection only if no other source is available. When selected,
vernier adjustments introduced via the configuration file or remotely using
the <TT><A HREF="ntpdc.htm">ntpdc</A></TT> program can be used to trim
the local clock frequency and time. However, if the local clock driver
is designated the prefer peer, this driver is always selected and all other
sources are ignored. This behavior is intended for use when the kernel
time is controlled by some means external to NTP, such as the NIST <TT>lockclock</TT>
algorithm or another time synchronization protocol such as DTSS.
In this case the only way to disable the local clock driver is to mark
it unsynchronized using the leap indicator bits. In the case of modified
kernels with the <TT>ntp_adjtime()</TT> system call, this can be done automatically
if the external synchronization protocol uses that call to discipline the kernel
time.
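<P>The leap indicator (LI) is the two-bit field at the top of the first octet of the NTP header; the value 3 is the alarm condition, meaning the clock is not synchronized. A minimal sketch of how mitigation code might test it before honoring a local-clock prefer peer follows. The macro and function names are illustrative assumptions, not taken from the NTP sources.

```c
/* Leap indicator values (two bits). */
#define LEAP_NOWARNING 0  /* no warning */
#define LEAP_ADDSECOND 1  /* last minute of the day has 61 seconds */
#define LEAP_DELSECOND 2  /* last minute of the day has 59 seconds */
#define LEAP_NOTINSYNC 3  /* alarm condition: clock not synchronized */

/* LI occupies the two most significant bits of the first header octet. */
int leap_indicator(unsigned char first_octet)
{
    return (first_octet >> 6) & 0x3;
}

/*
 * Hypothetical eligibility test: a local-clock prefer peer stays
 * eligible unless its leap indicator says it is unsynchronized.
 */
int local_prefer_eligible(int leap)
{
    return leap != LEAP_NOTINSYNC;
}
```

<P>For instance, a header octet of 0xDB (LI = 3, version 3, mode 3) marks the source as unsynchronized, so the local-clock prefer peer would be passed over.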
<H4>
Mitigation Rules</H4>
The mitigation rules apply in the intersection and clustering algorithms
described in the NTP specification. The intersection algorithm first scans
all peers with a persistent association and includes only those that satisfy
specified sanity checks. In addition to the checks required by the specification,
the mitigation rules require that the local-clock peer and the modem peer
be included only if marked as the prefer peer. The intersection algorithm
operates on the included population to select only those peers believed
to represent the correct time. If one or more peers survive the operation,
processing continues in the clustering algorithm. Otherwise, if there is
a modem peer, it is declared the only survivor; otherwise, if there is
a local-clock peer, it is declared the only survivor. Processing then continues
in the clustering algorithm.
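<P>The inclusion rule and the fallback order just described can be sketched in C. This is a minimal illustration under stated assumptions: the intersection algorithm itself is not reproduced, its verdict is represented by a hypothetical <TT>truechimer</TT> flag, and the structure and function names are likewise hypothetical.

```c
#include <stddef.h>

enum peer_class { CLASS_ORDINARY, CLASS_PPS, CLASS_LOCAL, CLASS_MODEM };

struct cand {
    enum peer_class cls;
    int prefer;      /* configured with the prefer keyword */
    int sane;        /* passed the specification's sanity checks */
    int truechimer;  /* hypothetical stand-in for the intersection verdict */
};

/* Sanity checks plus the extra mitigation rule for local-clock and modem peers. */
static int included(const struct cand *c)
{
    if (!c->sane)
        return 0;
    if ((c->cls == CLASS_LOCAL || c->cls == CLASS_MODEM) && !c->prefer)
        return 0;
    return 1;
}

/* Write survivor indices to out[] (room for n entries); return their count. */
size_t select_survivors(const struct cand *c, size_t n, size_t *out)
{
    size_t i, m = 0;

    for (i = 0; i < n; i++)
        if (included(&c[i]) && c[i].truechimer)
            out[m++] = i;
    if (m > 0)
        return m;

    /* Nobody survived: fall back to a modem peer, then a local-clock peer. */
    for (i = 0; i < n; i++)
        if (c[i].cls == CLASS_MODEM) {
            out[0] = i;
            return 1;
        }
    for (i = 0; i < n; i++)
        if (c[i].cls == CLASS_LOCAL) {
            out[0] = i;
            return 1;
        }
    return 0;
}
```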

<P>The clustering algorithm repeatedly discards outliers in order to reduce
the residual jitter in the survivor population. As required by the NTP
specification, these operations continue until either a specified minimum
number of survivors remains or the maximum select dispersion of the population
is less than the minimum peer dispersion of any member. The mitigation
rules require an additional terminating condition, which stops these operations
at the point where the prefer peer is about to be discarded.
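<P>The pruning loop, with the prefer-peer termination added, can be sketched in C. This is a simplified illustration, not the specification's algorithm: the select dispersion of each survivor is approximated by its mean absolute offset difference from the other survivors (the specification defines it differently), pruning stops once the largest select dispersion falls below the smallest peer dispersion, and all names, including the <TT>nmin</TT> parameter, are hypothetical.

```c
#include <stddef.h>

struct surv {
    double offset;     /* clock offset, seconds */
    double peer_disp;  /* peer dispersion, seconds */
    int    prefer;     /* nonzero for the prefer peer */
};

static double dabs(double x) { return x < 0.0 ? -x : x; }

/* Simplified select dispersion: mean |offset difference| to the others. */
static double select_disp(const struct surv *s, size_t n, size_t j)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        if (i != j)
            sum += dabs(s[i].offset - s[j].offset);
    return n > 1 ? sum / (double)(n - 1) : 0.0;
}

/* Prune s[0..n-1] in place (order not preserved); return survivors kept. */
size_t cluster(struct surv *s, size_t n, size_t nmin)
{
    while (n > nmin) {
        size_t i, worst = 0;
        double dmax = -1.0, pmin = s[0].peer_disp;

        for (i = 0; i < n; i++) {
            double d = select_disp(s, n, i);
            if (d > dmax) {
                dmax = d;
                worst = i;
            }
            if (s[i].peer_disp < pmin)
                pmin = s[i].peer_disp;
        }
        if (dmax < pmin)        /* discarding more would not reduce jitter */
            break;
        if (s[worst].prefer)    /* mitigation rule: never discard the prefer peer */
            break;
        s[worst] = s[n - 1];    /* discard the worst survivor */
        n--;
    }
    return n;
}
```

<P>With offsets of 0, 1 and 100 ms and tiny peer dispersions, the 100-ms outlier is discarded; if that outlier is the prefer peer, the loop stops instead and all three survive.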

<P>The mitigation rules establish the choice of <I>system peer</I>, which
determines the stratum, reference identifier and several other system variables
that are visible to clients of the local server. In addition, they establish
which source or combination of sources controls the local clock.
<OL>
<LI>
If there is a prefer peer and it is the local-clock peer or the modem peer,
or if there is a prefer peer and the kernel time discipline is active,
choose the prefer peer as the system peer and its offset as the system
clock offset. If the prefer peer is the local-clock peer, an offset can
be calculated by the driver to produce a frequency offset in order to correct
for systematic frequency errors. In case a source other than NTP is controlling
the system clock, corrections determined by NTP can be ignored by using
the <TT>disable pll</TT> command in the configuration file. If the prefer peer
is the modem peer, it must be the primary source for the reasons noted
above. If the kernel time discipline is active, the system clock offset
is ignored and the corrections are handled directly by the kernel.</LI>

<LI>
If the above is not the case and there is a PPS peer, then choose it as
the system peer and its offset as the system clock offset.</LI>

<LI>
If the above is not the case and there is a prefer peer (not the local-clock
or modem peer in this case), then choose it as the system peer and its
offset as the system clock offset.</LI>

<LI>
If the above is not the case and the peer previously chosen as the system
peer is in the surviving population, then choose it as the system peer
and average its offset along with the other survivors to determine the
system clock offset. This behavior is designed to avoid excess jitter due
to clockhopping when switching the system peer would not materially improve
the time accuracy.</LI>

<LI>
If the above is not the case, then choose the first candidate in the list
of survivors ranked in order of synchronization distance and average its
offset along with the other survivors to determine the system clock offset.
This is the default case and the only case considered in the current NTP
specification.</LI>
</OL>
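<P>The five rules can be condensed into a single selection function. The following C sketch is illustrative only; it assumes the survivor list has already been ranked by synchronization distance, identifies peers by a hypothetical non-negative integer <TT>id</TT>, and reports through <TT>*combine</TT> whether the offsets should be averaged (rules 4 and 5) or the system peer's offset used alone (rules 1 to 3).

```c
#include <stddef.h>

enum peer_class { CLASS_ORDINARY, CLASS_PPS, CLASS_LOCAL, CLASS_MODEM };

struct survivor {
    int id;               /* hypothetical association identifier (>= 0) */
    enum peer_class cls;
    int prefer;
};

/*
 * s[0..n-1]: survivors ranked by synchronization distance.
 * prev_id: id of the previous system peer, or -1 if none.
 * Returns the index of the new system peer, or -1 if n == 0.
 */
int choose_system_peer(const struct survivor *s, size_t n, int prev_id,
                       int kernel_pps_active, int *combine)
{
    int prefer = -1, pps = -1;
    size_t i;

    *combine = 0;
    if (n == 0)
        return -1;
    for (i = 0; i < n; i++) {
        if (s[i].prefer && prefer < 0)
            prefer = (int)i;
        if (s[i].cls == CLASS_PPS && pps < 0)
            pps = (int)i;
    }

    /* Rule 1: local-clock or modem prefer peer, or kernel discipline active. */
    if (prefer >= 0 && (s[prefer].cls == CLASS_LOCAL ||
                        s[prefer].cls == CLASS_MODEM || kernel_pps_active))
        return prefer;
    if (pps >= 0)      /* Rule 2: the PPS peer */
        return pps;
    if (prefer >= 0)   /* Rule 3: any other prefer peer */
        return prefer;

    *combine = 1;      /* Rules 4 and 5 average the survivors' offsets. */
    for (i = 0; i < n; i++)   /* Rule 4: keep the previous system peer */
        if (s[i].id == prev_id)
            return (int)i;
    return 0;          /* Rule 5: the first (closest) candidate */
}
```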

<H4>
Using the Pulse-per-Second (PPS) Signal</H4>
Most radio clocks are connected using a serial port operating at speeds
of 9600 bps or higher. The accuracy using typical timecode formats, where
the on-time epoch is indicated by a designated ASCII character such as the
carriage return <TT>&lt;cr&gt;</TT>, is limited to a millisecond at best and
a few milliseconds in typical cases. However, some radios produce a PPS signal
which can be used to improve the accuracy with typical workstation servers
to the order of a few tens of microseconds. The details of how this can be
accomplished are discussed in the <A HREF="pps.htm">Pulse-per-second (PPS) Signal Interfacing</A>
page. The following paragraphs discuss how the PPS signal is affected by
the mitigation rules.
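<P>Some rough arithmetic shows why the serial timecode path alone lands in the millisecond range. Assuming the common asynchronous framing of one start bit, eight data bits and one stop bit (10 bits per character), a single character occupies about 1.04 ms on the wire at 9600 bps, the same order as the accuracy limit quoted above. The helper below is a hypothetical illustration of that calculation.

```c
/*
 * Time in seconds to send one asynchronous character of bits_per_char
 * bits (start + data + parity + stop) at a line rate of bps bits/s.
 */
double char_time(double bps, int bits_per_char)
{
    return (double)bits_per_char / bps;
}
```

<P>For example, <TT>char_time(9600.0, 10)</TT> is 10/9600 s, about 1.042 ms, and doubling the rate to 19200 bps halves it to about 0.521 ms.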

<P>First, it should be pointed out that the PPS signal is inherently ambiguous,
in that it provides a precise seconds epoch, but does not provide a way
to number the seconds. Ordinarily, another source of synchronization,
either the timecode from an associated radio clock or even one or more
remote NTP servers, is available to perform that function.
In all cases, a specific, configured peer or server must be designated
as associated with the PPS signal. This is done using the <TT>prefer</TT>
keyword as described previously. The PPS signal can be associated in this
way with any peer, but is most commonly used with the radio clock generating
the PPS signal.
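<P>The ambiguity can be made concrete. If the local clock reads <I>t</I> at the PPS edge, the edge marks some exact second, and the offset is the distance from <I>t</I> to the nearest whole second only if the clock is already known to be within half a second; that knowledge is what the associated prefer peer supplies. A minimal C sketch, with a hypothetical function name:

```c
/*
 * Offset (true time minus local time, seconds) implied by a PPS edge
 * observed at local clock reading t_local (t_local >= 0). The edge is
 * assumed to mark the nearest whole second, which is valid only if the
 * local clock is already within +/-0.5 s of true time.
 */
double pps_offset(double t_local)
{
    long long sec = (long long)t_local;     /* whole seconds (truncation) */
    double frac = t_local - (double)sec;    /* 0 <= frac < 1 */

    /* Past the half-second mark, the edge belongs to the next second. */
    return frac >= 0.5 ? 1.0 - frac : -frac;
}
```

<P>A reading of 1000.2 s gives -0.2 s (clock 0.2 s fast) and 999.9 s gives +0.1 s (clock 0.1 s slow). A clock that is really 0.7 s fast reads 1000.7 s at the edge and would be corrected by +0.3 s instead of -0.7 s, which is why the seconds must be numbered by another source.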

<P>The PPS signal can be used in two ways to discipline the local clock,
one using a special PPS driver described in the <A HREF="driver22.htm">PPS
Clock Discipline</A> page, the other using PPS signal support in the kernel,
as described in the <A HREF="kern.htm">A Kernel Model for Precision Timekeeping</A>
page. In either case, the signal must be present and within nominal jitter
and wander error tolerances. In addition, the associated prefer peer must
have survived the sanity checks and intersection algorithms, and its dispersion
must have settled below 1 s. This ensures that the radio clock hardware is
operating correctly and that, presumably, the PPS signal is operating correctly
as well. Finally, the absolute offset of the local clock from that peer must
be less than 128 ms, which is well within the 0.5-s unambiguous range of the
PPS signal itself. In the case of the PPS driver, the time offsets generated
from the PPS signal are propagated via the clock filter to the clock selection
procedures just like any other peer. Should these pass the sanity checks
and intersection algorithms, they will show up along with the offsets of
the prefer peer itself. Note that, unlike the prefer peer, the PPS peer
samples are not protected from discard by the clustering algorithm. These
complicated procedures ensure that the PPS offsets developed in this way
are the most accurate and reliable available for synchronization.
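<P>The conditions in the preceding paragraph amount to a conjunction of four tests. The sketch below states them in C; the structure, field names and constant names are hypothetical, and whether the signal is within tolerance is taken as an input rather than computed.

```c
#include <stdbool.h>

#define PPS_MAXOFFSET 0.128  /* seconds: the 128-ms gate */
#define PPS_MAXDISP   1.0    /* seconds: prefer-peer dispersion bound */

struct pps_state {
    bool   signal_ok;        /* present and within jitter and wander tolerances */
    bool   prefer_survived;  /* prefer peer passed sanity checks and intersection */
    double prefer_disp;      /* prefer peer dispersion, seconds */
    double prefer_offset;    /* local clock offset from the prefer peer, seconds */
};

/* True only when every condition for using PPS offsets holds. */
bool pps_usable(const struct pps_state *p)
{
    double off = p->prefer_offset < 0.0 ? -p->prefer_offset : p->prefer_offset;

    return p->signal_ok && p->prefer_survived &&
           p->prefer_disp < PPS_MAXDISP && off < PPS_MAXOFFSET;
}
```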

<P>The PPS peer remains active as long as it survives the intersection
algorithm and the prefer peer is reachable; however, like any other clock
driver, it runs a reachability algorithm on the PPS signal itself. If for
some reason the signal fails or displays gross errors, the PPS peer will
either become unreachable or stray out of the survivor population. In this
case the clock selection remitigates as described above.
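<P>NTP tracks reachability with an 8-bit shift register: the register is shifted left one bit per poll, and the low-order bit is set when a valid update arrives, so a source is considered unreachable once eight consecutive polls have gone unanswered. A minimal sketch of that bookkeeping, applied here to the PPS signal, with hypothetical function names and the shift and set folded into one step per poll interval:

```c
#include <stdint.h>

/* Record one poll interval: shift in 1 if a valid update was received. */
uint8_t reach_update(uint8_t reach, int heard)
{
    reach = (uint8_t)(reach << 1);
    if (heard)
        reach = (uint8_t)(reach | 1u);
    return reach;
}

/* The source is unreachable once all eight bits have shifted out to zero. */
int reachable(uint8_t reach)
{
    return reach != 0;
}
```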

<P>When kernel support for the PPS signal is available, the PPS signal
is interfaced to the kernel serial driver code via a modem control lead.
As the PPS signal is derived from external equipment, cables, etc., which
sometimes fail, a good deal of error checking is done in the kernel to
detect signal failure and excessive noise. The mitigation rules affect
the kernel discipline as follows.

<P>In order to operate, the kernel support must be enabled by the <TT>enable
pll</TT> command in the configuration file, and the signal must be present
and within nominal jitter and wander error tolerances. In the NTP daemon,
the PPS discipline is active only when the prefer peer is among the survivors
of the clustering algorithm and its absolute offset is within 128 ms,
as in the PPS driver. Under these conditions the kernel disregards updates
produced by the NTP daemon and uses its internal PPS source instead. The
kernel maintains a watchdog timer for the PPS signal; if the signal has
not been heard or is out of tolerance for more than some interval, currently
two minutes, the kernel discipline is declared inoperable and operation
continues as if it were not present.
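<P>The watchdog can be pictured as a timestamp of the last in-tolerance pulse compared against the two-minute limit. Kernel implementations need not work this way, so the C below is only a hypothetical model of the behavior described above, with made-up names.

```c
#include <stdbool.h>

#define PPS_WATCHDOG 120.0  /* seconds: "currently two minutes" per the text */

struct pps_watchdog {
    double last_good;  /* time of the last in-tolerance pulse, seconds */
};

/* Called for each pulse; only in-tolerance pulses rearm the watchdog. */
void pps_pulse(struct pps_watchdog *w, double now, bool in_tolerance)
{
    if (in_tolerance)
        w->last_good = now;
}

/* The kernel discipline stays operable until the watchdog interval lapses. */
bool kernel_pps_operable(const struct pps_watchdog *w, double now)
{
    return now - w->last_good <= PPS_WATCHDOG;
}
```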
<HR>
<ADDRESS>
David L. Mills (mills@udel.edu)</ADDRESS>

</BODY>
</HTML>