<!-- $NetBSD: prefer.html,v 1.1 1998/12/30 20:20:36 mcr Exp $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<html><head><title>
Mitigation Rules and the "prefer" Keyword
</title></head><body><h3>
Mitigation Rules and the <code>prefer</code> Keyword
</h3><hr>

<p><h4>Introduction</h4>

<p>The NTP algorithms that select the best data sample from each available peer, and the best subset of the peer population, have been carefully crafted to resist network jitter and faults in the network or in peer operations, and to deliver the best possible accuracy. Most of the time these algorithms do a good job without explicit manual tailoring of the configuration file. However, there are times when accuracy can be improved by careful tailoring. The following sections explain how to do this using explicit configuration items and, when available, special signals generated by some radio clocks.

<p>To provide robust backup sources, primary (stratum-1) servers are usually operated in a diversity configuration, in which the server operates with a number of remote peers in addition to one or more radio or modem clocks operating as local peers. In these configurations, the suite of NTP algorithms that refines the data from each peer separately, and then selects and weights the data from a number of peers, is applied to the entire ensemble of remote and local peers. These algorithms identify a set of <i>survivors</i>, which presumably can provide the most reliable and accurate time. Ordinarily, the clock offsets of the individual survivors are combined as a weighted average to produce the offset used to control the system clock.

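<p>The weighted average can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the NTP implementation: each survivor's weight is assumed to be the reciprocal of its synchronization distance, so survivors with a smaller distance count for more, and the <code>struct survivor</code> type and <code>combine_offsets()</code> function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-survivor data, not the ntpd structures. */
struct survivor {
    double offset;   /* clock offset, seconds */
    double distance; /* synchronization distance, seconds (must be > 0) */
};

/*
 * Weighted average of the survivors' offsets. Each weight is assumed to
 * be the reciprocal of the survivor's synchronization distance.
 */
double combine_offsets(const struct survivor *s, size_t n)
{
    double weight_sum = 0.0;
    double weighted_sum = 0.0;

    assert(s != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        double w;

        assert(s[i].distance > 0.0);
        w = 1.0 / s[i].distance;
        weight_sum += w;
        weighted_sum += w * s[i].offset;
    }
    return weighted_sum / weight_sum;
}
```

<p>For example, survivors with offsets of 2 ms and 4 ms at distances of 10 ms and 20 ms receive weights of 100 and 50, giving a combined offset of about 2.67 ms.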
<p>However, because of small but significant systematic offsets between the survivors, it is in general not possible to achieve the lowest jitter and highest stability in these configurations. The selection algorithm tends to <i>clockhop</i> between survivors of substantially the same quality that nonetheless show small systematic offsets relative to one another. In addition, there are a number of configurations involving pulse-per-second (PPS) signals, modem backup services and other special cases, so a set of mitigation rules is needed to select a single peer from among the survivors. These rules are based on special characteristics of the various peers and reference clock drivers specified in the configuration file.

<p><h4>The <code>prefer</code> Peer</h4>

<p>The mitigation rules are designed to provide an intelligent selection between various peers of substantially the same statistical quality. They are designed to provide the best quality time without compromising the normal operation of the NTP algorithms. The mitigation scheme in its present form is not an integral component of the NTP specification, RFC-1305, but is likely to be included in future versions of the specification. The scheme is based on the concept of the <i>prefer peer</i>, which is designated by including the <code>prefer</code> keyword with the associated <code>server</code> or <code>peer</code> command in the configuration file. This keyword can be used with any peer or server, but is most commonly used with a radio clock. While the scheme does not forbid it, designating more than one peer as preferred does not seem useful, since operational experience does not justify the additional complexity of mitigating among them.

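<p>The <code>prefer</code> designation is a per-association keyword in the configuration file. The hypothetical C helper below, which is not part of the NTP distribution, illustrates this by scanning <code>server</code> and <code>peer</code> lines for the keyword and counting the designations, noting when more than one peer is marked. Its whitespace tokenizing is a simplification of the real configuration syntax.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if a configuration line is a server or peer command carrying
 * the prefer keyword, else 0. Tokens are split on blanks only; quoting
 * and trailing comments are not handled in this sketch.
 */
int line_has_prefer(const char *line)
{
    char buf[256];
    char *tok;

    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    tok = strtok(buf, " \t\n");
    if (tok == NULL ||
        (strcmp(tok, "server") != 0 && strcmp(tok, "peer") != 0))
        return 0;
    while ((tok = strtok(NULL, " \t\n")) != NULL)
        if (strcmp(tok, "prefer") == 0)
            return 1;
    return 0;
}

/* Count prefer designations, noting when more than one peer is marked. */
size_t count_prefer(const char *const *lines, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        count += (size_t)line_has_prefer(lines[i]);
    if (count > 1)
        fprintf(stderr, "note: %zu associations are marked prefer\n", count);
    return count;
}
```

<p>For example, given the lines <code>server 127.127.4.1 prefer</code> and <code>server 192.0.2.1</code>, <code>count_prefer()</code> returns 1. The <code>127.127.t.u</code> form is the pseudo-address convention for reference clock drivers; the driver type, unit and remote address here are placeholders.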
<p>The prefer scheme works on the set of peers that have survived the sanity checks and intersection algorithm of the clock selection procedures. Ordinarily, the members of this set can be considered <i>truechimers</i>, and any one of them could in principle provide correct time; however, due to various error contributions, not all of them can provide the most stable time. The job of the clustering algorithm, which is invoked at this point, is to select the best subset of the survivors: the one providing the least variance in the combined ensemble compared to the variance in each member of the subset. The detailed operation of the clustering algorithm, which is given in the specification, is not important here, other than that it operates in rounds, discarding one survivor, presumably the worst of the lot, in each round until one of several termination conditions is met.

<p>In the prefer scheme the clustering algorithm is modified so that the prefer peer is never discarded; instead, its potential removal becomes a termination condition. If the original algorithm were about to discard the prefer peer, the algorithm terminates right there. The prefer peer can still be discarded by the sanity checks and intersection algorithm, of course, but it always survives the clustering algorithm. The prefer peer is used as long as it survives the sanity checks and intersection algorithm. If it does not survive, or if for some reason it fails to provide updates, it will eventually become unreachable and the clock selection will remitigate to select the next best source.

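<p>The modified clustering loop can be sketched in C as follows. It is a minimal illustration, not the specification's algorithm: select dispersion is approximated by the mean-square offset difference between a member and the rest (which ranks members the same way the RMS value would), only the minimum-survivor and prefer-peer termination conditions are modeled, and the dispersion-based stop condition is left out. All names are hypothetical.

```c
#include <stddef.h>

struct member {
    double offset; /* clock offset, seconds */
    int prefer;    /* nonzero for the prefer peer */
};

/*
 * Mean-square offset difference between member i and the other members.
 * It ranks members in the same order as the RMS select dispersion.
 */
static double select_ms(const struct member *m, size_t n, size_t i)
{
    double sum = 0.0;

    for (size_t j = 0; j < n; j++) {
        double d = m[j].offset - m[i].offset;
        sum += d * d;
    }
    return n > 1 ? sum / (double)(n - 1) : 0.0;
}

/*
 * Discard one member per round, always the one with the largest select
 * dispersion, until min_survivors remain or the member about to go is the
 * prefer peer. Survivors stay compacted at the front of m[]; returns their
 * count.
 */
size_t cluster(struct member *m, size_t n, size_t min_survivors)
{
    while (n > min_survivors) {
        size_t worst = 0;
        double worst_ms = -1.0;

        for (size_t i = 0; i < n; i++) {
            double v = select_ms(m, n, i);
            if (v > worst_ms) {
                worst_ms = v;
                worst = i;
            }
        }
        if (m[worst].prefer)
            break; /* the prefer peer is never discarded: stop here */
        for (size_t i = worst; i + 1 < n; i++)
            m[i] = m[i + 1];
        n--;
    }
    return n;
}
```

<p>With offsets of 0, 1, 30 and 100 ms and the 30-ms member marked prefer, the first round discards the 100-ms outlier; the second round would discard the prefer peer, so the loop stops with three survivors.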
<p>Along with this behavior, the clock selection procedures are modified so that the combining algorithm is not used when a prefer peer is present. Instead, the offset of the prefer peer alone is used to synchronize the system clock. In the usual case of a radio clock and a flock of remote stratum-1 peers, with the radio clock designated as the prefer peer, the result is that the high-quality radio time disciplines the server clock as long as the radio itself remains operational and shows valid time, as determined from the remote peers by the sanity checks and intersection algorithm.

<p><h4>Peer Classification</h4>

<p>To understand the effects of the various intricate schemes involved, it is necessary to understand some arcane details of how the algorithms decide on a synchronization source when more than one source is available. This is done on the basis of a set of explicit mitigation rules, which define special classes of remote and local peers as a function of configuration declarations and reference clock driver type:

<ol>

<li>The prefer peer is designated using the <code>prefer</code> keyword with the <code>server</code> or <code>peer</code> commands. All other things being equal, this peer will be selected for synchronization over all other survivors of the clock selection procedures.

<p><li>When a PPS signal is connected via the PPS Clock Discipline driver (type 22), the resulting association is called the <i>PPS peer</i>. This driver provides precision clock corrections only within one second, so it is always operated in conjunction with another peer or reference clock driver, which provides the seconds numbering. The PPS peer is active only under conditions explained below.

<p><li>When the Undisciplined Local Clock driver (type 1) is configured, the resulting association is called the <i>local-clock peer</i>. It is used either as a backup reference source (stratum greater than zero), should all other synchronization sources fail, or as the primary reference source (stratum zero) in cases where the kernel time is disciplined by some other means of synchronization, such as the NIST <code>lockclock</code> scheme, or by another synchronization protocol, such as the Digital Time Synchronization Service (DTSS).

<p><li>When a modem driver such as the Automated Computer Time Service driver (type 18) is configured, the resulting association is called the <i>modem peer</i>. It is used either as a backup reference source, should all other primary sources fail, or as the (only) primary reference source.

<p><li>Where support is available, the PPS signal may be processed directly by the kernel, as described on the <a href="kern.html">A Kernel Model for Precision Timekeeping</a> page. This is called the <i>kernel discipline</i>. The PPS signal can discipline the kernel in both frequency and time. The frequency discipline is active as long as the PPS signal itself is operating correctly, as determined by the kernel algorithms. The time discipline is active only under conditions explained below.

</ol>

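<p>The peer classes above can be summarized as a small C classification function. This is a sketch, not ntpd code: the enumeration names are hypothetical, a negative driver type is assumed to mean a remote (non-reference-clock) peer, and only the driver types named in the list are recognized. The prefer designation is tracked separately, since it is a configuration keyword rather than a property of the driver type, and the kernel discipline is not an association at all.

```c
/* Peer classes used by the mitigation rules (hypothetical names). */
enum peer_class {
    PEER_ORDINARY, /* remote peer or any other reference clock driver */
    PEER_PPS,      /* PPS Clock Discipline driver, type 22 */
    PEER_LOCAL,    /* Undisciplined Local Clock driver, type 1 */
    PEER_MODEM     /* Automated Computer Time Service driver, type 18 */
};

/*
 * Classify an association by reference clock driver type. A negative
 * driver_type is assumed to denote a remote peer. Only the driver types
 * named in the text are recognized; other modem drivers would need their
 * own cases.
 */
enum peer_class classify_peer(int driver_type)
{
    switch (driver_type) {
    case 22:
        return PEER_PPS;
    case 1:
        return PEER_LOCAL;
    case 18:
        return PEER_MODEM;
    default:
        return PEER_ORDINARY;
    }
}
```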
<p>Reference clock drivers operate in the manner described on the <a href="refclock.html">Reference Clock Drivers</a> page and its dependencies. The drivers are ordinarily operated at stratum zero, so that, as a result of ordinary NTP operations, the server itself operates at stratum one, as required by the NTP specification RFC-1305. In some cases described below, a driver is intentionally operated at an elevated stratum, so that it will be selected only if no other survivor with a lower stratum is present. The PPS peer and the kernel time discipline appear active only if the prefer peer has survived the intersection and clustering algorithms, as described below, and its clock offset relative to the current local clock is less than a specified value, currently +-128 ms.

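<p>The effect of an elevated driver stratum can be sketched in C as below. This is a simplification under stated assumptions: it picks the survivor with the lowest stratum and ignores the synchronization distance that the selection procedures also use, and the function name is hypothetical. The server's own stratum is one greater than that of the chosen source, so a stratum-zero driver yields a stratum-one server.

```c
#include <stddef.h>

/*
 * Return the index of the survivor with the lowest stratum (the first on
 * ties) and store the resulting server stratum, one greater than the
 * chosen source's, in *server_stratum. n must be at least 1. A driver
 * configured at an elevated stratum therefore wins only when no
 * lower-stratum survivor is present.
 */
size_t lowest_stratum(const int *stratum, size_t n, int *server_stratum)
{
    size_t best = 0;

    for (size_t i = 1; i < n; i++)
        if (stratum[i] < stratum[best])
            best = i;
    *server_stratum = stratum[best] + 1;
    return best;
}
```

<p>For example, with strata {1, 0, 3} (a remote stratum-1 peer, a stratum-zero radio clock and a backup driver at stratum 3), the radio clock is chosen and the server runs at stratum one.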
<p>The modem clock driver is a special case. Ordinarily, the update interval between modem calls to synchronize the system clock is many times longer than the interval between polls of either the remote or local peers. To provide the best stability, the clock discipline algorithm changes from a phase-lock mode at the shorter update intervals to a frequency-lock mode at the longer update intervals. If remote or local peers are operated in the same configuration as a modem peer, the clock selection algorithm may first select one or more remote/local peers, and the clock discipline algorithm will optimize for the shorter update intervals. Then the selection algorithm may select the modem peer, which requires a much different optimization. The intent of the design is to allow the modem peer to control the system clock either when no other source is available or, if the modem peer is marked as prefer, at all times, as long as it passes the sanity checks and intersection algorithm. There is still room for suboptimal operation in this scheme, since a noise spike can cause a clockhop either way. Nevertheless, the optimization function is slow to adapt, so a clockhop or two does little harm.

<h4>Mitigation Rules</h4>

<p>The mitigation rules apply in the intersection and clustering algorithms described in the NTP specification. The intersection algorithm first scans all peers with a persistent association and includes only those that satisfy specified sanity checks. In addition to the checks required by the specification, the mitigation rules include the local-clock peer or the modem peer only if it is marked as the prefer peer. The intersection algorithm operates on the included population to select only those peers believed to represent the correct time. If one or more peers survive this operation, processing continues in the clustering algorithm. Otherwise, if there is a modem peer, it is declared the only survivor; failing that, if there is a local-clock peer, it is declared the only survivor. Processing then continues in the clustering algorithm.

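<p>The inclusion and fallback rules around the intersection algorithm can be sketched in C as follows. The intersection algorithm itself is not shown: each candidate's <code>truechimer</code> flag is an assumed stand-in for its result on the included population, and the types and function are hypothetical.

```c
#include <stddef.h>

enum peer_class { PEER_ORDINARY, PEER_PPS, PEER_LOCAL, PEER_MODEM };

struct candidate {
    enum peer_class cls;
    int prefer;     /* marked with the prefer keyword */
    int sane;       /* passed the specification's sanity checks */
    int truechimer; /* assumed result of the intersection algorithm */
};

/*
 * Store in out[] (room for n entries) the indices of the peers passed on
 * to clustering and return how many there are. Local-clock and modem
 * peers are included only when marked prefer. If no peer survives, a
 * modem peer, failing that a local-clock peer, becomes the only survivor.
 */
size_t mitigate_intersection(const struct candidate *p, size_t n, size_t *out)
{
    size_t count = 0;
    size_t modem = n, local = n; /* n means "not configured" */

    for (size_t i = 0; i < n; i++) {
        int special = p[i].cls == PEER_LOCAL || p[i].cls == PEER_MODEM;

        if (p[i].cls == PEER_MODEM && modem == n)
            modem = i;
        if (p[i].cls == PEER_LOCAL && local == n)
            local = i;
        if (!p[i].sane || (special && !p[i].prefer))
            continue; /* excluded before intersection */
        if (p[i].truechimer)
            out[count++] = i;
    }
    if (count == 0) {
        if (modem < n)
            out[count++] = modem;
        else if (local < n)
            out[count++] = local;
    }
    return count;
}
```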
<p>The clustering algorithm repeatedly discards outliers in order to reduce the residual jitter in the survivor population. As required by the NTP specification, these operations continue until either a specified minimum number of survivors remain or the minimum select dispersion of the population is greater than the maximum peer dispersion of any member. The mitigation rules require an additional termination condition, which stops these operations at the point where the prefer peer is about to be discarded.

<p>The mitigation rules establish the choice of <i>system peer</i>, which determines the stratum, reference identifier and several other system variables visible to clients of the local server. In addition, they establish which source or combination of sources controls the local clock.

<ol>

<li>If there is a prefer peer and it is the local-clock peer or the modem peer, or if there is a prefer peer and the kernel time discipline is active, choose the prefer peer as the system peer and its offset as the system clock offset. If the prefer peer is the local-clock peer, the driver can calculate an offset that produces a frequency offset, in order to correct for systematic frequency errors. In case a source other than NTP is controlling the system clock, corrections determined by NTP can be ignored by using the <code>disable pll</code> command in the configuration file. If the prefer peer is the modem peer, it must be the primary source for the reasons noted above. If the kernel time discipline is active, the system clock offset is ignored and the corrections are handled directly by the kernel.

<p><li>If the above is not the case and there is a PPS peer, then choose it as the system peer and its offset as the system clock offset.

<p><li>If the above is not the case and there is a prefer peer (not the local-clock or modem peer in this case), then choose it as the system peer and its offset as the system clock offset.

<p><li>If the above is not the case and the peer previously chosen as the system peer is in the surviving population, then choose it as the system peer and average its offset along with the other survivors to determine the system clock offset. This behavior is designed to avoid excess jitter due to clockhopping, when switching the system peer would not materially improve the time accuracy.

<p><li>If the above is not the case, then choose the first candidate in the list of survivors ranked in order of synchronization distance and average its offset along with the other survivors to determine the system clock offset. This is the default case and the only case considered in the current NTP specification.

</ol>

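<p>The five rules above can be expressed as a single C selection function. This sketch is illustrative only: it assumes the survivors are already sorted by increasing synchronization distance, returns how the offset is to be derived rather than computing it, and uses hypothetical type and function names. The gating of the PPS peer and the kernel discipline on the prefer peer is assumed to have been applied before this function is called.

```c
#include <stddef.h>

enum peer_class { PEER_ORDINARY, PEER_PPS, PEER_LOCAL, PEER_MODEM };

struct survivor {
    enum peer_class cls;
    int prefer; /* nonzero for the prefer peer */
};

/* How the system clock offset is derived from the chosen system peer. */
enum offset_source {
    OFFSET_PEER,    /* the system peer's offset alone */
    OFFSET_COMBINE, /* averaged with the other survivors */
    OFFSET_KERNEL   /* ignored; the kernel PPS discipline corrects */
};

/*
 * Apply rules 1-5 in order. s[0..n-1] are the survivors, assumed sorted by
 * increasing synchronization distance (n >= 1). prev is the index of the
 * previous system peer within s[], or n if it did not survive.
 */
size_t choose_system_peer(const struct survivor *s, size_t n, size_t prev,
                          int kernel_active, enum offset_source *how)
{
    size_t prefer = n, pps = n;

    for (size_t i = 0; i < n; i++) {
        if (s[i].prefer && prefer == n)
            prefer = i;
        if (s[i].cls == PEER_PPS && pps == n)
            pps = i;
    }
    /* Rule 1: prefer peer is local-clock or modem, or kernel is active. */
    if (prefer < n && (s[prefer].cls == PEER_LOCAL ||
                       s[prefer].cls == PEER_MODEM || kernel_active)) {
        *how = kernel_active ? OFFSET_KERNEL : OFFSET_PEER;
        return prefer;
    }
    /* Rule 2: a PPS peer. */
    if (pps < n) {
        *how = OFFSET_PEER;
        return pps;
    }
    /* Rule 3: any other prefer peer. */
    if (prefer < n) {
        *how = OFFSET_PEER;
        return prefer;
    }
    /* Rule 4: keep the previous system peer to avoid clockhopping. */
    *how = OFFSET_COMBINE;
    if (prev < n)
        return prev;
    /* Rule 5: the first survivor in synchronization-distance order. */
    return 0;
}
```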
<p><h4>Using the Pulse-per-Second (PPS) Signal</h4>

<p>Most radio clocks are connected using a serial port operating at speeds of 9600 bps or higher. With typical timecode formats, where the on-time epoch is indicated by a designated ASCII character such as carriage-return &lt;cr&gt;, the accuracy is limited to a millisecond at best and a few milliseconds in typical cases. However, some radios produce a PPS signal that can be used to improve the accuracy in typical workstation servers to the order of a few tens of microseconds. The details of how this can be accomplished are discussed on the <a href="pps.html">Pulse-per-second (PPS) Signal Interfacing</a> page. The following paragraphs discuss how the PPS signal is affected by the mitigation rules.

<p>First, the PPS signal is inherently ambiguous: it provides a precise seconds epoch but no way to number the seconds. In principle, and most commonly, another source of synchronization, either the timecode from an associated radio clock or one or more remote peers, is available to perform that function. In all cases, a specific, configured peer or server must be designated as associated with the PPS signal. This is done using the <code>prefer</code> keyword as described previously. The PPS signal can be associated in this way with any peer, but is most commonly used with the radio clock generating the PPS signal.

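<p>The ambiguity can be seen in a short C sketch. Because a PPS edge marks the start of some second but not which one, only the fractional part of the local timestamp captured at the edge carries information; folding it into +-0.5 s gives the PPS offset, while the whole seconds must come from the prefer peer. The function name and the sign convention (offset = reference time minus local time) are assumptions of this sketch.

```c
/*
 * PPS offset from the fractional part frac, in [0, 1), of the local
 * timestamp captured at the pulse edge. The edge marks the start of an
 * unknown second, so the result is folded into [-0.5, +0.5) s; the whole
 * seconds must come from the prefer peer. Sign convention (assumed):
 * offset = reference time - local time.
 */
double pps_offset(double frac)
{
    /* Reading just past the second: the local clock is fast by frac. */
    if (frac <= 0.5)
        return -frac;
    /* Reading just before the next second: the local clock is slow. */
    return 1.0 - frac;
}
```

<p>A local reading of 0.003 s past the second at the edge yields an offset of -3 ms, and a reading of 0.998 s yields +2 ms. An error larger than half a second cannot be told apart from one in the opposite direction, which is why the seconds numbering must come from elsewhere.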
<p>In order to operate, the PPS driver must be enabled by the <code>enable pps</code> command in the configuration file, and the signal must be present and within nominal jitter and wander error tolerances. Two further conditions apply. First, the associated prefer peer must have survived the sanity checks and intersection algorithm and have become active. This ensures that the radio clock hardware is operating correctly and that, presumably, the PPS signal is operating correctly as well. Second, the absolute time offset from that peer must be less than <code>CLOCK_MAX</code>, the gradual-adjustment range, which is ordinarily set at +-128 ms, well within the +-0.5 s unambiguous range of the PPS signal itself. Finally, the time offsets generated by the PPS peer are propagated via the clock filter to the clock selection procedures just like those of any other peer. Should these pass the sanity checks and intersection algorithm, they will show up along with the offsets of the prefer peer itself. Note that, unlike the prefer peer, the PPS peer samples are not protected from discard by the clustering algorithm. These procedures ensure that the PPS offsets developed in this way are the most accurate and reliable available for synchronization.

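<p>The activation conditions for the PPS peer can be collected into one C predicate. This is a sketch: <code>CLOCK_MAX</code> is set to the +-128 ms value given above, while the <code>struct pps_status</code> fields and the function name are hypothetical stand-ins for state the daemon would track. Whether the resulting PPS samples then survive clock selection is decided afterwards, as for any other peer.

```c
#include <stdbool.h>

/* Gradual-adjustment range in seconds; the text gives +-128 ms. */
#define CLOCK_MAX 0.128

struct pps_status {
    bool enabled;         /* "enable pps" given in the configuration */
    bool signal_ok;       /* present and within jitter/wander tolerance */
    bool prefer_survived; /* prefer peer passed sanity and intersection */
    double prefer_offset; /* prefer peer's offset, seconds */
};

/* Return true when PPS samples may be passed on to clock selection. */
bool pps_peer_usable(const struct pps_status *st)
{
    if (!st->enabled || !st->signal_ok || !st->prefer_survived)
        return false;
    /* |offset| < CLOCK_MAX, well inside the +-0.5 s PPS ambiguity. */
    return st->prefer_offset > -CLOCK_MAX && st->prefer_offset < CLOCK_MAX;
}
```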
<p>The PPS peer remains active as long as it survives the intersection algorithm and the prefer peer is active; however, like any other clock driver, it runs a reachability algorithm on the PPS signal itself. If for some reason the signal fails or displays gross errors, the PPS peer will either become unreachable or stray out of the survivor population. In this case the clock selection remitigates as described above.

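<p>The reachability algorithm used for NTP associations is an 8-bit shift register. The C sketch below shows that register; applying it to PPS pulses in exactly this way is an assumption, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * 8-bit reachability register: shifted left once per poll, with the
 * low-order bit set when that poll produced a valid sample. Eight
 * consecutive polls without a valid sample clear it.
 */
uint8_t reach_update(uint8_t reach, bool valid_sample)
{
    reach = (uint8_t)(reach << 1);
    if (valid_sample)
        reach |= 1u;
    return reach;
}

/* An association is reachable while any bit remains set. */
bool reachable(uint8_t reach)
{
    return reach != 0;
}
```

<p>Once the register has decayed to zero the association is unreachable, and clock selection remitigates without it.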
<p><h4>Using the Kernel Discipline</h4>

<p>Code to implement the kernel discipline is a special feature that can be incorporated in the kernel of some workstations, as described on the <a href="kern.html">A Kernel Model for Precision Timekeeping</a> page. The discipline controls the time and/or frequency of the local clock oscillator by means of an external PPS signal interfaced via a modem control lead. Because the PPS signal is derived from external equipment, cables, etc., which sometimes fail, the kernel does a good deal of error checking to detect signal failure and excessive noise. The mitigation rules affect the kernel discipline as follows.

<p>In order to operate, the kernel discipline must be enabled by the <code>enable pps</code> command in the configuration file, and the signal must be present and within nominal jitter and wander error tolerances. In the NTP daemon, the kernel time discipline is active only when the prefer peer is among the survivors of the clustering algorithm and its offset is within +-128 ms, as for the PPS peer. Under these conditions the kernel disregards updates produced by the NTP daemon and uses its internal PPS source instead. The kernel maintains a watchdog timer for the PPS signal; if the signal has not been heard or has been out of tolerance for more than some interval, currently two minutes, the kernel discipline is declared inoperable and operation continues as if it were not present.

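<p>The interplay between the daemon's conditions and the kernel watchdog can be sketched in C. This is an illustration, not kernel code: the two-minute interval and the +-128 ms limit come from the text, while the structure, the function names and the use of floating-point seconds as a time base are assumptions.

```c
#include <stdbool.h>

#define PPS_WATCHDOG 120.0 /* seconds; "currently two minutes" */
#define CLOCK_MAX 0.128    /* seconds; the +-128 ms limit */

struct kernel_pps {
    double last_good; /* time of the last in-tolerance PPS pulse, s */
};

/* Record a pulse seen at time now; only in-tolerance pulses count. */
void kernel_pps_pulse(struct kernel_pps *k, double now, bool in_tolerance)
{
    if (in_tolerance)
        k->last_good = now;
}

/* Watchdog: the discipline is operable only if a good pulse is recent. */
bool kernel_pps_operable(const struct kernel_pps *k, double now)
{
    return now - k->last_good <= PPS_WATCHDOG;
}

/* Daemon side: the kernel time discipline is used only when all hold. */
bool kernel_time_active(bool pps_enabled, bool prefer_clustered,
                        double prefer_offset, const struct kernel_pps *k,
                        double now)
{
    return pps_enabled && prefer_clustered &&
           prefer_offset > -CLOCK_MAX && prefer_offset < CLOCK_MAX &&
           kernel_pps_operable(k, now);
}
```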
<hr><address>David L. Mills (mills@udel.edu)</address></body></html>