<!-- $NetBSD: prefer.html,v 1.1 1998/12/30 20:20:36 mcr Exp $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<html><head><title>
Mitigation Rules and the ``prefer'' Keyword
</title></head><body><h3>
Mitigation Rules and the <code>prefer</code> Keyword
</h3><hr>
<p><h4>Introduction</h4>
<p>The mechanics of the NTP algorithms which select the best data sample
from each available peer and the best subset of the peer population have
been finely crafted to resist network jitter, faults in the network or
peer operations, and to deliver the best possible accuracy. Most of the
time these algorithms do a good job without requiring explicit manual
tailoring of the configuration file. However, there are times when the
accuracy can be improved by some careful tailoring. The following
sections explain how to do this using explicit configuration items and
special signals, when available, that are generated by some radio
clocks.
<p>In order to provide robust backup sources, primary (stratum-1)
servers are usually operated in a diversity configuration, in which the
server operates with a number of remote peers in addition to one or more
radio or modem clocks operating as local peers. In these configurations
the suite of algorithms used in NTP to refine the data from each peer
separately and to select and weight the data from a number of peers is
applied to the entire ensemble of remote peers and local peers. As a
result of these algorithms, a set of <i>survivors</i> is identified
which can presumably provide the most reliable and accurate time.
Ordinarily, the individual clock offsets of the survivors are combined
on a weighted average basis to produce an offset used to control the
system clock.
<p>However, because of small but significant systematic time offsets
between the survivors, it is in general not possible to achieve the
lowest jitter and highest stability in these configurations. This
happens because the selection algorithm tends to <i>clockhop</i> between
survivors of substantially the same quality, but showing small
systematic offsets between them. In addition, there are a number of
configurations involving pulse-per-second (PPS) signals, modem backup
services and other special cases, so that a set of mitigation rules
becomes necessary to select a single peer from among the survivors.
These rules are based on a set of special characteristics of the various
peers and reference clock drivers specified in the configuration file.
<p><h4>The <code>prefer</code> Peer</h4>
<p>The mitigation rules are designed to provide an intelligent selection
between various peers of substantially the same statistical quality.
They are designed to provide the best quality time without compromising
the normal operation of the NTP algorithms. The mitigation scheme in its
present form is not an integral component of the NTP specification
RFC-1305, but is likely to be included in future versions of the
specification. The scheme is based on the concept of a <i>prefer peer</i>,
which is specified by including the <code>prefer</code> keyword with the
associated <code>server</code> or <code>peer</code> command in the
configuration file. This keyword can be used with any peer or server,
but is most commonly used with a radio clock. While the scheme does not
forbid it, it does not seem useful to designate more than one peer as
preferred, since the additional complexity needed to mitigate among them
does not seem justified by on-the-air experience.
<p>The prefer scheme works on the set of peers that have survived the
sanity checks and intersection algorithms of the clock selection
procedures. Ordinarily, the members of this set can be considered
<i>truechimers</i> and any one of them could in principle provide
correct time; however, due to various error contributions, not all can
provide the most stable time. The job of the clustering algorithm, which
is invoked at this point, is to select the best subset of the survivors
providing the least variance in the combined ensemble, compared to the
variance in each member of the subset. The detailed operation of the
clustering algorithm, which is given in the specification, is not
important here, other than to point out it operates in rounds, where a
survivor, presumably the worst of the lot, is discarded in each round
until one of several termination conditions is met.
<p>In the prefer scheme the clustering algorithm is modified so that the
prefer peer is never discarded; on the contrary, its potential removal
becomes a termination condition. If the original algorithm were about to
toss out the prefer peer, the algorithm terminates right there. The
prefer peer can still be discarded by the sanity checks and intersection
algorithms, of course, but it will always survive the clustering
algorithm. The prefer peer is used as long as it survives the sanity
checks and intersection algorithm. If it does not survive or for some
reason it fails to provide updates, it will eventually become
unreachable and the clock selection will remitigate to select the next
best source.
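<p>The modified clustering rounds can be sketched as follows. This is a
minimal illustration rather than the code used by the daemon: the
<code>Peer</code> record, the <code>min_survivors</code> parameter and
the "farthest from the mean offset" measure of the worst survivor are
assumptions made for the example; the specification defines the actual
selection metric and the remaining termination conditions.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    offset: float          # clock offset in seconds
    prefer: bool = False   # True if marked with the prefer keyword


def cluster(survivors, min_survivors=3):
    """Discard one survivor per round until a termination condition is met.

    Assumption: the "worst" survivor is the one whose offset lies farthest
    from the mean offset of the current population.
    """
    peers = list(survivors)
    while len(peers) > min_survivors:
        mean = sum(p.offset for p in peers) / len(peers)
        worst = max(peers, key=lambda p: abs(p.offset - mean))
        if worst.prefer:
            # Additional termination condition: the prefer peer is never
            # discarded, so the rounds stop right here.
            break
        peers.remove(worst)
    return peers
```

<p>In this sketch a prefer peer with an outlying offset ends the rounds
early and leaves the whole population intact, whereas the same peer
without the keyword would simply be the first one discarded.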
<p>Along with this behavior, the clock selection procedures are modified
so that the combining algorithm is not used when a prefer peer is
present. Instead, the offset of the prefer peer is used exclusively as
the synchronization source. In the usual case involving a radio clock
and a flock of remote stratum-1 peers, and with the radio clock
designated a prefer peer, the result is that the high quality radio time
disciplines the server clock as long as the radio itself remains
operational and with valid time, as determined from the remote peers,
sanity checks and intersection algorithm.
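<p>The effect on the system clock offset can be illustrated with a short
sketch. The dictionary keys and the choice of weights (the reciprocal of
each survivor's synchronization distance) are assumptions made for the
example, not a statement of how the daemon computes its weights.

```python
def system_offset(survivors):
    """Return the offset used to control the clock.

    survivors: list of dicts with keys 'offset' (seconds),
    'distance' (seconds, greater than zero) and 'prefer' (bool).
    """
    if not survivors:
        raise ValueError("no survivors to synchronize to")
    for peer in survivors:
        if peer["prefer"]:
            # A prefer peer bypasses the combining algorithm entirely.
            return peer["offset"]
    # Otherwise combine all survivors as a weighted average
    # (assumed weight: 1 / synchronization distance).
    weights = [1.0 / p["distance"] for p in survivors]
    weighted = sum(w * p["offset"] for w, p in zip(weights, survivors))
    return weighted / sum(weights)
```

<p>With two remote survivors of equal distance the result is simply
their mean offset; adding a radio clock marked as the prefer peer makes
the function return the radio offset unchanged.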
<p><h4>Peer Classification</h4>
<p>In order to understand the effects of the various intricate schemes
involved, it is necessary to understand some arcane details on how the
algorithms decide on a synchronization source, when more than one source
is available. This is done on the basis of a set of explicit mitigation
rules, which define special classes of remote and local peers as a
function of configuration declarations and reference clock driver type:
<ol>
<li>The prefer peer is designated using the <code>prefer</code>
keyword with the <code>server</code> or <code>peer</code> commands. All
other things being equal, this peer will be selected for synchronization
over all other survivors of the clock selection procedures.
<p><li>When a PPS signal is connected via the PPS Clock Discipline
driver (type 22), this is called the <i>PPS peer</i>. This driver
provides precision clock corrections only within one second, so is
always operated in conjunction with another peer or reference clock
driver, which provides the seconds numbering. The PPS peer is active
only under conditions explained below.
<p><li>When the Undisciplined Local Clock driver (type 1) is configured,
this is called the <i>local-clock peer</i>. This is used either as a
backup reference source (stratum greater than zero), should all other
synchronization sources fail, or as the primary reference source
(stratum zero) in cases where the kernel time is disciplined by some
other means of synchronization, such as the NIST <code>lockclock</code>
scheme, or another synchronization protocol, such as the Digital Time
Synchronization Service (DTSS).
<p><li>When a modem driver such as the Automated Computer Time Service
driver (type 18) is configured, this is called the <i>modem peer</i>.
This is used either as a backup reference source, should all other
primary sources fail, or as the (only) primary reference source.
<p><li>Where support is available, the PPS signal may be processed
directly by the kernel, as described in the <a href="kern.html">A Kernel
Model for Precision Timekeeping</a> page. This is called the <i>kernel
discipline</i>. The PPS signal can discipline the kernel in both
frequency and time. The frequency discipline is active as long as the
PPS signal itself is operating correctly, as determined by the kernel
algorithms. The time discipline is active only under conditions
explained below.
</ol>
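<p>As a rough illustration, the driver-dependent classes above could be
derived from the configured address. The sketch assumes the usual
reference clock pseudo-address convention <code>127.127.t.u</code>
(driver type <code>t</code>, unit <code>u</code>), which is not spelled
out on this page; the names <code>DriverClass</code> and
<code>driver_class</code> are hypothetical. The prefer designation is a
separate flag that can accompany any of these classes.

```python
from enum import Enum


class DriverClass(Enum):
    REMOTE = "remote peer"
    PPS = "PPS peer"
    LOCAL_CLOCK = "local-clock peer"
    MODEM = "modem peer"
    OTHER_REFCLOCK = "other reference clock"


# Driver types named in the list above. Type 18 (ACTS) is given there as
# one example of a modem driver; other modem drivers are not covered here.
_SPECIAL_TYPES = {
    22: DriverClass.PPS,
    1: DriverClass.LOCAL_CLOCK,
    18: DriverClass.MODEM,
}


def driver_class(address):
    """Classify a configured address (assumes the 127.127.t.u convention)."""
    parts = address.split(".")
    is_refclock = (
        len(parts) == 4
        and parts[0] == "127"
        and parts[1] == "127"
        and parts[2].isdigit()
    )
    if is_refclock:
        return _SPECIAL_TYPES.get(int(parts[2]), DriverClass.OTHER_REFCLOCK)
    return DriverClass.REMOTE
```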
<p>Reference clock drivers operate in the manner described in the <a
href="refclock.html">Reference Clock Drivers</a> page and its
dependencies. The drivers are ordinarily operated at stratum zero, so
that as the result of ordinary NTP operations, the server itself
operates at stratum one, as required by the NTP specification RFC-1305.
In some cases described below, the driver is intentionally operated at
an elevated stratum, so that it will be selected only if no other
survivor is present with a lower stratum. In the case of the PPS peer or
kernel time discipline, these sources appear active only if the prefer
peer has survived the intersection and clustering algorithms, as
described below, and its clock offset relative to the current local
clock is less than a specified value, currently +-128 ms.
<p>The modem clock driver is a special case. Ordinarily, the update
interval between modem calls to synchronize the system clock is many
times longer than the interval between polls of either the remote or
local peers. In order to provide the best stability, the operation of
the clock discipline algorithm changes from a phase-lock mode at the
shorter update intervals to a frequency-lock mode at the longer update
intervals. If remote or local peers are operated together with a modem
peer in the same configuration, the clock selection algorithm may first
select one or more remote/local peers, so that the clock discipline
algorithm optimizes for the shorter update intervals. The selection
algorithm may then select the modem peer, which requires a much
different optimization. The design intent is that the modem peer
controls the system clock either when no other source is available or,
if it is marked as the prefer peer, whenever it passes the sanity checks
and intersection algorithm. There still is room for
suboptimal operation in this scheme, since a noise spike can still cause
a clockhop either way. Nevertheless, the optimization function is slow
to adapt, so that a clockhop or two does not cause much harm.
<h4>Mitigation Rules</h4>
<p>The mitigation rules apply in the intersection and clustering
algorithms described in the NTP specification. The intersection
algorithm first scans all peers with a persistent association and
includes only those that satisfy specified sanity checks. In addition to
the checks required by the specification, the mitigation rules require
either the local-clock peer or modem peer to be included only if marked
as the prefer peer. The intersection algorithm operates on the included
population to select only those peers believed to represent the correct
time. If one or more peers survive the operation, processing continues
in the clustering algorithm. Otherwise, if there is a modem peer, it is
declared the only survivor; otherwise, if there is a local-clock peer,
it is declared the only survivor. Processing then continues in the
clustering algorithm.
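<p>The inclusion rule and the fallback order just described can be
sketched as below. The intersection algorithm itself is passed in as a
function, since its details belong to the specification; the
<code>Peer</code> fields and the <code>kind</code> strings are
assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    kind: str = "ordinary"   # "ordinary", "modem" or "local"
    prefer: bool = False
    sane: bool = True        # result of the specification's sanity checks


def select_candidates(peers, intersect):
    """Apply the inclusion rule, the intersection step, then the fallback."""
    # Local-clock and modem peers are included only if marked prefer.
    included = [
        p for p in peers
        if p.sane and (p.kind == "ordinary" or p.prefer)
    ]
    survivors = intersect(included)
    if survivors:
        return survivors
    # Nothing survived: fall back to the modem peer first, then to the
    # local-clock peer, as the only survivor.
    for kind in ("modem", "local"):
        for p in peers:
            if p.kind == kind:
                return [p]
    return []
```

<p>Passing the identity function for <code>intersect</code> is enough to
exercise the fallback path: with no sane ordinary peer, a configured
modem peer is returned ahead of the local-clock peer.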
<p>The clustering algorithm repeatedly discards outliers in order to
reduce the residual jitter in the survivor population. As required by
the NTP specification, these operations continue until either a
specified minimum number of survivors remain or the minimum select
dispersion of the population is greater than the maximum peer dispersion
of any member. The mitigation rules require an additional terminating
condition which stops these operations at the point where the prefer
peer is about to be discarded.
<p>The mitigation rules establish the choice of <i>system peer</i>,
which determines the stratum, reference identifier and several other
system variables which are visible to clients of the local server. In
addition, they establish which source or combination of sources controls
the local clock.
<ol>
<li>If there is a prefer peer and it is the local-clock peer or the
modem peer; or, if there is a prefer peer and the kernel time discipline
is active, choose the prefer peer as the system peer and its offset as
the system clock offset. If the prefer peer is the local-clock peer, an
offset can be calculated by the driver to produce a frequency offset in
order to correct for systematic frequency errors. In case a source other
than NTP is controlling the system clock, corrections determined by NTP
can be ignored by using the <code>disable pll</code> command in the
configuration file. If the prefer peer is the modem peer, it must be the
primary source for the reasons noted above. If the kernel time
discipline is active, the system clock offset is ignored and the
corrections handled directly by the kernel.
<p><li>If the above is not the case and there is a PPS peer, then choose
it as the system peer and its offset as the system clock offset.
<p><li>If the above is not the case and there is a prefer peer (not the
local-clock or modem peer in this case), then choose it as the system
peer and its offset as the system clock offset.
<p><li>If the above is not the case and the peer previously chosen as
the system peer is in the surviving population, then choose it as the
system peer and average its offset along with the other survivors to
determine the system clock offset. This behavior is designed to avoid
excess jitter due to clockhopping, when switching the system peer would
not materially improve the time accuracy.
<p><li>If the above is not the case, then choose the first candidate in
the list of survivors ranked in order of synchronization distance and
average its offset along with the other survivors to determine the
system clock offset. This is the default case and the only case
considered in the current NTP specification.
</ol>
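<p>The five rules can be read as an ordered decision procedure. The
sketch below is only an illustration under stated assumptions: the
<code>Peer</code> record, the <code>kind</code> strings and the
<code>kernel_active</code> flag are hypothetical, and the "average" in
rules 4 and 5 is taken as a plain arithmetic mean for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    offset: float            # seconds
    distance: float          # synchronization distance, seconds
    kind: str = "ordinary"   # "ordinary", "pps", "modem" or "local"
    prefer: bool = False


def mitigate(survivors, previous=None, kernel_active=False):
    """Return (system_peer, clock_offset, rule_number), or None if empty."""
    if not survivors:
        return None
    prefer = next((p for p in survivors if p.prefer), None)
    pps = next((p for p in survivors if p.kind == "pps"), None)

    # Rule 1: prefer peer is the local-clock or modem peer, or the kernel
    # time discipline is active (the kernel then ignores the offset).
    if prefer and (prefer.kind in ("local", "modem") or kernel_active):
        return prefer, prefer.offset, 1
    # Rule 2: a PPS peer is present.
    if pps:
        return pps, pps.offset, 2
    # Rule 3: any other prefer peer.
    if prefer:
        return prefer, prefer.offset, 3

    # Rules 4 and 5 average over all survivors (plain mean assumed here).
    combined = sum(p.offset for p in survivors) / len(survivors)
    # Rule 4: keep the previous system peer to avoid clockhopping.
    if previous is not None and previous in survivors:
        return previous, combined, 4
    # Rule 5: default case, first survivor by synchronization distance.
    first = min(survivors, key=lambda p: p.distance)
    return first, combined, 5
```

<p>For example, a radio clock marked prefer together with a PPS peer
yields rule 2 while the kernel discipline is inactive and rule 1 once
<code>kernel_active</code> is true.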
<p><h4>Using the Pulse-per-Second (PPS) Signal</h4>
<p>Most radio clocks are connected using a serial port operating at
speeds of 9600 bps or higher. The accuracy using typical timecode
formats, where the on-time epoch is indicated by a designated ASCII
character, like carriage-return &lt;cr&gt;, is limited to a millisecond
at best and a few milliseconds in typical cases. However, some radios
produce a PPS signal which can be used to improve the accuracy in
typical workstation servers to the order of a few tens of microseconds.
The details of how this can be accomplished are discussed in the <a
href="pps.html">Pulse-per-second (PPS) Signal Interfacing</a>
page. The following paragraphs discuss how the PPS signal is affected by
the mitigation rules.
<p>First, it should be pointed out that the PPS signal is inherently
ambiguous, in that it provides a precise seconds epoch, but does not
provide a way to number the seconds. In principle and most commonly,
another source of synchronization, either the timecode from an
associated radio clock, or even one or more remote peers, is available
to perform that function. In all cases, a specific, configured peer or
server must be designated as associated with the PPS signal. This is
done using the <code>prefer</code> keyword as described previously. The
PPS signal can be associated in this way with any peer, but is most commonly
used with the radio clock generating the PPS signal.
<p>In order to operate, the PPS driver must be enabled by the
<code>enable pps</code> command in the configuration file and the signal
must be present and within nominal jitter and wander error tolerances.
In addition, its associated prefer peer must have survived the sanity
checks and intersection algorithms and have become active. This ensures
that the radio clock hardware is operating correctly and that,
presumably, the PPS signal is operating correctly as well. Further, the
absolute time offset from that peer must be less than
<code>CLOCK_MAX</code>, the gradual-adjustment range, which is
ordinarily set at +-128 ms, or well within the +-0.5-s unambiguous range
of the PPS signal itself. Finally, the time offsets generated by the PPS
peer are propagated via the clock filter to the clock selection
procedures just like any other peer. Should these pass the sanity checks
and intersection algorithms, they will show up along with the offsets of
the prefer peer itself. Note that, unlike the prefer peer, the PPS peer
samples are not protected from discard by the clustering algorithm.
These complicated procedures ensure that the PPS offsets developed in
this way are the most accurate and reliable available for synchronization.
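<p>The conditions under which the PPS peer may operate can be collected
into a single predicate. This is a sketch only: the function name and
its boolean parameters are hypothetical, and <code>CLOCK_MAX</code> is
written here in seconds with the value quoted above.

```python
CLOCK_MAX = 0.128  # gradual-adjustment range in seconds (128 ms)


def pps_peer_usable(pps_enabled, signal_ok, prefer_survived, prefer_offset):
    """True if every precondition for the PPS peer holds.

    pps_enabled     -- 'enable pps' appears in the configuration file
    signal_ok       -- signal present, within jitter and wander tolerances
    prefer_survived -- prefer peer passed sanity checks and intersection
    prefer_offset   -- offset of the prefer peer, in seconds
    """
    return (
        pps_enabled
        and signal_ok
        and prefer_survived
        and abs(prefer_offset) < CLOCK_MAX
    )
```

<p>Samples from a usable PPS peer still pass through the clock filter
and selection procedures, so a true result here does not by itself
guarantee that the PPS peer survives clustering.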
<p>The PPS peer remains active as long as it survives the intersection
algorithm and the prefer peer is active; however, like any other clock
driver, it runs a reachability algorithm on the PPS signal itself. If
for some reason the signal fails or displays gross errors, the PPS peer
will either become unreachable or stray out of the survivor population.
In this case the clock selection remitigates as described above.
<p><h4>Using the Kernel Discipline</h4>
<p>Code to implement the kernel discipline is a special feature that can
be incorporated in the kernel of some workstations as described in the
<a href="kern.html">A Kernel Model for Precision Timekeeping</a>
page. The discipline provides for the control of the local clock
oscillator time and/or frequency by means of an external PPS signal
interfaced via a modem control lead. As the PPS signal is derived from
external equipment, cables, etc., which sometimes fail, a good deal of
error checking is done in the kernel to detect signal failure and
excessive noise. The way in which the mitigation rules affect the kernel
discipline is as follows.
<p>In order to operate, the kernel discipline must be enabled by the
<code>enable pps</code> command in the configuration file and the signal
must be present and within nominal jitter and wander error tolerances.
In the NTP daemon, the kernel time discipline is active only when the
prefer peer is among the survivors of the clustering algorithm, and its
offset is within +-128 ms, as for the PPS peer. Under these conditions
the kernel disregards updates produced by the NTP daemon and uses its
internal PPS source instead. The kernel maintains a watchdog timer for
the PPS signal; if the signal has not been heard or is out of tolerance
for more than some interval, currently two minutes, the kernel
discipline is declared inoperable and operation continues as if it were
not present.
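<p>The watchdog behavior can be pictured with a small sketch. It models
only the timeout described above and is not the kernel code; the class
name, its methods and the use of caller-supplied timestamps in seconds
are assumptions made for the example.

```python
WATCHDOG_SECONDS = 120.0  # "currently two minutes"


class PpsWatchdog:
    """Tracks the time of the last PPS pulse that was within tolerance."""

    def __init__(self, now):
        self.last_good = now

    def pulse(self, now, in_tolerance):
        # Only pulses within tolerance restart the watchdog interval.
        if in_tolerance:
            self.last_good = now

    def operable(self, now):
        # The discipline is declared inoperable once the signal has been
        # missing or out of tolerance for more than the watchdog interval.
        return now - self.last_good <= WATCHDOG_SECONDS
```

<p>Once <code>operable()</code> returns false, operation would continue
as if the kernel discipline were not present, and the mitigation rules
fall through to the next applicable case.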
<hr><address>David L. Mills (mills@udel.edu)</address></body></html>