293 lines
16 KiB
HTML
293 lines
16 KiB
HTML
|
<html><head><title>
|
||
|
Executive Summary - Computer Network Time Synchronization
|
||
|
</title></head><body><H3>
|
||
|
Executive Summary - Computer Network Time Synchronization
|
||
|
</h3>
|
||
|
|
||
|
<img align=left src=pic/alice12.gif>
|
||
|
from <i>Alice's Adventures in Wonderland</i>, by Lewis Carroll,
|
||
|
illustrations by Sir John Tenniel
|
||
|
|
||
|
<p>The executive is the one on the left.
|
||
|
|
||
|
<br clear=left><hr>
|
||
|
|
||
|
<h4>Introduction</h4>
|
||
|
|
||
|
<p>The standard timescale used by most nations of the world is Universal
|
||
|
Coordinated Time (UTC), which is based on the Earth's rotation about its
|
||
|
axis, and the Gregorian Calendar, which is based on the Earth's rotation
|
||
|
about the Sun. The UTC timescale is disciplined with respect to
|
||
|
International Atomic Time (TAI) by inserting leap seconds at intervals
|
||
|
of about 18 months. UTC time is disseminated by various means, including
|
||
|
radio and satellite navigation systems, telephone modems and portable
|
||
|
clocks.
|
||
|
|
||
|
<p>Special purpose receivers are available for many time-dissemination
|
||
|
services, including the Global Position System (GPS) and other services
|
||
|
operated by various national governments. For reasons of cost and
|
||
|
convenience, it is not possible to equip every computer with one of
|
||
|
these receivers. However, it is possible to equip some number of
|
||
|
computers acting as primary time servers to synchronize a much larger
|
||
|
number of secondary servers and clients connected by a common network.
|
||
|
In order to do this, a distributed network clock synchronization
|
||
|
protocol is required which can read a server clock, transmit the reading
|
||
|
to one or more clients and adjust each client clock as required.
|
||
|
Protocols that do this include the Network Time Protocol (NTP), Digital
|
||
|
Time Synchronization Protocol (DTSS) and others found in the literature
|
||
|
(See "Further Reading" at the end of this article.)
|
||
|
|
||
|
<h4>Protocol Design Issues</h4>
|
||
|
|
||
|
<p>The synchronization protocol determines the time offset of the server
|
||
|
clock relative to the client clock. The various synchronization
|
||
|
protocols in use today provide different means to do this, but they all
|
||
|
follow the same general model. On request, the server sends a message
|
||
|
including its current clock value or <i>timestamp</i> and the client
|
||
|
records its own timestamp upon arrival of the message. For the best
|
||
|
accuracy, the client needs to measure the server-client propagation
|
||
|
delay to determine its clock offset relative to the server. Since it is
|
||
|
not possible to determine the one-way delays, unless the actual clock
|
||
|
offset is known, the protocol measures the total roundtrip delay and
|
||
|
assumes the propagation times are statistically equal in each direction.
|
||
|
In general, this is a useful approximation; however, in the Internet of
|
||
|
today, network paths and the associated delays can differ significantly
|
||
|
due to the individual service providers.
|
||
|
|
||
|
<p>The community served by the synchronization protocol can be very
|
||
|
large. For instance, the NTP community in the Internet of 1998 includes
|
||
|
over 230 primary time servers, synchronized by radio, satellite and
|
||
|
modem, and well over 100,000 secondary servers and clients. In addition,
|
||
|
there are many thousands of private communities in large government,
|
||
|
corporate and institution networks. Each community is organized as a
|
||
|
tree graph or <i>subnet</i>, with the primary servers at the root and
|
||
|
secondary servers and clients at increasing hop count, or stratum level,
|
||
|
in corporate, department and desktop networks. It is usually necessary
|
||
|
at each stratum level to employ redundant servers and diverse network
|
||
|
paths in order to protect against broken software, hardware and network
|
||
|
links.
|
||
|
<p>Synchronization protocols work in one or more association modes,
|
||
|
depending on the protocol design. Client/server mode, also called
|
||
|
master/slave mode, is supported in both DTSS and NTP. In this mode, a
|
||
|
client synchronizes to a stateless server as in the conventional RPC
|
||
|
model. NTP also supports symmetric mode, which allows either of two peer
|
||
|
servers to synchronize to the other, in order to provide mutual backup.
|
||
|
DTSS and NTP support a broadcast mode which allows many clients to
|
||
|
synchronize to one or a few servers, reducing network traffic when large
|
||
|
numbers of clients are involved. In NTP, IP multicast can be used when
|
||
|
the subnet spans multiple networks.
|
||
|
|
||
|
<p>Configuration management can be a serious problem in large subnets.
|
||
|
Various schemes which index public databases and network directory
|
||
|
services are used in DTSS and NTP to discover servers. Both protocols
|
||
|
use broadcast modes to support large client populations; but, since
|
||
|
listen-only clients cannot calibrate the delay, accuracy can suffer. In
|
||
|
NTP, clients determine the delay at the time a server is first
|
||
|
discovered by polling the server in client/server mode and then
|
||
|
reverting to listen-only mode. In addition, NTP clients can broadcast a
|
||
|
special "manycast" message to solicit responses from nearby servers and
|
||
|
continue in client/server mode with the respondents.
|
||
|
|
||
|
<h4>Computer Clock Modelling and Error Analysis</h4>
|
||
|
|
||
|
Most computers include a quartz resonator-stabilized oscillator and
|
||
|
hardware counter that interrupts the processor at intervals of a few
|
||
|
milliseconds. At each interrupt, a quantity called <i>tick</i> is added
|
||
|
to a system variable representing the clock time. The clock can be read
|
||
|
by system and application programs and set on occasion to an external
|
||
|
reference. Once set, the clock readings increment at a nominal rate,
|
||
|
depending on the value of <i>tick</i>. Typical Unix system kernels
|
||
|
provide a programmable mechanism to increase or decrease the value of
|
||
|
<i>tick</i> by a small, fixed amount in order to amortize a given time
|
||
|
adjustment smoothly over multiple <i>tick</i> intervals.
|
||
|
|
||
|
<p>Clock errors are due to variations in network delay and latencies in
|
||
|
computer hardware and software (jitter), as well as clock oscillator
|
||
|
instability (wander). The time of a client relative to its server can be
|
||
|
expressed
|
||
|
|
||
|
<p><center><i>T</i>(<i>t</i>) = <i>T</i>(<i>t</i><sub>0</sub>) +
|
||
|
<i>R</i>(<i>t - t</i><sub>0</sub>) + 1/2 <i>D</i>(<i>t -
|
||
|
T</i><sub>0</sub>)<sup>2</sup>,</center>
|
||
|
|
||
|
<p>where <i>t</i> is the current time, <i>T</i> is the time offset at
|
||
|
the last measurement update <i>t</i><sub>0</sub>, <i>R</i> is the
|
||
|
frequency offset and <i>D</i> is the drift due to resonator ageing. All
|
||
|
three terms include systematic offsets that can be corrected and random
|
||
|
variations that cannot. Some protocols, including DTSS, estimate only
|
||
|
the first term in this expression, while others, including NTP, estimate
|
||
|
the first two terms. Errors due to the third term, while important to
|
||
|
model resonator aging in precision applications, are neglected, since
|
||
|
they are usually dominated by errors in the first two terms.
|
||
|
|
||
|
<p>The synchronization protocol estimates <i>T</i>(<i>t</i><sub>0</sub>)
|
||
|
(and <i>R</i>(<i>t</i><sub>0</sub>), where relevant) at regular
|
||
|
intervals <font face="symbol">t</font> and adjusts the clock to minimize
|
||
|
<i>T</i>(<i>t</i>) in future. In common cases, <i>R</i> can have
|
||
|
systematic offsets of several hundred parts-per-million (PPM) with
|
||
|
random variations of several PPM due to ambient temperature changes. If
|
||
|
not corrected, the resulting errors can accumulate to seconds per day.
|
||
|
In order that these errors do not exceed a nominal specification, the
|
||
|
protocol must periodically re-estimate <i>T</i> and <i>R</i> and
|
||
|
compensate for variations by adjusting the clock at regular intervals.
|
||
|
As a practical matter, for nominal accuracies of tens of milliseconds,
|
||
|
this requires clients to exchange messages with servers at intervals in
|
||
|
the order of tens of minutes.
|
||
|
|
||
|
<p>Analysis of quartz-resonator stabilized oscillators show that errors
|
||
|
are a function of the averaging time, which in turn depends on the
|
||
|
interval between corrections. At correction intervals less than a few
|
||
|
hundred seconds, errors are dominated by jitter, while, at intervals
|
||
|
greater than this, errors are dominated by wander. As explained later,
|
||
|
the characteristics of each regime determine the algorithm used to
|
||
|
discipline the clock. These errors accumulate at each stratum level from
|
||
|
the root to the leaves of the subnet tree. It is possible to quantify
|
||
|
these errors by statistical means, as in NTP. This allows real-time
|
||
|
applications to adjust audio or video playout delay, for example.
|
||
|
However, the required statistics may be different for various classes of
|
||
|
applications. Some applications need absolute error bounds guaranteed
|
||
|
never to exceeded, as provided by the following correctness principles.
|
||
|
|
||
|
<h4>Correctness Principles</h4>
|
||
|
|
||
|
<p>Applications requiring reliable time synchronization such as air
|
||
|
traffic control must have confidence that the local clock is correct
|
||
|
within some bound relative to a given timescale such as UTC. There is a
|
||
|
considerable body of literature that studies these issues with respect
|
||
|
to various failure models such as fail-stop and Byzantine disagreement.
|
||
|
While these models inspire much confidence in a theoretical setting,
|
||
|
most require multiple message rounds for each measurement and would be
|
||
|
impractical in a large computer network such as the Internet. However,
|
||
|
it can be shown that the worst-case error in reading a remote server
|
||
|
clock cannot exceed one-half the roundtrip delay measured by the client.
|
||
|
This is a valuable insight, since it permits strong statements about the
|
||
|
correctness of the timekeeping system.
|
||
|
|
||
|
<p>In the Probabilistic Clock Synchronization (PCS) scheme devised by
|
||
|
Cristian, a maximum error tolerance is established in advance and time
|
||
|
value samples associated with roundtrip delays that exceed twice this
|
||
|
value are discarded. By the above argument, the remaining samples must
|
||
|
represent time values within the specified tolerance. As the tolerance
|
||
|
is decreased, more samples fail the test until a point where no samples
|
||
|
survive. The tolerance can be adjusted for the best compromise between
|
||
|
the highest accuracy consistent with acceptable sample survival rate.
|
||
|
|
||
|
<p>In a scheme devised by Marzullo and exploited in NTP and DTSS, the
|
||
|
worst-case error determined for each server determines a correctness
|
||
|
interval. If each of a number of servers are in fact synchronized to a
|
||
|
common timescale, the actual time must be contained in the intersection
|
||
|
of their correctness intervals. If some intervals do not intersect, then
|
||
|
the clique containing the maximum number of intersections is assumed
|
||
|
correct <i>truechimers</i> and the others assumed incorrect
|
||
|
<i>false<i>tick</i>ers</i>. Only the truechimers are used to adjust the
|
||
|
system
|
||
|
clock.
|
||
|
|
||
|
<h4>Data Grooming Algorithms</h4>
|
||
|
|
||
|
By its very nature, clock synchronization is a continuous process,
|
||
|
resulting in a sequence of measurements with each of possibly several
|
||
|
servers and resulting in a clock adjustment. In some protocols, crafted
|
||
|
algorithms are used to improve the time and frequency estimates and
|
||
|
refine the clock adjustment. Algorithms described in the literature are
|
||
|
based on trimmed-mean and median filter methods. The clock filter
|
||
|
algorithm used in NTP is based on the above observation that the
|
||
|
correctness interval depends on the roundtrip delay. The algorithm
|
||
|
accumulates offset/delay samples in a window of several samples and
|
||
|
selects the offset sample associated with the minimum delay. In general,
|
||
|
larger window sizes provide better estimates; however, stability
|
||
|
considerations limit the window size to about eight.
|
||
|
<p>The same principle could be used when selecting the best subset of
|
||
|
servers and combining their offsets to determine the clock adjustment.
|
||
|
However, different servers often show different systematic offsets, so
|
||
|
the best statistic for the central tendency of the server population may
|
||
|
not be obvious. Various kinds of clustering algorithms have been found
|
||
|
useful for this purpose. The one used in NTP sorts the offsets by a
|
||
|
quality metric, then calculates the variance of all servers relative to
|
||
|
each server separately. The algorithm repeatedly discards the outlyer
|
||
|
with the largest variance until further discards will not improve the
|
||
|
residual variance or until a minimum number of servers remain. The final
|
||
|
clock adjustment is computed as a weighted average of the survivors.
|
||
|
|
||
|
<p>At the heart of the synchronization protocol is the algorithm used to
|
||
|
adjust the system clock in accordance with the final adjustment
|
||
|
determined by the above algorithms. This is called the clock discipline
|
||
|
algorithm or simply the discipline. Such algorithms can be classed
|
||
|
according to whether they minimize the time offset or frequency offset
|
||
|
or both. For instance, the discipline used in DTSS minimizes only the
|
||
|
time offset, while the one used in NTP minimizes both time and frequency
|
||
|
offsets. While the DTSS algorithm cannot remove residual errors due to
|
||
|
systematic frequency errors, the NTP algorithm is more complicated and
|
||
|
less forgiving of design and implementation mistakes.
|
||
|
|
||
|
<p>All clock disciplines function as a feedback loop, with measured
|
||
|
offsets used to adjust the clock oscillator phase and frequency to match
|
||
|
the external synchronization source. The behavior of feedback loops is
|
||
|
well understood and modelled by mathematical analysis. The significant
|
||
|
design parameter is the time constant, or responsiveness to external or
|
||
|
internal variations in time or frequency. Optimum selection of time
|
||
|
constant depends on the interval between update messages. In general,
|
||
|
the longer these intervals, the larger the time constant and vice versa.
|
||
|
In practice and with typical network configurations the optimal poll
|
||
|
intervals vary between one and twenty minutes for network paths to some
|
||
|
thousands of minutes for modem paths.
|
||
|
|
||
|
<h4>Further Reading</h4>
|
||
|
|
||
|
<ol>
|
||
|
|
||
|
<p><li>Cristian, F. Probabilistic clock synchronization. In Distributed
|
||
|
Computing 3, Springer Verlag, 1989, 146-158.</li>
|
||
|
|
||
|
<p><li>Digital Time Service Functional Specification Version T.1.0.5.
|
||
|
DigitalEquipment Corporation, 1989.</li>
|
||
|
|
||
|
<p><li>Gusella, R., and S. Zatti. TEMPO - A network time controller for
|
||
|
a distributed Berkeley UNIX system. IEEE Distributed Processing
|
||
|
Technical Committee Newsletter 6, NoSI-2 (June 1984), 7-15. Also in:
|
||
|
Proc. Summer 1984 USENIX (Salt Lake City, June 1984).</li>
|
||
|
|
||
|
<p><li>Kopetz, H., and W. Ochsenreiter. Clock synchronization in
|
||
|
distributed real-time systems. IEEE Trans. Computers C-36, 8 (August
|
||
|
1987), 933-939.</li>
|
||
|
|
||
|
<p><li>Lamport, L., and P.M. Melliar-Smith. Synchronizing clocks in the
|
||
|
presence of faults. JACM 32, 1 (January 1985), 52-78.</li>
|
||
|
|
||
|
<p><li>Marzullo, K., and S. Owicki. Maintaining the time in a
|
||
|
distributed system. ACM Operating Systems Review 19, 3 (July 1985), 44-
|
||
|
54.</li>
|
||
|
|
||
|
<p><li>Mills, D.L. Internet time synchronization: the Network Time
|
||
|
Protocol. IEEE Trans. Communications COM-39, 10 (October 1991), 1482-
|
||
|
1493. Also in: Yang, Z., and T.A. Marsland (Eds.). Global States and
|
||
|
Time in Distributed Systems, IEEE Press, Los Alamitos, CA, 91-102.</li>
|
||
|
<p><li>Mills, D.L. Modelling and analysis of computer network clocks.
|
||
|
Electrical Engineering Department Report 92-5-2, University of Delaware,
|
||
|
May 1992, 29 pp.</li>
|
||
|
|
||
|
<p><li>NIST Time and Frequency Dissemination Services. NBS Special
|
||
|
Publication432 (Revised 1990), National Institute of Science and
|
||
|
Technology, U.S. Department of Commerce, 1990.</li>
|
||
|
|
||
|
<p><li>Schneider, F.B. A paradigm for reliable clock synchronization.
|
||
|
Department of Computer Science Technical Report TR 86-735, Cornell
|
||
|
University, February 1986.</li>
|
||
|
|
||
|
<p><li>Srikanth, T.K., and S. Toueg. Optimal clock synchronization. JACM
|
||
|
34, 3 (July 1987), 626-645.</li>
|
||
|
|
||
|
<p><li>Stein, S.R. Frequency and time - their measurement and
|
||
|
characterization (Chapter 12). In: E.A. Gerber and A. Ballato (Eds.).
|
||
|
Precision Frequency Control, Vol. 2, Academic Press, New York 1985, 191-
|
||
|
232, 399-416. Also in: Sullivan, D.B., D.W. Allan, D.A. Howe and F.L.
|
||
|
Walls (Eds.). Characterization of Clocks and Oscillators. National
|
||
|
Institute of Standards and Technology Technical Note 1337, U.S.
|
||
|
Government Printing Office (January, 1990), TN61-TN119.</li>
|
||
|
|
||
|
</ol>
|
||
|
|
||
|
<hr><a href=index.htm><img align=left src=pic/home.gif></a><address><a
|
||
|
href=mailto:mills@udel.edu> David L. Mills <mills@udel.edu></a>
|
||
|
</address></a></body></html>
|