<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">
<html>

<head>

<title>Postfix Queue Scheduler</title>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

</head>

<body>

<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Queue Scheduler</h1>

<hr>

<h2>What this file is about</h2>

<p> This is the beginning of documentation for the clever queue
manager scheduling algorithm by Patrik Rak. For a long time, this
code was made available under the name "nqmgr(8)" (new queue manager),
as an optional module. As of Postfix 2.1 this is the default queue
manager, which is always called "qmgr(8)". The old queue manager
will remain available for some time under the name "oqmgr(8)". </p>

<h2>Why the old Postfix queue manager was replaced</h2>

<p> The old Postfix scheduler had several limitations due to
unfortunate choices in its design. </p>

<ol>

<li> <p> Round-robin selection by destination for mail that is
delivered via the same message delivery transport. The round-robin
strategy was chosen with the intention of preventing a single
(destination) site from using up too many mail delivery resources.
However, that strategy penalized inbound mail on bi-directional
gateways. The poor suffering inbound destination would be
selected only 1/number-of-destinations of the time, even when
it had more mail than other destinations, and thus mail could
be delayed. </p>

<p> Victor Duchovni found a workaround: use different message
delivery transports, and thus avoid the starvation problem.
The Patrik Rak scheduler solves this problem by using FIFO
selection; a toy illustration of the difference follows this
list. </p>

<li> <p> A second limitation of the old Postfix scheduler was
that delivery of bulk mail would block all other deliveries,
causing large delays. Patrik Rak's scheduler allows mail with
fewer recipients to slip past bulk mail in an elegant manner.
</p>

</ol>
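
<p> As a toy illustration only (this is not Postfix code), the C
sketch below shows the effect described in point 1: with round-robin
selection by destination, a destination holding most of the queued
mail still receives at most one delivery per round, so its share of
the deliveries stays at 1/number-of-destinations. The destination
count and queue depths are made up for the example. </p>

<pre>
#include &lt;stdio.h&gt;

#define NDEST   4       /* destinations sharing one delivery transport */
#define ROUNDS  4       /* rounds of round-robin selection simulated */

int main(void)
{
    /* entries waiting per destination; destination 0 is the inbound gateway */
    int queued[NDEST]    = { 20, 5, 5, 5 };
    int delivered[NDEST] = { 0, 0, 0, 0 };

    /* round-robin: at most one delivery per destination per round,
     * no matter how much mail that destination has queued */
    for (int round = 0; round &lt; ROUNDS; round++)
        for (int d = 0; d &lt; NDEST; d++)
            if (queued[d] > 0) {
                queued[d]--;
                delivered[d]++;
            }

    for (int d = 0; d &lt; NDEST; d++)
        printf("destination %d: delivered %d, still queued %d\n",
               d, delivered[d], queued[d]);
    return 0;
}
</pre>

<p> Destination 0 holds more than half of the queued mail but gets
only a quarter of the delivery slots. FIFO selection, in contrast,
hands out entries in arrival order, so a destination's share of the
deliveries follows its share of the queue. </p>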

<h2>How the queue manager scheduler works</h2>

<p> The following text is from Patrik Rak and should be read together
with the postconf(5) manual that describes each configuration
parameter in detail. </p>

<p> From the user's point of view, oqmgr(8) and qmgr(8) behave the same,
except for how the next message is chosen when a delivery agent becomes
available. You already know that oqmgr(8) uses round-robin selection by
destination while qmgr(8) uses simple FIFO selection, except for some
preemptive magic. The postconf(5) manual documents all the knobs the user
can use to control this preemptive magic; there is nothing more
to the preemption than the quite simple conditions described there.
</p>
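
<p> For orientation only, the knobs in question follow the pattern of
the main.cf fragment below. The parameter names are the ones documented
in postconf(5); the values are mere placeholders, not recommendations,
so consult postconf(5) for the actual defaults and their exact meaning. </p>

<pre>
# Illustrative main.cf fragment; placeholder values, see postconf(5).
default_delivery_slot_cost     = 5
default_delivery_slot_discount = 50
default_delivery_slot_loan     = 3
default_minimum_delivery_slots = 3
# Per-transport overrides use the transport name as the prefix, e.g.:
smtp_delivery_slot_cost        = 5
</pre>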

<p> As for programmer-level documentation, this will have to be
extracted from all those emails we have exchanged with Wietse [rats!
I hoped that Patrik would do the work for me -- Wietse]. But I think
there are no missing bits that we have not mentioned in our
conversations. </p>

<p> However, even from the programmer's point of view, there is nothing
more to add to the message scheduling idea itself. There are a few
things which make it look more complicated than it is, but the
algorithm is the same as the user perceives it. In summary, the
programmer's view differs from the user's view as follows: </p>

<ol>

<li> <p> Simplification of terms for users: The user knows
about messages and recipients. The program itself works with
jobs (one message is split among several jobs, one for each
transport needed to deliver the message) and queue entries
(each entry may group several recipients for the same destination).
Then there is the peer structure introduced by qmgr(8), which is
simply a per-job analog of the queue structure. A simplified sketch
of how these objects nest follows this list. </p>

<li> <p> Dealing with concurrency limits: The actual implementation
is complicated by the fact that the messages (resp. jobs) may
not be delivered in exactly the scheduled order because of the
concurrency limits. It is necessary to skip some "blocker" jobs
when the concurrency limit is reached and to get back to them
again when the limit permits, as sketched after this list. </p>

<li> <p> Dealing with resource limits: The actual implementation is
complicated by the fact that not all recipients may be read in-core.
Therefore each message has some recipients in-core and some may
remain on-file. This means that a) the preemptive algorithm needs
to work with recipient count estimates instead of exact counts, b)
there is extra code which needs to manipulate the per-transport
pool of recipients which may be read in-core at the same time, and
c) there is extra code which needs to be able to read recipients
into core in batches and which is triggered at appropriate moments. </p>

<li> <p> Doing things efficiently: All important things I am
aware of are done in the minimum time possible (either directly
or at least when amortized complexity is considered), but choosing
which job is the best candidate for preempting the current job
requires a linear search over up to all of the transport's jobs (the
worst theoretical case - the reality is much better). As this is done
every time the next queue entry to be delivered is about to be
chosen, it seemed reasonable to add a cache which minimizes the
overhead. Maintenance of this candidate cache slightly obfuscates
things. </p>

</ol>
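
<p> To make the terminology in point 1 concrete, here is a deliberately
simplified sketch in C of how these objects could nest. The type and
field names are invented for illustration; they do not reproduce the
actual qmgr(8) data structures. </p>

<pre>
/* Simplified sketch only; invented names, not the real qmgr(8) structures. */

/* A queue entry groups one or more recipients for the same destination. */
struct entry {
    struct entry   *next;
    char          **recipients;
    int             recipient_count;
};

/* A peer is the per-job analog of a queue: the entries of one job
 * that are bound for one destination. */
struct peer {
    struct peer    *next;
    char           *destination;
    struct entry   *entries;
};

/* A job is the part of one message handled by one delivery transport. */
struct job {
    struct job     *next;
    struct message *message;
    char           *transport;
    struct peer    *peers;
};

/* One queued message is split into one job per transport that it needs. */
struct message {
    char           *queue_id;
    struct job     *jobs;
};
</pre>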
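
<p> Similarly, the "blocker" handling in point 2 boils down to something
like the sketch below (again with invented names): when the job that is
next in scheduled order cannot be handed to a delivery agent because its
concurrency limit has been reached, it is marked and skipped, and the
scheduler returns to it once the limit permits again. </p>

<pre>
/* Sketch only; invented names, not the real qmgr(8) code. */

struct sketch_job {
    struct sketch_job *next;
    int                active_deliveries;   /* deliveries now in progress */
    int                concurrency_limit;   /* per-destination/transport cap */
    int                is_blocker;          /* set when skipped at the limit */
};

/* Return the first job, in scheduled order, that may actually proceed. */
static struct sketch_job *pick_deliverable_job(struct sketch_job *head)
{
    for (struct sketch_job *job = head; job != 0; job = job->next) {
        if (job->active_deliveries >= job->concurrency_limit) {
            job->is_blocker = 1;            /* remember to come back to it */
            continue;                       /* skip, keep the order otherwise */
        }
        return job;
    }
    return 0;                               /* everything is blocked for now */
}
</pre>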

<p> Points 2 and 3 are what made the implementation (look)
complicated, and they were the real coding work; but I believe
that understanding the scheduling algorithm itself (which was
the real thinking work) is fairly easy. </p>

</body>

</html>