263 lines
10 KiB
HTML
263 lines
10 KiB
HTML
|
<html>
|
||
|
<body bgcolor=white>
|
||
|
<h1>Node Monitoring</h1>
|
||
|
<h6>
|
||
|
Creation Date: January 16, 2003<br>
|
||
|
Author(s): Axel Dörfler
|
||
|
</h6>
|
||
|
|
||
|
This document describes the feature of the BeOS kernel to monitor nodes. First,
|
||
|
there is an explanation of what kind of functionality we have to reproduce (along
|
||
|
with the higher level API), then we will present the implementation in OpenBeOS.
|
||
|
|
||
|
<h2>Requirements - Exported Functionality in BeOS</h2>
|
||
|
|
||
|
From user-level, BeOS exports the following API as found in the storage/NodeMonitor.h
|
||
|
header file:
|
||
|
|
||
|
<pre>
|
||
|
status_t watch_node(const node_ref *node,
|
||
|
uint32 flags,
|
||
|
BMessenger target);
|
||
|
|
||
|
status_t watch_node(const node_ref *node,
|
||
|
uint32 flags,
|
||
|
const BHandler *handler,
|
||
|
const BLooper *looper = NULL);
|
||
|
|
||
|
status_t stop_watching(BMessenger target);
|
||
|
|
||
|
status_t stop_watching(const BHandler *handler,
|
||
|
const BLooper *looper = NULL);
|
||
|
</pre>
|
||
|
|
||
|
The kernel also exports two other functions to be used from file system add-ons
|
||
|
that causes the kernel to send out notification messages:
|
||
|
|
||
|
<pre>
|
||
|
int notify_listener(int op, nspace_id nsid,
|
||
|
vnode_id vnida, vnode_id vnidb,
|
||
|
vnode_id vnidc, const char *name);
|
||
|
int send_notification(port_id port, long token,
|
||
|
ulong what, long op, nspace_id nsida,
|
||
|
nspace_id nsidb, vnode_id vnida,
|
||
|
vnode_id vnidb, vnode_id vnidc,
|
||
|
const char *name);
|
||
|
</pre>
|
||
|
|
||
|
<p>
|
||
|
The latter is only used for live query updates, but is obviously called by
|
||
|
the former. The port/token pair identify a unique BLooper/BHandler pair, and
|
||
|
it used internally to address those high-level objects from the kernel.
|
||
|
</p>
|
||
|
<p>
|
||
|
When a file system calls the <code>notify_listener()</code> function, it will have
|
||
|
a look if there are monitors for that node which meet the specified constraints -
|
||
|
and it will call <code>send_notification()</code> for every single message to be send.
|
||
|
</p>
|
||
|
<p>
|
||
|
Each of the parameters <code>vnida - vnidc</code> has a dedicated meaning:
|
||
|
<ul>
|
||
|
<li><b>vnida:</b> the parent directory of the "main" node</li>
|
||
|
<li><b>vnidb:</b> the target parent directory for a move</li>
|
||
|
<li><b>vnidc:</b> the node that has triggered the notification to be send</li>
|
||
|
</ul>
|
||
|
</p>
|
||
|
<p>
|
||
|
The flags parameter in <code>watch_node()</code> understands the following constants:
|
||
|
</p>
|
||
|
<ul>
|
||
|
<li><b>B_STOP_WATCHING</b><br>
|
||
|
watch_node() will stop to watch the specified node.</li>
|
||
|
<li><b>B_WATCH_NAME</b><br>
|
||
|
name changes are notified through a B_ENTRY_MOVED opcode.</li>
|
||
|
<li><b>B_WATCH_STAT</b><br>
|
||
|
changes to the node's stat structure are notified with a B_STAT_CHANGED code.</li>
|
||
|
<li><b>B_WATCH_ATTR</b><br>
|
||
|
attribute changes will cause a B_ATTR_CHANGED to be send.</li>
|
||
|
<li><b>B_WATCH_DIRECTORY</b><br>
|
||
|
notifies on changes made to the specified directory, i.e. B_ENTRY_REMOVED, B_ENTRY_CREATED</li>
|
||
|
<li><b>B_WATCH_ALL</b><br>
|
||
|
is a short-hand for the flags above.</li>
|
||
|
<li><b>B_WATCH_MOUNT</b><br>
|
||
|
causes B_DEVICE_MOUNTED and B_DEVICE_UNMOUNTED to be send.</li>
|
||
|
</ul>
|
||
|
<p>
|
||
|
Node monitors are maintained per team - every team can have up to 4096 monitors, although
|
||
|
there exists a private kernel call to raise this limit (for example, Tracker is using it
|
||
|
intensively).
|
||
|
</p>
|
||
|
<p>
|
||
|
The kernel is able to send the BMessages directly to the specified BLooper and BHandler;
|
||
|
it achieves this using the application kit's token mechanism. The message is constructed
|
||
|
manually in the kernel, it doesn't use any application kit services.
|
||
|
</p>
|
||
|
<br>
|
||
|
|
||
|
<h2>Meeting the Requirements in an Optimal Way - Implementation in OpenBeOS</h2>
|
||
|
|
||
|
<p>
|
||
|
If you assume that every file operation could trigger a notification message to be send,
|
||
|
it's clear that the node monitoring system must be optimized for sending messages. For
|
||
|
every call to <code>notify_listener()</code>, the kernel must check if there are any
|
||
|
monitors for the node that was updated.
|
||
|
</p>
|
||
|
<p>
|
||
|
Those monitors are put into a hash table which has the device number and the vnode ID
|
||
|
as keys. Each of the monitors maintains a list of listeners which specify which port/token
|
||
|
pair should be notified for what change. Since the vnodes are created/deleted as needed
|
||
|
from the kernel, the node monitor is maintained independently from them; a simple pointer
|
||
|
from a vnode to its monitor is not possible.
|
||
|
</p>
|
||
|
<p>
|
||
|
The main structures that are involved in providing the node monitoring functionality
|
||
|
look like this:
|
||
|
</p>
|
||
|
|
||
|
<pre>
|
||
|
struct monitor_listener {
|
||
|
monitor_listener *next;
|
||
|
monitor_listener *prev;
|
||
|
list_link monitor_link;
|
||
|
port_id port;
|
||
|
int32 token;
|
||
|
uint32 flags;
|
||
|
node_monitor *monitor;
|
||
|
};
|
||
|
|
||
|
struct node_monitor {
|
||
|
node_monitor *next;
|
||
|
mount_id device;
|
||
|
vnode_id node;
|
||
|
struct list listeners;
|
||
|
};
|
||
|
</pre>
|
||
|
|
||
|
<p>
|
||
|
The relevant part of the I/O context structure is this:
|
||
|
</p>
|
||
|
|
||
|
<pre>
|
||
|
struct io_context {
|
||
|
...
|
||
|
struct list node_monitors;
|
||
|
uint32 num_monitors;
|
||
|
uint32 max_monitors;
|
||
|
};
|
||
|
</pre>
|
||
|
|
||
|
<p>
|
||
|
If you call <code>watch_node()</code> on a file with a flags parameter unequal to
|
||
|
B_STOP_WATCHING, the following will happen in the node monitor:
|
||
|
</p>
|
||
|
<ol>
|
||
|
<li>The <code>add_node_monitor()</code> function does a hash lookup for the
|
||
|
device/vnode pair. If there is no <code>node_monitor</code> yet for this pair,
|
||
|
a new one will be created.</li>
|
||
|
<li>The list of listeners is scanned for the provided port/token pair (the
|
||
|
BLooper/BHandler pointer will already be translated in user-space), and
|
||
|
the new flag is or'd to the old field, or a new <code>monitor_listener</code>
|
||
|
is created if necessary - in the latter case, the team's node monitor
|
||
|
counter is incremented.</li>
|
||
|
</ol>
|
||
|
<p>
|
||
|
If it's called with B_STOP_WATCHING defined, the reverse operation take effect, and
|
||
|
the <code>monitor</code> field is used to see if this monitor don't have any listeners
|
||
|
anymore, in which case it will be removed.
|
||
|
</p>
|
||
|
<p>
|
||
|
Note the presence of the <code>max_monitors</code> - there is no hard limit the kernel
|
||
|
exposes to userland applications; the listeners are maintained in a doubly-linked list.
|
||
|
</p>
|
||
|
<p>
|
||
|
If a team is shut down, all listeners from its I/O context will be removed - since every
|
||
|
listener stores a pointer to its monitor, determining the monitors that can be removed
|
||
|
because of this operation is very cheap.
|
||
|
</p>
|
||
|
<p>
|
||
|
The <code>notify_listener()</code> also only does a hash lookup for the device/node
|
||
|
pair it got from the file system, and sends out as many notifications as specified by
|
||
|
the listeners of the monitor that belong to that node.
|
||
|
</p>
|
||
|
<p>
|
||
|
If a node is deleted from the disk, the corresponding <code>node_monitor</code> and its
|
||
|
listeners will be removed as well, to prevent watching a new file that accidently happen
|
||
|
to have the same device/node pair (as is possible with BFS, for example).
|
||
|
</p>
|
||
|
|
||
|
<br>
|
||
|
<h2>Differences Between Both Implementations</h2>
|
||
|
|
||
|
<p>
|
||
|
Although the aim was to create a completely compatible monitoring implementation,
|
||
|
there are some notable differences between the two.
|
||
|
</p>
|
||
|
<p>
|
||
|
BeOS reserves a certain number of slots for calls to <code>watch_node()</code> - each
|
||
|
call to that function will use one slot, even if you call it twice for the same node.
|
||
|
OpenBeOS, however, will always use one slot per node - you could call <code>watch_node()</code>
|
||
|
several times, but you would waste only one slot.
|
||
|
</p>
|
||
|
<p>
|
||
|
While this is an implementational detail, it also causes a change in behaviour for
|
||
|
applications; in BeOS, applications will get one message for every <code>watch_node()</code>
|
||
|
call, in OpenBeOS, you'll get only one message per node. If an application relies
|
||
|
on this strange behaviour of the BeOS kernel, it will no longer work correctly.
|
||
|
</p>
|
||
|
<p>
|
||
|
The other difference is that OpenBeOS exports its node monitoring functionality to
|
||
|
kernel modules as well, and provides an extra plain C API for them to use.
|
||
|
</p>
|
||
|
|
||
|
<br>
|
||
|
<h2>And Beyond?</h2>
|
||
|
|
||
|
<p>
|
||
|
The current implementation directly iterates over all listeners and sends out notifications
|
||
|
as required synchronously in the context of the thread that triggered the notification to
|
||
|
be sent.
|
||
|
</p>
|
||
|
<p>
|
||
|
If a node monitor needs to send out several messages, this could theoretically greatly
|
||
|
decrease file system performance. To optimize for this case, the required data of the
|
||
|
notification could be put into a queue and be sent by a dedicated worker thread. Since
|
||
|
this requires an additional copy operation and a reserved address space for this queue,
|
||
|
this optimization could be more expensive than the current implementation, depending
|
||
|
on the usage pattern of the node monitoring mechanism.
|
||
|
</p>
|
||
|
<p>
|
||
|
With BFS, it would be possible to introduce the possibility to automatically watch all
|
||
|
files in a specified directory. While this would be very convenient at application level,
|
||
|
it comes with several disadvantages:
|
||
|
</p>
|
||
|
<ol>
|
||
|
<li>This feature might not be easily accomplishable for many file systems; a file system
|
||
|
must be able to retrieve a node by ID only - it might not be feasible to find
|
||
|
out about the parent directory for many file systems.</li>
|
||
|
<li>Although it could potentially safe node monitors, it might cause the kernel to
|
||
|
send out a lot more messages to the application than it needs. With the restriction
|
||
|
the kernel imposes to the number of watched nodes for a team, the application's
|
||
|
designer might try to be much stricter with the number of monitors his application
|
||
|
will consume.</li>
|
||
|
</ol>
|
||
|
<p>
|
||
|
While 1.) might be a real show stopper, 2.) is almost invalidated because of Tracker's
|
||
|
usage of node monitors; it consumes a monitor for every entry it displays, which might
|
||
|
be several thousands. Implementing this feature would not only greatly speed up maintaining
|
||
|
this massive need of monitors, and cut down memory usage, but also ease the implementation
|
||
|
at application level.
|
||
|
</p>
|
||
|
<p>
|
||
|
Even 1.) could be solved if the kernel could query a file system if it can support
|
||
|
this particular feature; it could then automatically monitor all files in that directory
|
||
|
without adding complexity to the application using this feature. Of course,
|
||
|
the effort to provide this functionality is much larger then - but for applications
|
||
|
like Tracker, the complexity would be removed from the application without extra cost.
|
||
|
</p>
|
||
|
<p>
|
||
|
However, none of the discussed feature extensions have been implemented for the currently
|
||
|
developed version R1 of OpenBeOS.
|
||
|
</p>
|
||
|
</body>
|
||
|
</html>
|