.\" $NetBSD: sysctl.9,v 1.7 2005/06/26 08:41:42 wiz Exp $ .\" .\" Copyright (c) 2004 The NetBSD Foundation, Inc. .\" All rights reserved. .\" .\" This code is derived from software contributed to The NetBSD Foundation .\" by Andrew Brown. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of The NetBSD Foundation nor the names of its .\" contributors may be used to endorse or promote products derived .\" from this software without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS .\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED .\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR .\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS .\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR .\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF .\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS .\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN .\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) .\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE .\" POSSIBILITY OF SUCH DAMAGE. .\" .Dd June 20, 2005 .Dt SYSCTL 9 .Os .Sh NAME .Nm sysctl .Nd system variable control interfaces .Sh SYNOPSIS .In sys/param.h .In sys/sysctl.h .Pp Primary external interfaces: .Ft void .Fn sysctl_init void .Ft int .Fn sysctl_lock "struct lwp *l" "void *oldp" "size_t savelen" .Ft int .Fn sysctl_dispatch "const int *name" "u_int namelen" "void *oldp" \ "size_t *oldlenp" "const void *newp" "size_t newlen" "const int *oname" \ "struct lwp *l" "const struct sysctlnode *rnode" .Ft void .Fn sysctl_unlock "struct lwp *l" .Ft int .Fn sysctl_createv "struct sysctllog **log" "int cflags" \ "const struct sysctlnode **rnode" "const struct sysctlnode **cnode" \ "int flags" "int type" "const char *namep" "const char *desc" \ "sysctlfn func" "u_quad_t qv" "void *newp" "size_t newlen" ... .Ft int .Fn sysctl_destroyv "struct sysctlnode *rnode" ... .Ft void .Fn sysctl_free "struct sysctlnode *rnode" .Ft void .Fn sysctl_teardown "struct sysctllog **" .Ft int .Fn old_sysctl "int *name" "u_int namelen" "void *oldp" \ "size_t *oldlenp" "void *newp" "size_t newlen" "struct lwp *l" .Pp Core internal functions: .Ft int .Fn sysctl_locate "struct lwp *l" "const int *name" "u_int namelen" \ "const struct sysctlnode **rnode" "int *nip" .Ft int .Fn sysctl_lookup "const int *name" "u_int namelen" "void *oldp" \ "size_t *oldlenp" "const void *newp" "size_t newlen" "const int *oname" \ "struct lwp *l" "const struct sysctlnode *rnode" .Ft int .Fn sysctl_create "const int *name" "u_int namelen" "void *oldp" \ "size_t *oldlenp" "const void *newp" "size_t newlen" "const int *oname" \ "struct lwp *l" "const struct sysctlnode *rnode" .Ft int .Fn sysctl_destroy "const int *name" "u_int namelen" "void *oldp" \ "size_t *oldlenp" "const void *newp" "size_t newlen" "const int *oname" \ "struct lwp *l" "const struct sysctlnode *rnode" .Ft int .Fn sysctl_query "const int *name" "u_int namelen" "void *oldp" \ "size_t *oldlenp" "const void *newp" "size_t newlen" "const int *oname" \ "struct lwp *l" "const struct sysctlnode *rnode" .Pp Simple .Dq helper functions: .Ft int .Fn sysctl_needfunc "const int *name" "u_int namelen" "void *oldp" \ "size_t *oldlenp" "const void *newp" "size_t newlen" "const int *oname" \ "struct lwp *l" "const struct sysctlnode *rnode" .Ft int .Fn sysctl_notavail "const int *name" "u_int namelen" "void *oldp" \ "size_t *oldlenp" "const void *newp" "size_t newlen" "const int *oname" \ "struct lwp *l" "const struct sysctlnode *rnode" .Ft int .Fn sysctl_null "const int *name" "u_int namelen" "void *oldp" \ "size_t *oldlenp" "const void *newp" "size_t newlen" "const int *oname" \ "struct lwp *l" "const struct sysctlnode *rnode"" .Sh DESCRIPTION The SYSCTL subsystem instruments a number of kernel tunables and other data structures via a simple MIB-like interface, primarily for consumption by userland programs, but also for use internally by the kernel. .Sh LOCKING All operations on the SYSCTL tree must be protected by acquiring the main SYSCTL lock. The only functions that can be called when the lock is not held are .Fn sysctl_lock , .Fn sysctl_createv , .Fn sysctl_destroyv , and .Fn old_sysctl . All other functions require the tree to be locked. This is to prevent other users of the tree from moving nodes around during an add operation, or from destroying nodes or subtrees that are actively being used. The lock is acquired by calling .Fn sysctl_lock with a pointer to the process's lwp .Fa l .Dv ( NULL may be passed to all functions as the lwp pointer if no lwp is appropriate, though any changes made via .Fn sysctl_create , .Fn sysctl_destroy , .Fn sysctl_lookup , or by any helper function will be done with effective superuser privileges). The .Fa oldp and .Fa savelen arguments are a pointer to and the size of the memory region the caller will be using to collect data from SYSCTL. These may also be .Dv NULL and 0, respectively. .Pp The memory region will be locked via .Fn uvm_vslock if it is a region in userspace. The address and size of the region are recorded so that when the SYSCTL lock is to be released via .Fn sysctl_unlock , only the lwp pointer .Fa l is required. .Sh LOOKUPS Once the lock has been acquired, it is typical to call .Fn sysctl_dispatch to handle the request. .Fn sysctl_dispatch will examine the contents of .Fa name , an array of integers at least .Fa namelen long, which is to be located in kernel space, in order to determine which function to call to handle the specific request. .Pp .Fn sysctl_dispatch uses the following algorithm to determine the function to call: .Pp .Bl -bullet .It Scan the tree using .Fn sysctl_locate .It If the node returned has a .Dq helper function, call it .It If the requested node was found but has no function, call .Fn sysctl_lookup .It If the node was not found and .Fa name specifies one of .Fn sysctl_query , .Fn sysctl_create , or .Fn sysctl_destroy , call the appropriate function .It If none of these options applies and no other error was yet recorded, return .Er EOPNOTSUPP .Pp .El The .Fa oldp and .Fa oldlenp arguments to .Fn sysctl_dispatch , as with all the other core functions, describe an area into which the current or requested value may be copied. .Fa oldp may or may not be a pointer into userspace (as dictated by whether .Fa l is .Dv NULL or not). .Fa oldlenp is a .No non- Ns Dv NULL pointer to a size_t. .Fa newp and .Fa newlen describe an area where the new value for the request may be found; .Fa newp may also be a pointer into userspace. The .Fa oname argument is a .No non- Ns Dv NULL pointer to the base of the request currently being processed. By simple arithmetic on .Fa name , .Fa namelen , and .Fa oname , one can easily determine the entire original request and .Fa namelen values, if needed. The .Fa rnode value, as passed to .Fn sysctl_dispatch represents the root of the tree into which the current request is to be dispatched. If .Dv NULL , the main tree will be used. .Pp .Fn sysctl_locate scans a tree for the node most specific to a request. If the pointer referenced by .Fa rnode is not .Dv NULL , the tree indicated is searched, otherwise the main tree will be used. The address of the most relevant node will be returned via .Fa rnode and the number of MIB entries consumed will be returned via .Fa nip , if it is not .Dv NULL . .Pp The .Fn sysctl_lookup function takes the same arguments as .Fn sysctl_dispatch with the caveat that the value for .Fa namelen must be zero in order to indicate that the node referenced by the .Fa rnode argument is the one to which the lookup is being applied. .Sh CREATION AND DESTRUCTION OF NODES New nodes are created and destroyed by the .Fn sysctl_create and .Fn sysctl_destroy functions. These functions take the same arguments as .Fn sysctl_dispatch with the additional requirement that the .Fa namelen argument must be 1 and the .Fa name argument must point to an integer valued either .Dv CTL_CREATE or .Dv CTL_CREATESYM when creating a new node, or .Dv CTL_DESTROY when destroying a node. The .Fa newp and .Fa newlen arguments should point to a copy of the node to be created or destroyed. If the create or destroy operation was successful, a copy of the node created or destroyed will be placed in the space indicated by .Fa oldp and .Fa oldlenp . If the create operation fails because of a conflict with an existing node, a copy of that node will be returned instead. .Pp In order to facilitate the creation and destruction of nodes from a given tree by kernel subsystems, the functions .Fn sysctl_createv and .Fn sysctl_destroyv are provided. These functions take care of the overhead of filling in the contents of the create or destroy request, dealing with locking, locating the appropriate parent node, etc. .Pp The arguments to .Fn sysctl_createv are used to construct the new node. If the .Fa log argument is not .Dv NULL , a sysctllog structure will be allocated and the pointer referenced will be changed to address it. The same log may be used for any number of nodes, provided they are all inserted into the same tree. This allows for a series of nodes to be created and later removed from the tree in a single transaction (via .Fn sysctl_teardown ) without the need for any record keeping on the caller's part. The .Fa cflags argument is currently unused and must be zero. The .Fa rnode argument must either be .Dv NULL or a valid pointer to a reference to the root of the tree into which the new node must be placed. If it is .Dv NULL , the main tree will be used. It is illegal for .Fa rnode to refer to a .Dv NULL pointer. If the .Fa cnode argument is not .Dv NULL , on return it will be adjusted to point to the address of the new node. .Pp The .Fa flags and .Fa type arguments are combined into the .Fa sysctl_flags field, and the current value for .Dv SYSCTL_VERSION is added in. Note: the .Dv CTLFLAG_PERMANENT flag can only be set from SYSCTL setup routines (see .Sx SETUP FUNCTIONS ) as called by .Fn sysctl_init . The .Fa namep argument is copied into the .Fa sysctl_name field and must be less than .Dv SYSCTL_NAMELEN characters in length. The string indicated by .Fa desc will be copied if the .Dv CTLFLAG_OWNDESC flag is set, and will be used as the node's description. Note: if .Fn sysctl_destroyv attempts to delete a node that does not own its own description (and is not marked as permanent), but the deletion fails, the description will be copied and .Fn sysctl_destroyv will set the .Dv CTLFLAG_OWNDESC flag. .Pp The .Fa func argument is the name of a .Dq helper function (see .Sx HELPER FUNCTIONS AND MACROS ) . If the .Dv CTLFLAG_IMMEDIATE flag is set, the .Fa qv argument will be interpreted as the initial value for the new .Dq int or .Dq quad node. This flag does not apply to any other type of node. The .Fa newp and .Fa newlen arguments describe the data external to SYSCTL that is to be instrumented. One of .Fa func , .Fa qv and the .Dv CTLFLAG_IMMEDIATE flag, or .Fa newp and .Fa newlen must be given for nodes that instrument data, otherwise an error is returned. .Pp The remaining arguments are a list of integers specifying the path through the MIB to the node being created. The list must be terminated by the .Dv CTL_EOL value. The penultimate value in the list may be .Dv CTL_CREATE if a dynamic MIB entry is to be made for this node. .Fn sysctl_createv specifically does not support .Dv CTL_CREATESYM , since setup routines are expected to be able to use the in-kernel .Xr ksyms 4 interface to discover the location of the data to be instrumented. If the node to be created matches a node that already exists, a return code of 0 is given, indicating success. .Pp When using .Fn sysctl_destroyv to destroy a given node, the .Fa rnode argument, if not .Dv NULL , is taken to be the root of the tree from which the node is to be destroyed, otherwise the main tree is used. The rest of the arguments are a list of integers specifying the path through the MIB to the node being destroyed. If the node being destroyed does not exist, a successful return code is given. Nodes marked with the .Dv CTLFLAG_PERMANENT flag cannot be destroyed. .Sh HELPER FUNCTIONS AND MACROS Helper functions are invoked with the same common argument set as .Fn sysctl_dispatch except that the .Fa rnode argument will never be .Dv NULL . It will be set to point to the node that corresponds most closely to the current request. Helpers are forbidden from modifying the node they are passed; they should instead copy the structure if changes are required in order to effect access control or other checks. The .Dq helper prototype and function that needs to ensure that a newly assigned value is within a certain range (presuming external data) would look like the following: .Pp .Bd -literal -offset indent -compact static int sysctl_helper(SYSCTLFN_PROTO); .sp static int sysctl_helper(SYSCTLFN_ARGS) { struct sysctlnode node; int t, error; .sp node = *rnode; node.sysctl_data = \*[Am]t; error = sysctl_lookup(SYSCTLFN_CALL(\*[Am]node)); if (error || newp == NULL) return (error); .sp if (t \*[Lt] 0 || t \*[Gt] 20) return (EINVAL); .sp *(int*)rnode-\*[Gt]sysctl_data = t; return (0); } .Ed .Pp The use of the .Dv SYSCTLFN_PROTO , .Dv SYSCTLFN_ARGS, and .Dv SYSCTLFN_CALL macros ensure that all arguments are passed properly. The single argument to the .Dv SYSCTLFN_CALL macro is the pointer to the node being examined. .Pp Three basic helper functions are available for use. .Fn sysctl_needfunc will emit a warning to the system console whenever it is invoked and provides a simplistic read-only interface to the given node. .Fn sysctl_notavail will forward .Dq queries to .Fn sysctl_query so that subtrees can be discovered, but will return .Er EOPNOTSUPP for any other condition. .Fn sysctl_null specifically ignores any arguments given, sets the value indicated by .Fa oldlenp to zero, and returns success. .Sh SETUP FUNCTIONS Though nodes can be added to the SYSCTL tree at any time, in order to add nodes during the kernel bootstrap phase, a proper .Dq setup function must be used. Setup functions are declared using the .Dv SYSCTL_SETUP macro, which takes the name of the function and a short string description of the function as arguments. The address of the function is added to a list of functions that .Fn sysctl_init traverses during initialization. .Pp Setup functions to not have to add nodes to the main tree, but can set up their own trees for emulation or other purposes. Emulations that require use of a main tree but with some nodes changed to suit their own purposes can arrange to overlay a sparse private tree onto their main tree by making the .Fa e_sysctlovly member of their struct emul definition point to the overlaid tree. .Pp Setup functions should take care to create all nodes from the root down to the subtree they are creating, since the order in which setup functions are called is arbitrary (the order in which setup functions are called is only determined by the ordering of the object files as passed to the linker when the kernel is built). .Sh MISCELLANEOUS FUNCTIONS .Fn sysctl_init is called early in the kernel bootstrap process. It initializes the SYSCTL lock, calls all the registered setup functions, and marks the tree as permanent. .Pp .Fn sysctl_free will unconditionally delete any and all nodes below the given node. Its intended use is for the deletion of entire trees, not subtrees. If a subtree is to be removed, .Fn sysctl_destroy or .Fn sysctl_destroyv should be used to ensure that nodes not owned by the sub-system being deactivated are not mistakenly destroyed. The SYSCTL lock must be held when calling this function. .Pp .Fn sysctl_teardown unwinds a sysctllog and deletes the nodes in the opposite order in which they were created. .Pp .Fn old_sysctl provides an interface similar to the old SYSCTL implementation, with the exception that access checks on a per-node basis are performed if the .Fa l argument is .No non- Ns Dv NULL . If called with a .Dv NULL argument, the values for .Fa newp and .Fa oldp are interpreted as kernel addresses, and access is performed as for the superuser. .Sh NOTES It is expected that nodes will be added to (or removed from) the tree during the following stages of a machine's lifetime: .Pp .Bl -bullet -compact .It initialization -- when the kernel is booting .It autoconfiguration -- when devices are being probed at boot time .It .Dq plug and play device attachment -- when a PC-Card, USB, or other device is plugged in or attached .It LKM initialization -- when an LKM is being loaded .It .Dq run-time -- when a process creates a node via the .Xr sysctl 3 interface .El .Pp Nodes marked with .Dv CTLFLAG_PERMANENT can only be added to a tree during the first or initialization phase, and can never be removed. The initialization phase terminates when the main tree's root is marked with the .Dv CTLFLAG_PERMANENT flag. Once the main tree is marked in this manner, no nodes can be added to any tree that is marked with .Dv CTLFLAG_READONLY at its root, and no nodes can be added at all if the main tree's root is so marked. .Pp Nodes added by device drivers, LKMs, and at device insertion time can be added to (and removed from) .Dq read-only parent nodes. .Pp Nodes created by processes can only be added to .Dq writable parent nodes. See .Xr sysctl 3 for a description of the flags that are allowed to be used by when creating nodes. .Sh SEE ALSO .Xr sysctl 3 .Sh HISTORY The dynamic SYSCTL implementation first appeared in .Nx 2.0 . .Sh AUTHORS .An Andrew Brown .Aq atatat@NetBSD.org designed and implemented the dynamic SYSCTL implementation.