NetBSD/doc/TODO.modules

/* $NetBSD: TODO.modules,v 1.22 2020/07/05 02:04:25 pgoyette Exp $ */

Some notes on the limitations of our current (as of 7.99.35) module
subsystem.  This list was triggered by an Email exchange between
christos and pgoyette.

 1. Builtin drivers can't depend on modularized drivers (the modularized
    drivers are attempted to load as builtins).

	The assumption is that dependencies are loaded before those
	modules which depend on them.  At load time, a module's
	undefined global symbols are resolved;  if any symbols can't
	be resolved, the load fails.  Similarly, if a module is
	included in (built-into) the kernel, all of its symbols must
	be resolvable by the linker, otherwise the link fails.

	There are ways around this (such as, having the parent
	module's initialization command recursively call the module
	load code), but they're often gross hacks.

	Another alternative (which is used by ppp) is to provide a
	"registration" mechanism for the "child" modules, and then when
	the need for a specific child module is encountered, use
	module_autoload() to load the child module.  Of course, this
	requires that the parent module know about all potentially
	loadable children.

 2. Currently, config(1) has no way to "no define" drivers
	XXX: I don't think this is true anymore. I think we can
	undefine drivers now, see MODULAR in amd64, which does
	no ath* and no select sppp*

 3. It is not always obvious by their names which drivers/options
    correspond to which modules.

 4. Right now critical drivers that would need to be pre-loaded (ffs,
    exec_elf64) are still built-in so that we don't need to alter the boot
    blocks to boot.

	This was a conscious decision by core@ some years ago.  It is
	not a requirement that ffs or exec_* be built-in.  The only
	requirement is that the root file-system's module must be
	available when the module subsystem is initialized, in order
	to load other modules.  This can be accomplished by having the
	boot loader "push" the module at boot time.  (It used to do
	this in all cases; currently the "push" only occurs if the
	booted filesystem is not ffs.)

 5. Not all parent bus drivers are capable of rescan, so some drivers
    just have to be built-in.

 6. Many (most?) drivers are not yet modularized

 7. There's currently no provisions for autoconfig to figure out which
    modules are needed, and thus to load the required modules.

	In the "normal" built-in world, autoconfigure can only ask
	existing drivers if they're willing to manage (ie, attach) a
	device.  Removing the built-in drivers tends to limit the
	availability of possible managers.  There's currently no
	mechanism for identifying and loading drivers based on what
	devices might be found.

 8. Even for existing modules, there are "surprise" dependencies with
    code that has not yet been modularized.

	For example, even though the bpf code has been modularized,
	there is some shared code in bpf_filter.c which is needed by
	both ipfilter and ppp.  ipf is already modularized, but ppp
	is not.  Thus, even though bpf_filter is modular, it MUST be
	included as a built-in module if you also have ppp in your
	configuration.

	Another example is sysmon_taskq module.  It is required by
	other parts of the sysmon subsystem, including the
	"sysmon_power" module.  Unfortunately, even though the
	sysmon_power code is modularized, it is referenced by the
	acpi code which has not been modularized.  Therefore, if your
	configuration has acpi, then you must include the "sysmon_power"
	module built-in the kernel.  And therefore you also need to
	have "sysmon_taskq" and "sysmon" built-in since "sysmon_power"
	rerefences them.

 9. As a corollary to #8 above, having dependencies on modules from code
    which has not been modularized makes it extremely difficult to test
    the module code adequately.  Testing of module code should include
    both testing-as-a-built-in module and testing-as-a-loaded-module, and
    all dependencies need to be identified.

10. The current /stand/$ARCH/$VERSION/modules/ hierarchy won't scale as
    we get more and more modules.  There are hundreds of potential device
    driver modules.

11. There currently isn't any good way to handle attachment-specific
    modules.  The build infrastructure (ie, sys/modules/Makefile) doesn't
    readily lend itself to bus-specific modules irrespective of $ARCH,
    and maintaining distrib/sets/lists/modules/* is awkward at best.

    Furthermore, devices such as ld(4), which can attach to a large set
    of parent devices, need to be modified.  The parent devices need to
    provide a common attribute (for example, ld_bus), and the ld driver
    should attach to that attribute rather than to each parent.  But
    currently, config(1) doesn't handle this - it doesn't allow an
    attribute to be used as the device tree's pseudo-root. The current
    directory structure where driver foo is split between ic/foo.c
    and bus1/foo_bus1.c ... busn/foo_busn.c is annoying. It would be
    better to switch to the FreeBSD model which puts all the driver
    files in one directory.

12. Item #11 gets even murkier when a particular parent can provide more
    than one attribute.

13. It seems that we might want some additional sets-lists "attributes"
    to control contents of distributions.  As an example, many of our
    architectures have PCI bus capabilities, but not all.  It is rather
    painful to need to maintain individual architectures' modules/md_*
    sets lists, especially when we already have to conditionalize the
    build of the modules based on architecture.  If we had a single
    "attribute" for PCI-bus-capable, the same attribute could be used to
    select which modules to build and which modules from modules/mi to
    include in the release.  (This is not limited to PCI;  recently we
    encounter similar issues with spkr aka spkr_synth module.)

14. As has been pointed out more than once, the current method of storing
    modules in a version-specific subdirectory of /stand is sub-optimal
    and leads to much difficulty and/or confusion.  A better mechanism of
    associating a kernel and its modules needs to be developed.  Some
    have suggested having a top-level directory (say, /netbsd) with a
    kernel and its modules at /netbsd/kernel and /netbsd/modules/...
    Whatever new mechanism we arrive at will probably require changes to
    installation procedures and bootstrap code, and will need to handle
    both the new and old mechanisms for compatability.

    One additional option mentioned is to be able to specify, at boot
    loader time, an alternate value for the os-release portion of the
    default module path,  i.e. /stand/$MACHINE/$ALT-RELEASE/modules/

    The following statement regarding this issue was previously issued
    by the "core" group:

    Date: Fri, 27 Jul 2012 08:02:56 +0200
    From: <redacted>
    To: <redacted>
    Subject: Core statement on directory naming for kernel modules

    The core group would also like to see the following changes in
    the near future:

       Implementation of the scheme described by Luke Mewburn in
       <http://mail-index.NetBSD.org/current-users/2009/05/10/msg009372.html>
       to allow a kernel and its modules to be kept together.
       Changes to config(1) to extend the existing notion of whether or not
       an option is built-in to the kernel, to three states: built-in, not
       built-in but loadable as a module, entirely excluded and not even
       loadable as a module.


15. The existing config(5) framework provides an excellent mechanism
    for managing the content of kernels.  Unfortunately, this mechanism
    does not apply for modules, and instead we need to manually manage
    a list of files to include in the module, the set of compiler
    definitions with which to build those files, and also the set of
    other modules on which a module depends.  We really need a common
    mechanism to define and build modules, whether they are included as
    "built-in" modules or as separately-loadable modules.

    (From John Nemeth) Some sort of mechanism for a (driver) module
    to declare the list of vendor/product/other tuples that it can
    handle would be nice.  Perhaps this would go in the module's .plist
    file?  (See #17 below.)  Then drivers that scan for children might
    be able to search the modules directory for an "appropriate" module
    for each child, and auto-load.

16. PR kern/52821 exposes another limitation of config(1) WRT modules.
    Here, an explicit device attachment is required, because we cannot
    rely on all kernel configs to contain the attribute at which the
    modular driver wants to attach.  Unfortunately, the explicit
    attachment causes conflicts with built-in drivers.  (See the PR for
    more details.)

17. (From John Nemeth) It would be potentially useful if a "push" from
    the bootloader could also load-and-push a module's .plist (if it
    exists.

18. (From John Nemeth) Some sort of schema for a module to declare the
    options (or other things?) that the module understands.  This could
    result in a module-options editor to manipulate the .plist

19. (From John Nemeth) Currently, the order of module initialization is
    based on module classes and declared dependencies.  It might be
    useful to have additional classes (or sub-classes) with additional
    invocations of module_class_init(), and it might be useful to have a
    non-dependency mechanism to provide "IF module-A and module-B are
    BOTH present, module-A needs to be initialized before module-B".

20. (Long-ago memory rises to the surface) Note that currently there is
    nothing that requires a module's name to correspond in any way with
    the name of file from which the module is loaded.  Thus, it is
    possible to attempt to access device /dev/x, discover that there is
    no such device so we autoload /stand/.../x/x.kmod and initialize
    the module loaded, even if the loaded module is for some other
    device entirely!

21. We currently do not support "weak" symbols in the in-kernel linker.
    It would take some serious thought to get such support right.  For
    example, consider module A with a weak reference to symbol S which
    is defined in module B.  If module B is loaded first, and then
    module A, the symbol gets resolved.  But if module A is loaded first,
    the symbol won't be resolved.  If we subsequently load module B, we
    would have to "go back" and re-run the linker for module A.

    Additional difficulties arise when the module which defines the
    weak symbol gets unloaded.  Then, you would need to re-run the
    linker and _unresolve_ the weak symbol which is no longer defined.

22. A fairly large number of modules still require a maximum warning
    level of WARNS=3 due to signed-vs-unsigned integer comparisons.  We
    really ought to clean these up.  (I haven't looked at them in any
    detail, but I have to wonder how code that compiles cleanly in a
    normal kernel has these issues when compiled in a module, when both
    are done with WARNS=5).

23. The current process of "load all the emulation/exec modules in case
    one of them might handle the image currently being exec'd" isn't
    really cool.  (See sys/kern/kern_exec.c?)  It ends up auto-loading
    a whole bunch of modules, involving file-system access, just to have
    most of the modules getting unloaded a few seconds later.  We don't
    have any way to identify which module is needed for which image (ie,
    we can't determine that an image needs compat_linux vs some other
    module).

24. Details are no longer remembered, but there are some issues with
    building xen-variant modules (on amd4, and likely i386).  In some
    cases, wrong headers are included (because a XEN-related #define
    is missing), but even if you add the definition some headers get
    included in the wrong order.  One particular fallout from this is
    the inability to have a compat version of x86_64 cpu-microcode
    module.  PR port-xen/53130

    This is likely to be fixed by Chuck Silvers on 2020-07-04 which
    removed the differences between the xen and non-xen module ABIs.