this prevents a hang trying to shut down other CPUs on x86,
and in general we could be called in any context from a panic
so it's best to skip unnecessary operations in that case.
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.
meta-data of this pool takes more space than the actual data..
- Reduce lowat/hiwat to 1..8, since intensity is very low.
- Remove unused pew_next_free from pmf_event_workitem_t.
and make suspension by self, by drvctl(8), and by ACPI system sleep
play nice together. Start solidifying some temporary API changes.
1. Extract a new header file, <sys/device_if.h>, from <sys/device.h> and
#include it from <sys/pmf.h> instead of <sys/device.h> to break the
circular dependency between <sys/device.h> and <sys/pmf.h>.
2. Introduce pmf_qual_t, an aggregate of qualifications on a PMF
suspend/resume call. Start to replace instances of PMF_FN_PROTO,
PMF_FN_ARGS, et cetera, with a pmf_qual_t.
3. Introduce the notion of a "suspensor," an entity that holds a
device in suspension. More than one suspensor may hold a device
at once. A device stays suspended as long as at least one
suspensor holds it. A device resumes when the last suspensor
releases it.
Currently, the kernel defines three suspensors,
3a the system-suspensor: for system suspension, initiated
by 'sysctl -w machdep.sleep_state=3', by lid closure, by
power-button press, et cetera,
3b the drvctl-suspensor: for device suspension by /dev/drvctl
ioctl, e.g., drvctl -S sip0.
3c the system self-suspensor: for device drivers that suspend
themselves and their children. Several drivers for network
interfaces put the network device to sleep while it is not
administratively up, that is, after the kernel calls if_stop(,
1). The self-suspensor should not be used directly. See
the description of suspensor delegates, below.
A suspensor can have one or more "delegates". A suspensor can
release devices that its delegates hold suspended. Right now,
only the system self-suspensor has delegates. For each device
that a self-suspending driver attaches, it creates the device's
self-suspensor, a delegate of the system self-suspensor.
Suspensors stop a system-wide suspend/resume cycle from waking
devices that the operator put to sleep with drvctl before the cycle.
They also help self-suspension to work more simply, safely, and in
accord with expectations.
4. Add the notion of device activation level, devact_level_t,
and a routine for checking the current activation level,
device_activation(). Current activation levels are DEVACT_LEVEL_BUS,
DEVACT_LEVEL_DRIVER, and DEVACT_LEVEL_CLASS, which respectively
indicate that the device's bus is active, that the bus and device are
active, and that the bus, device, and the functions of the device's
class (network, audio) are active.
Suspend/resume calls can be qualified with a devact_level_t.
The power-management framework treats a devact_level_t that
qualifies a device suspension as the device's current activation
level; it only runs hooks to reduce the activation level from
the presumed current level to the fully suspended state. The
framework treats a devact_level_t qualifying device resumption
as the target activation level; it only runs hooks to raise the
activation level to the target.
5. Use pmf_qual_t, devact_level_t, and self-suspensors in several
drivers.
6. Temporarily add an unused power-management workqueue that I will
remove or replace, soon.
must not allocate a pmf_event_workitem_t using kmem_alloc(9). Use
pool_cache(9), instead, because it is safe in interrupt context.
Thanks, rmind@, for catching the problem and suggesting the solution.
(void *)pew is one way to get a struct work *, but let's
write&pew->pew_work, instead. It is more defensive and persuasive.
Make miscellaneous changes in support of tearing down arbitrary
stacks of filesystems and devices during shutdown:
1 Move struct shutdown_state, shutdown_first(), and shutdown_next(),
from kern_pmf.c to subr_autoconf.c. Rename detach_all() to
config_detach_all(), and move it from kern_pmf.c to subr_autoconf.c.
Export all of those routines.
2 In pmf_system_shutdown(), do not suspend user process scheduling, and
do not detach all devices: I am going to do that in cpu_reboot(),
instead. (Soon I will do it in an MI cpu_reboot() routine.) Do still
call PMF shutdown hooks.
3 In config_detach(), add a DIAGNOSTIC assertion: if we're exiting
config_detach() at the bottom, alldevs_nwrite had better not be 0,
because config_detach() is a writer of the device list.
4 In deviter_release(), check to see if we're iterating the device list
for reading, *first*, and if so, decrease the number of readers. Used
to be that if we happened to be reading during shutdown, we ran the
shutdown branch. Thus the number of writers reached 0, the number
of readers remained > 0, and no writer could iterate again. Under
certain circumstances that would cause a hang during shutdown.
over and over to detach all of the devices. Stop when we cannot detach
even a single device in a cycle. Call shutdown hooks on all of the
devices that remain attached.
This is another step toward the detach/unmount cycle that will help us
tear down arbitrary stacks of filesystems, ccd(4), raid(4), and vnd(4).
Call the detach routine for every device in the device tree, starting
with the leaves and moving toward the root, expecting that each
(pseudo-)device driver will use the opportunity to gracefully commit
outstandings transactions to the underlying (pseudo-)device and to
relinquish control of the hardware to the system BIOS.
Detaching devices is not suitable for every shutdown: in an emergency,
or if the system state is inconsistent, we should resort to a fast,
simple shutdown that uses only the pmf(9) shutdown hooks and the
(deprecated) shutdownhooks. For now, if the flag RB_NOSYNC is set in
boothowto, opt for the fast, simple shutdown.
Add a device flag, DVF_DETACH_SHUTDOWN, that indicates by its presence
that it is safe to detach a device during shutdown. Introduce macros
CFATTACH_DECL3() and CFATTACH_DECL3_NEW() for creating autoconf
attachments with default device flags. Add DVF_DETACH_SHUTDOWN
to configuration attachments for atabus(4), atw(4) at cardbus(4),
cardbus(4), cardslot(4), com(4) at isa(4), elanpar(4), elanpex(4),
elansc(4), gpio(4), npx(4) at isa(4), nsphyter(4), pci(4), pcib(4),
pcmcia(4), ppb(4), sip(4), wd(4), and wdc(4) at isa(4).
Add a device-detachment "reason" flag, DETACH_SHUTDOWN, that tells the
autoconf code and a device driver that the reason for detachment is
system shutdown.
Add a sysctl, kern.detachall, that tells the system to try to detach
every device at shutdown, regardless of any device's DVF_DETACH_SHUTDOWN
flag. The default for kern.detachall is 0. SET IT TO 1, PLEASE, TO
HELP TEST AND DEBUG DEVICE DETACHMENT AT SHUTDOWN.
This is a work in progress. In future work, I aim to treat
pseudo-devices more thoroughly, and to gracefully tear down a stack of
(pseudo-)disk drivers and filesystems, including cgd(4), vnd(4), and
raid(4) instances at shutdown.
Also commit some changes that are not easily untangled from the rest:
(1) begin to simplify device_t locking: rename struct pmf_private to
device_lock, and incorporate device_lock into struct device.
(2) #include <sys/device.h> in sys/pmf.h in order to get some
definitions that it needs. Stop unnecessarily #including <sys/device.h>
in sys/arch/x86/include/pic.h to keep the amd64, xen, and i386 releases
building.
Improve PMF-ability.
Add a 'flags' argument to suspend/resume handlers and
callers such as pmf_system_suspend().
Define a flag, PMF_F_SELF, which indicates to PMF that a
device is suspending/resuming itself. Add helper routines,
pmf_device_suspend_self(dev) and pmf_device_resume_self(dev),
that call pmf_device_suspend(dev, PMF_F_SELF) and
pmf_device_resume(dev, PMF_F_SELF), respectively. Use
PMF_F_SELF to suspend/resume self in ath(4), audio(4),
rtw(4), and sip(4).
In ath(4) and in rtw(4), replace the icky sc_enable/sc_disable
callbacks, provided by the bus front-end, with
self-suspension/resumption. Also, clean up the bus
front-ends. Make sure that the interrupt handler is
disestablished during suspension. Get rid of driver-private
flags (e.g., RTW_F_ENABLED, ath_softc->sc_invalid); use
device_is_active()/device_has_power() calls, instead.
In the network-class suspend handler, call if_stop(, 0)
instead of if_stop(, 1), because the latter is superfluous
(bus- and driver-suspension hooks will 'disable' the NIC),
and it may cause recursion.
In the network-class resume handler, prevent infinite
recursion through if_init() by getting out early if we are
self-suspending (PMF_F_SELF).
rtw(4) improvements:
Destroy rtw(4) callouts when we detach it. Make rtw at
pci detachable. Print some more information with the "rx
frame too long" warning.
Remove activate() methods:
Get rid of rtw_activate() and ath_activate(). The device
activate() methods are not good for much these days.
Make ath at cardbus resume with crypto functions intact:
Introduce a boolean device property, "pmf-powerdown". If
pmf-powerdown is present and false, it indicates that a
bus back-end should not remove power from a device.
Honor this property in cardbus_child_suspend().
Set this property to 'false' in ath_attach(), since removing
power from an ath at cardbus seems to lobotomize the WPA
crypto engine. XXX Should the pmf-powerdown property
propagate toward the root of the device tree?
Miscellaneous ath(4) changes:
Warn if ath(4) tries to write crypto keys to suspended
hardware.
Reduce differences between FreeBSD and NetBSD in ath(4)
multicast filter setup.
Make ath_printrxbuf() print an rx descriptor's status &
key index, to help debug crypto errors.
Shorten a staircase in ath_ioctl(). Don't check for
ieee80211_ioctl() return code ERESTART, it never happens.
Add a device iterator object, deviter_t, and methods deviter_init(),
deviter_first(), and deviter_next() for visiting each device in
the device tree.
Take care not to re-shutdown a device in the event that the machine
panics during reboot and the operator types 'reboot' at the kernel
debugger prompt.
While I'm here, sprinkle PMF_FN_ARGS, PMF_FN_PROTO, et cetera.
can register a shutdown handler explicitely.
Install a pci bus shutdown handler which disables bus master accesses
for all childs, so the drivers don't need to care.
This will hopefully be sufficient to replace the shutdownhooks
(together with the powerhooks). (It has been suggested to use some
general event notification framework for shutdown handlers, but there
might be cases where shutdown handlers must be run in an order following
the device hierarchy, which wouldn't be easy with event handlers
not tied to drivers.)
approved by David Young
respect the kernel device tree. (It is arguably ugly to special-case
wscons here, but as long as there is only one driver to be dealt with
it is not worth to introduce another set of hooks.)
Resume the X server at the end of resume, if everything went well.
Acquire the big KERNEL_LOCK before the device tree is walked on
suspend, until after the walk on resume. This is needed to avoid
device accesses by secondary CPUs, and it effectively keeps user
programs from interfering with the suspend process. This might be
revisited when all drivers are using private locks for MP-safeness
(but FreeBSD still does the same afaics).
It should be unnecessary now to switch secondary CPUs offline in
the powerd suspend script.
Fix previous - use foreach, and just return after first found entry.
The pmf_all_events list should not have duplicate entries (perhaps
pmf(9) should document this point).
separate powering up devices from restoring their state. This is required
on some machines where AcpiLeaveSleepState can fail due to an attempt to
access a powered off device.