manual/termcap-1.3/html_chapter/termcap_4.html#SEC23 the cursor move multiple
escapes have undefined results when moving out of the screen. Stop using DO
to move down multiple lines and use a loop of newlines instead.
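A minimal sketch of the "loop of newlines" replacement, assuming output goes through a stdio stream. The helper name `sketch_move_down` is hypothetical and is not taken from the changed program:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: move the cursor down `n` lines by emitting one
 * newline per line, rather than a single parameterized DO escape.
 * Returns 0 on success, -1 if a write fails.
 */
static int
sketch_move_down(FILE *out, int n)
{
	assert(out != NULL);

	for (int i = 0; i < n; i++) {
		if (fputc('\n', out) == EOF)
			return -1;
	}
	return 0;
}
```

Each newline moves at most one line, so running past the bottom of the screen does not rely on the undefined behavior of the multi-line escape.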
In normal operations with multiple queues, the nvme driver will attempt
to schedule I/O requests on the submitting CPU. This breaks down when any
one of the queues becomes full; the driver returns EAGAIN to the disk
layer, which causes the disk layer to stop submitting more requests until
the blocked request is consumed. When space becomes available in the full
queue, the disk layer pulls the next buffer from the bufq and refills that
queue until it hits EAGAIN again, which keeps the other queues from
processing requests.
Two changes here to fix the problem:
- When processing requests from the bufq, attempt to assign them to the
queue associated with the CPU that originated the request.
- If that queue is busy, try to find another queue with available space
before returning EAGAIN. This way, only when all queues are full will
the disk layer stop submitting more requests.
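The two changes above can be sketched roughly as follows. This is a simplified model, not the driver code: `struct sketch_queue`, `sketch_pick_queue`, and the CPU-to-queue mapping (`cpu % nqueues`) are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified queue state; not the nvme driver's structures. */
struct sketch_queue {
	unsigned int inflight;	/* requests currently in the queue */
	unsigned int depth;	/* maximum number of requests */
};

static int
sketch_queue_has_space(const struct sketch_queue *q)
{
	return q->inflight < q->depth;
}

/*
 * Choose a queue for a request that originated on CPU `cpu`.
 * Try the queue associated with that CPU first, then every other
 * queue in turn. Returns 0 and stores the queue index in *qidp on
 * success, or EAGAIN only when all queues are full.
 */
static int
sketch_pick_queue(struct sketch_queue *qs, size_t nqueues, size_t cpu,
    size_t *qidp)
{
	assert(qs != NULL && qidp != NULL);
	assert(nqueues > 0);

	size_t preferred = cpu % nqueues;	/* assumed CPU -> queue mapping */

	for (size_t i = 0; i < nqueues; i++) {
		size_t qid = (preferred + i) % nqueues;

		if (sketch_queue_has_space(&qs[qid])) {
			qs[qid].inflight++;
			*qidp = qid;
			return 0;
		}
	}
	return EAGAIN;
}
```

With two queues of depth 1 and a request from CPU 1, the first call lands on queue 1, the second falls back to queue 0, and only the third returns EAGAIN.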
Now for some real numbers. On a Rockchip RK3399 board (6 CPUs), with 6
concurrent readers:
Old code:
4294967296 bytes transferred in 52.420 secs (81933752 bytes/sec)
4294967296 bytes transferred in 53.969 secs (79582117 bytes/sec)
4294967296 bytes transferred in 55.391 secs (77539082 bytes/sec)
4294967296 bytes transferred in 55.649 secs (77179595 bytes/sec)
4294967296 bytes transferred in 56.102 secs (76556402 bytes/sec)
4294967296 bytes transferred in 72.901 secs (58915066 bytes/sec)
New code:
4294967296 bytes transferred in 37.171 secs (115546186 bytes/sec)
4294967296 bytes transferred in 37.611 secs (114194445 bytes/sec)
4294967296 bytes transferred in 37.655 secs (114061009 bytes/sec)
4294967296 bytes transferred in 38.247 secs (112295534 bytes/sec)
4294967296 bytes transferred in 38.496 secs (111569183 bytes/sec)
4294967296 bytes transferred in 38.595 secs (111282997 bytes/sec)
When sending a packet over a STALE cache entry, a reachability confirmation
should be attempted for the entry, as described in RFC 2461/4861 7.3.3. On the
fast path in nd6_resolve, however, the handling of STALE entries was skipped
accidentally, so STALE entries never return to the REACHABLE state.
To fix the issue, branch to the fast path only when the cache entry is in the
REACHABLE state and leave other entries to the slow path, which includes the
handling. To this end we need to allow the slow path to return a link-layer
address too if a valid address is available, which is the same behavior as
FreeBSD and OpenBSD.
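The fast/slow path split described above might look roughly like this. It is a sketch only: the types and function names are hypothetical, not the nd6 code, and the DELAY timer that RFC 4861 7.3.3 starts on the STALE -> DELAY transition is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Neighbor cache states as named in RFC 4861; the enum itself is hypothetical. */
enum sketch_nd_state {
	SK_INCOMPLETE,
	SK_REACHABLE,
	SK_STALE,
	SK_DELAY,
	SK_PROBE
};

struct sketch_entry {
	enum sketch_nd_state state;
	bool lladdr_valid;	/* a usable link-layer address is cached */
};

/* Only REACHABLE entries with a valid address may take the fast path. */
static bool
sketch_use_fast_path(const struct sketch_entry *e)
{
	assert(e != NULL);
	return e->state == SK_REACHABLE && e->lladdr_valid;
}

/*
 * Slow path: apply the per-state handling (here only STALE -> DELAY,
 * the first step of reachability confirmation), then report whether a
 * valid link-layer address can still be returned to the caller.
 */
static bool
sketch_slow_path(struct sketch_entry *e)
{
	assert(e != NULL);

	if (e->state == SK_STALE)
		e->state = SK_DELAY;
	return e->lladdr_valid;
}
```

A STALE entry with a valid address is therefore rejected by the fast path, moved to DELAY on the slow path, and still yields its address so the packet can be sent.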
- convert IFF_ALLMULTI to ETHER_F_ALLMULTI, using ETHER_LOCK()
- remove IFF_OACTIVE use, and simply check the ring count in start
- assert/take more locks
- XXX: IFF_RUNNING is not properly protected (a problem in all drivers)
- fix axen_timer setting so it actually runs
- document a locking issue in stop callback:
stop is called with the softc lock held, but the lock order
in all other places is ifnet -> softc -> rx -> tx, so taking
ifnet lock when softc lock is held would be problematic
- in rxeof check for stopping/dying more often. i managed to
trigger a pagefault in cdce_rxeof() when yanking an active
device as it attempted to usbd_setup_xfer() on closed pipes.
- add missing USBD_MPSAFE and cdce_stopping resetting for cdce(4)
this and other recent cleanups increase the performance of most of
these drivers. some numbers (in mbit/sec):
          old:                new:
driver    in   out  in+out    in   out  in+out
------    --   ---  ------    --   ---  ------
cdce      39   32   44        38   33   54
axen      44   34   45        48   37   42
ure       36   34   35        36   38   38
i'm not sure why axen drops a little with in+out. cdce is
helped quite a lot, and ure a little. it is disappointing that
ure does not outperform cdce -- it's the same actual hardware,
and the device-specific driver (ure) should outperform the generic
cdce driver...
- Move ixgbe_toggle_txdctl() to ixgbe_common.c and modify it slightly.
No functional change because this function is currently used only for
SR-IOV, which is not used in NetBSD.
- Some modifications to match the latest netmap API.
- Modify ixgbe_hic_unlocked(). No functional change because neither
IXGBE_HOST_INTERFACE_APPLY_UPDATE_CMD(0x38) nor
IXGBE_HOST_INTERFACE_SHADOW_RAM_READ_CMD(0x31) is used.
- Add ixgbe_clear_mbx(). No functional change because this function is not
used yet.
- Add some not-yet-used register definitions.
- Whitespace fixes.
of compat32 being on or off, because we want the headers to work when
compiling modular kernels. Of course the 32 bit structs do not make sense
on platforms that don't have 32 bit modes (alpha), but we don't have
a define for that and it does not hurt.
Write 0xff to ICC_PMR_EL1 and read back how many bits are implemented,
then do the same with a GICD_IPRIORITYR<n> priority value field.
If the values differ, assume we have a shifted view of IPRIORITYR.
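The probe above can be modeled as follows. This is a sketch against a simulated register, not real system-register or MMIO access: `struct sketch_prio_reg` and the helper names are hypothetical, and the model assumes unimplemented priority bits read back as zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated 8-bit priority field; unimplemented bits ignore writes. */
struct sketch_prio_reg {
	uint8_t implemented_mask;	/* bits the hardware really stores */
	uint8_t value;
};

/* Write 0xff and return what reads back. */
static uint8_t
sketch_probe(struct sketch_prio_reg *r)
{
	assert(r != NULL);
	r->value = 0xff & r->implemented_mask;
	return r->value;
}

/* Number of implemented priority bits, given the read-back value. */
static unsigned int
sketch_count_bits(uint8_t v)
{
	unsigned int n = 0;

	for (; v != 0; v >>= 1)
		n += v & 1u;
	return n;
}

/*
 * If the PMR read-back and the IPRIORITYR read-back differ, assume
 * the IPRIORITYR view is shifted relative to the PMR view.
 */
static bool
sketch_view_is_shifted(uint8_t pmr_readback, uint8_t ipr_readback)
{
	return pmr_readback != ipr_readback;
}
```

For example, a PMR that reads back 0xf8 (5 bits) next to an IPRIORITYR field that reads back 0xf0 (4 bits) would be treated as a shifted view.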