Thank you, wiz, for the few mandoc suggestions.
I slightly reworded part of the description and removed the
advertising clause from the version I posted to tech-userlevel.
instead of adding/subtracting our own IPv4 header.
There are many benefits: gre(4) needn't grok the outer encapsulation
header any longer, so this simplifies the gre(4) code. The IP
stack needn't grok GRE, so it is simplified, too. gre(4) will
benefit from optimizations in the socket code. Eventually, gre(4)
will gain an IPv6 encapsulation with very few new lines of code.
There is a small performance loss. A 133 MHz, 486-class AMD Elan
sinks/sources a TCP stream over GRE with about 93% the throughput
of the old code. TCP throughput on a 266 MHz, 586-class AMD Geode
is about 96% the throughput of the old code. A 175-MHz ADM5120
(MIPS) only sinks a TCP stream over GRE at about 90% of the old
code; I am still investigating that.
I produced stripped-down versions of sosend() and soreceive() for
gre(4) to use. They are guaranteed not to block, so they can be
called from a software interrupt and from a socket upcall,
respectively.
A kernel thread is no longer necessary for socket transmit/receive,
but I didn't get around to removing it, yet.
Thanks to Matt Thomas for suggesting the use of stripped-down socket
code and software interrupts, and to Andrew Doran for advice and
answers concerning software interrupts, threads, and performance.
-falign-functions=32, since these two really get hammered on. To make them
faster needs a threadreg or TLS, unless there is a way to tell gcc that a
library-local (pthread__threadmask) variable does not need to be PIC.
preparing to sleep on a mutex are slow in relative terms, so this allows
us to recover from short lock holds without blocking, while not wasting
too much time on longer holds.
- remove the `submatch' argument of config_found_ia()
- precise that config_found_ia() callis config_found_sm_loc() with both
`locs' and `submatch' set to NULL
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.
- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.