addresses. Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.
In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr. Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.
Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.
In support of the changes above,
1 Add a method to struct domain for "externalizing" a sockaddr, and
provide an implementation for IPv6. Expect more work in this area: it
may be more proper to say that the IPv6 implementation "internalizes"
a sockaddr. Add sockaddr_externalize().
2 Add a subroutine, sofamily(), that returns a struct socket's address
family or AF_UNSPEC.
3 Make a lot of IPv4-specific code generic, and move it from
sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
ifconfig(8).
name and address. Room for an address will do. This should fix
a regression in 'arp -s ...' on interfaces such as xennet0 with
unusually long names.
I will request a pull-up to netbsd-5.
The IP_MINTTL option may be used on SOCK_STREAM sockets to discard
packets with a TTL lower than the option value. This can be used to
implement the Generalized TTL Security Mechanism (GTSM) according to
RFC 3682.
OK'ed by christos@.
If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.
Modeled after FreeBSD implementation.
Don't check gainst the last ack received, but the expected sequence number.
This makes RST handling independent of delayed ACK. From Joanne M Mikkelson.
will be processed when the radix "subsystem" is initialized -- all
users must be attached before any inits to know the max keylength.
Use of link sets is no longer required, and only attached domains
need to be considered.
- Properly authorize port binding in in_pcbsetport() and in6_pcbsetport()
- Pass struct sockaddr_in6 to in6_pcbsetport() instead of just the address,
so that we have a more complete context
- Adjust udp6_output() to craft a sockaddr_in6 as it calls in6_pcbsetport()
- Fix an issue in in_pcbbind() where we used the "dom_sa_any" pointer and
not a copy of it, pointed out by bouyer@, thanks!
Mailing list reference:
http://mail-index.netbsd.org/tech-net/2009/04/29/msg001259.html
- Extract guts to in_pcbbind_{addr,port}()
- Put the port auto-assignment logic in in_pcbsetport(), which looks very
similar to in6_pcbsetport()
- Fix a bug where "sin" was passed to kauth(9) without being set to
anything
No objections on tech-net@.
PRU_CONNECT cases of tcp_usrreq(). It seems they were forgotten a long
time ago.
Similar code in FreeBSD and OpenBSD passes the thread (credentials)/proc.
structures. This is far from optimal, but gets rid of iffy
#ifdef INET in radix.c. The radix bonsai still needs lots of love
before loading domains dynamically is possible...
When a TCP timer is disarmed (with callout_stop()) in the general case
callout_invoking() isn't checked, so the timer handler could be called run
when the current interrupt handler exits, athough the timer is disarmed.
This case cause bad things like TCPT_REXMT and TCPT_PERSIST being both pending,
causing a panic (see the PR for details).
Close the issue by aborting the handler if the timer is not callout_expired().
(the EXPIRED flag being cleared by callout_stop()).
When a link-layer address changes (e.g., ifconfig ex0 link
02🇩🇪ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.
Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)
Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.
Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.
Return consistent, appropriate error codes from network drivers.
Improve readability. KNF.
*** Details ***
In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.
In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.
Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.
In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.
Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:
switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}
Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,
switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}
unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).
In ipw(4), remove an if_set_sadl() call that is out of place.
In nfe(4), reuse the jumbo MTU logic in ether_ioctl().
Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.
Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.
Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.
Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.
In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.
In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.
Let ifioctl_common() handle SIOCGIFADDR.
Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.
In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.
bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
cleans up function and one small fix is that we now stop copying user
options to the mbuf when the _EOL is given, previously this function
would continue to copy options.
cases with in-kernel consumers which might send data on the same socket,
we can deadlock on the reassembly queue otherwise (observed while testing
accept filters).
support for specifying an accept filter for a service (mostly as a usage
example, but it can be handy for other things). Manual pages to follow
in a day or so.
OK core@.
- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.
With much feedback from matt@ and plunky@.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetics that lead to sending data from an incorrect place in the
stream, making it appear as corrupted.
Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and I for the SACK issue.
- Add a lot of missing selinit() and seldestroy() calls.
- Merge selwakeup() and selnotify() calls into a single selnotify().
- Add an additional 'events' argument to selnotify() call. It will
indicate which event (POLL_IN, POLL_OUT, etc) happen. If unknown,
zero may be used.
Note: please pass appropriate value of 'events' where possible.
Proposed on: <tech-kern>
- add a comment to explain why:
+ * We start with 1, because 0 doesn't work with linux, which
+ * considers timestamp 0 in a SYN packet as a bug and disables
+ * timestamps.
revision 1.230
date: 2005/06/30 02:58:28; author: christos; state: Exp; lines: +20 -4
Normalize our PAWS code with Free and Open, as mentioned in tech-security.
reviewed by christos@ and matt@.
- All three functions are included in the kernel by default.
They call a backend function cpu_in_cksum after possibly
computing the checksum of the pseudo header.
- cpu_in_cksum is the core to implement the one-complement sum.
The default implementation is moderate fast on most platforms
and provides a 32bit accumulator with 16bit addends for L32 platforms
and a 64bit accumulator with 32bit addends for L64 platforms.
It handles edge cases like very large mbuf chains (could happen with
native IPv6 in the future) and provides a good base for new native
implementations.
- Modify i386 and amd64 assembly to use the new interface.
This disables the MD implementations on !x86 until the conversion is
done. For Alpha, the portable version is faster.
Only record an IPSEC_OUT_DONE tag when we have finished the processing
In ip{,6}_output, check this tag to know if we have already processed this
packet.
Remove some dead code (IPSEC_PENDING_TDB is not used in NetBSD)
Fix pr/36870
primarily used with TCP SYN and RST packets and such packets are less than
the smallest sized packet that an IP stack is allowed to fragment, we simply
set ip_id to 0 for all packets 68 bytes or less.
Add if_set_sadl() that both sets the link-layer address length and
replaces the current link-layer address with a new one, and use it
throughout the tree.
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.
Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.
Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.
Make ND6_HINT an inline, lowercase subroutine, nd6_hint.
I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
in_pcbbind().
Okay dyoung@.
Note that the network code is another candidate for major cleanup... also
note that this issue is likely to be present in netinet6 code, too.
instead of adding/subtracting our own IPv4 header.
There are many benefits: gre(4) needn't grok the outer encapsulation
header any longer, so this simplifies the gre(4) code. The IP
stack needn't grok GRE, so it is simplified, too. gre(4) will
benefit from optimizations in the socket code. Eventually, gre(4)
will gain an IPv6 encapsulation with very few new lines of code.
There is a small performance loss. A 133 MHz, 486-class AMD Elan
sinks/sources a TCP stream over GRE with about 93% the throughput
of the old code. TCP throughput on a 266 MHz, 586-class AMD Geode
is about 96% the throughput of the old code. A 175-MHz ADM5120
(MIPS) only sinks a TCP stream over GRE at about 90% of the old
code; I am still investigating that.
I produced stripped-down versions of sosend() and soreceive() for
gre(4) to use. They are guaranteed not to block, so they can be
called from a software interrupt and from a socket upcall,
respectively.
A kernel thread is no longer necessary for socket transmit/receive,
but I didn't get around to removing it, yet.
Thanks to Matt Thomas for suggesting the use of stripped-down socket
code and software interrupts, and to Andrew Doran for advice and
answers concerning software interrupts, threads, and performance.
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.
ifreq * arguments to ether_addmulti() and ether_delmulti() to const
struct sockaddr *, since ether_{add,del}multi() only ever read the
sockaddr ifreq member, ifr_addr. Update uses in carp(4) and in
vlan(4).
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:
int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff);
2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.
3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.
Shorten some pr_ctloutput staircases for readability.
4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():
struct mbuf *m_getsombuf(struct socket *so)
Create an mbuf and make its owner the socket `so'.
struct mbuf *m_intopt(struct socket *so, int val)
Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).
int fsocreate(..., int *fd)
Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.
void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)
Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.
socklen_t sockaddr_getlen(const struct sockaddr *sa)
Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.
const struct sockaddr *sockaddr_any(const struct sockaddr *sa)
Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.
const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)
Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.
because that's just going to cause problems down the road. (Suppose
we can have two CPUs in the network stack someday?) Instead, use
sockaddr_in_init() to initialize a sockaddr_in on the stack.
Use ifreq_setaddr() to initialize ifreq.ifr_addr.
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.
Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.
Avoid using sizeof(struct sockaddr_dl) in the kernel.
Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.
Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().
Constify: LLADDR() -> CLLADDR().
Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
Ethernet multicast address.
Reported by jmcneill@, fix discussed with dyoung@, _very_ light testing by
myself, some more money for my dealer of anxiolytics after reading
ip_output()'s twisted code maze.
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).
Use sockaddr_dl_setaddr() in a few places.
at IPL_NET, because rtcache_check() may read the forwarding table.
Elsewhere, the kernel only blocks interrupts at priority IPL_SOFTNET
and below while it modifies the forwarding table, so rtcache_check()
could be reading the table in an inconsistent state. Use
rtcache_done(), instead.
XXX netinet/ip_flow.c and netinet6/ip6_flow.c are virtually identical.
XXX They should share code.
from the forwarding table's users:
Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.
Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.
Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).
Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).
Cosmetic:
Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.
Stop using variadic arguments for rip6_output(), it is
unnecessary.
Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.
Make rt_maskedcopy() easier to read by using meaningful variable
names.
Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.
Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.
One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.
I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.
Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.