and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.
Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.
Avoid using sizeof(struct sockaddr_dl) in the kernel.
Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.
Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().
Constify: LLADDR() -> CLLADDR().
Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).
Use sockaddr_dl_setaddr() in a few places.
* Create the kernel thread in gre_clone_create() instead of trying
to create it in gre_ioctl(). (Thanks ad@ for suggesting it, and
pointing out that I can't kthread_create while I hold a spin
lock.) Run the thread always, but put it to sleep while the
gre(4) is not in UDP mode.
* Use sockaddr_in_init().
* Move some thread state off of the stack and into the softc.
* Extract subroutines gre_do_recv(), gre_do_send(), and gre_reconf()
from gre_thread1(), making the code more readable.
into sdl_data[].
Move the macro satocsdl() to net/if_dl.h, and introduce satosdl().
Add some helpers for initializing sockaddr_dl (sockaddr_dl_init),
for finding out the length to put in a sockaddr_dl's sdl_len member
(sockaddr_dl_measure), and for setting the link-layer address in
a sockaddr_dl to a new value (sockaddr_dl_setaddr).
Make sockaddr_copy() panic if the caller tries to copy a sockaddr
to a destination where it will not fit.
from the forwarding table's users:
Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.
Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.
Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).
Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).
Cosmetic:
Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.
Stop using variadic arguments for rip6_output(), it is
unnecessary.
Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.
Make rt_maskedcopy() easier to read by using meaningful variable
names.
Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.
Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.
One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.
I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.
Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
MRU to the link's MTU and initiate an MRU negotiation with the peer.
This is useful when the PPP session is bridged from Ethernet to ATM
by an ADSL modem (such as the Linksys AM200). Unless we negotiate the
lower MRU, the peer is unaware that 1500-byte packets will not make
it umolested across the link (the Linksys AM200 silently truncates them
to 1498 bytes, creating a nice PMTU blackhole).
Note that the PPP RFC says peers MUST accept 1500 byte packets,
regardless of the negotiated MRU, so most ISPs which use PPPoA will
probably still send 1500-byte packets. However, I persuaded my ISP
(Andrews and Arnold) to modify their software to generate an ICMP error
"fragment needed" for packets with IP.DF set which are larger than the
negotiated MRU. They will still forward non-IP.DF packets, with the
associated truncation, but at least my PMTU troubles have gone.
set to rn_walktree.
Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.
- maintain space left correctly. the pointer is advanced by the size
of struct ifreq when length of address is small.
- single sizeof operator is enough to take the size of struct.
- the type of `sz' must be singed type since it is/was compared against to
the variable which may become negative.
- no need to traverse rest of interfaces once we got an error. note that
the latter `break' statement was inside inner loop.
compatibility with the older ioctls. This avoids stack smashing and
abuse of "struct sockaddr" when ioctls placed "struct sockaddr_foo's" that
were longer than "struct sockaddr".
XXX: Some of the emulations might be broken; I tried to add code for
them but I did not test them.
Fix a defect in the locking of file descriptors as we delegate a
UDP socket from userland to the kernel. Move sc_fp out of sc_soparm.
Synchronize access to sc_fp by gre_ioctl() and the kernel thread
using a condition variable. For simplicity's sake, make it the
kernel helper thread's responsibility to close its UDP socket.
instead of before, because if_detach() may cause the cache to be
reloaded. (I already fixed this in both etherip(4) and gre(4).
Ewww, rampant code duplication.)
route_in6, struct route_iso), replacing all caches with a struct
route.
The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.
Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.
DETAILS
1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:
struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);
sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.
sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.
The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).
2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.
3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:
int rtcache_setdst(struct route *, const struct sockaddr *);
rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.
It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.
4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
because if_detach() may cause us to transmit a packet, which
ordinarily entails reloading the route cache. This fixes a bug
where the kernel would panic later in rtflush(). Thanks Michael
Earnhart for reporting the bug.
In gre_output(), do not leak mbufs.
increase ifi_noproto. If the GRE header contains routing options,
increase the input-error count, ifi_ierrors.
While I am here, make some cosmetic changes: remove unnecessary
'proto' argument from gre_input3(). Shorten some staircases.
truth" of either integers or pointers, so that it's clear
what's going on. Remove superfluous () from return statements.
bcmp -> memcmp, bcopy -> memcpy.
Misc. cosmetic: join some lines, remove a few empty lines, remove
spaces from type casts. Don't open-code IFNET_FOREACH(). Shorten
some staircases.
hooks that signal the interface's departure run before IPv6 sends
messages to indicate that it is leaving its multicast groups; when
pf filters the departure messages, it does not recognize the output
interface, so it complains at the departure of gre65, for example:
pf_test6: kif == NULL, if_xname gre65
I have changed if_detach() so that it calls pr_usrreq(PRU_PURGEIF)
before pfil_run_hooks(PFIL_IFNET_DETACH), instead of the other way
around. That quiets the pf_test6: messages.
If ip6_forward successfully forwards a packet, a cache, in this case a
ip6flow struct entry, will be created. ether_input and friends will
then be able to call ip6flow_fastforward with the packet which will then
be passed to if_output (unless an issue is found - in that case the packet
is passed back to ip6_input).
ok matt@ christos@ dyoung@ and joerg@
parentheses in return statements.
Cosmetic: don't open-code TAILQ_FOREACH().
Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.
Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.
Constify:
1 Introduce const accessor for route->ro_dst, rtcache_getdst().
2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.
3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.
4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
Extract subroutine rn_delete1() to ease RADIX_MPATH integration,
should we ever do that.
Remove RN_DEBUG code that does not compile.
Join some lines of the type
type var1;
type var2;
type var3;
making
type var1, var2, var3.
Break lines of the type if (expr) stmt1; else stmt2; so that normal
people can read them.
reason it's dropped before passing to bridge: when a vlan interface is
in promisc mode, it will loop the packet back to ether_input() with
M_PROMISC set, and when carp calls ether_input again the flag is still
there and the packet is dropped. If the carp interface doesn't take
the packet M_PROMISC is set just after is needed anyway.
Tested on a box with multiple carp on vlans, no comments about this patch
on tech-net@
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.
Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.
Based on the earlier patch from dyoung@, reviewed and discussed with
him.