set to rn_walktree.
Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.
update in ICMP error messages. In the test case for this, the wrong
input was actually being used (it should be the headers from the previous
packet output) and therefore the expected results were also wildly wrong.
kern/36309
"state lock" flag (if-bound, gr-bound, floating) at the end of a
NAT rule. The new syntax is backwards-compatbile with the old
syntax.
PF (kernel): change the macro BOUND_IFACE() to the inline function
bound_iface(), and add a new argument, the applicable NAT rule.
Use both the flags on the applicable filter rule and on the applicable
NAT rule to decide whether or not to bind a state to the interface
or the group where it is created.
route_in6, struct route_iso), replacing all caches with a struct
route.
The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.
Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.
DETAILS
1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:
struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);
sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.
sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.
The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).
2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.
3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:
int rtcache_setdst(struct route *, const struct sockaddr *);
rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.
It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.
4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
special cases for it. This is to simplify the code to prepare it
for pullup to netbsd-4 and submitting back upstream. The change
was requested by martin@.
parentheses in return statements.
Cosmetic: don't open-code TAILQ_FOREACH().
Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.
Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.
Constify:
1 Introduce const accessor for route->ro_dst, rtcache_getdst().
2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.
3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.
4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.
Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.
Based on the earlier patch from dyoung@, reviewed and discussed with
him.
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).
Stale IPv6 and ISO route caches will be treated by separate patches.
Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.
Here are the details:
Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.
Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.
Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.
In gif(4), discard the workaround for stale caches that involves
expiring them every so often.
Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).
Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.
Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.
In domain initializers, use .dom_xxx tags.
KNF here and there.
Apply OpenBSD src/sys/net/pf.c rev 1.486 and 1.487:
1.486:
When synproxy sends packets to the destination host, make sure to copy
the 'tag' from the original state entry into the outgoing mbuf.
1.487:
When synproxy completes the replayed handshake and modifies the state
into a normal one, it sets both peers' sequence windows. Fix a bug where
the previously advertised windows are applied to the wrong side (i.e.
peer A's seqhi is peer A's seqlo plus peer B's, not A's, window). This
went undetected because mostly the windows are similar and/or re-
advertised soon. But there are (rare) cases where a synproxy'd connection
would stall right after handshake. Found by Gleb Smirnoff.
and if ALTQ and pf are both enabled, it leads to compile errors. So,
change all tests for ALTQ to ALTQ_NEW, which won't be defined.
This allows simultaneous compilation of pf and ALTQ and is a temporary
measure before the peter-altq brach is merged.
Tested and approved by Peter Postma.
- Add a few scopes to the kernel: system, network, and machdep.
- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.
- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.
- Update all relevant documentation.
- Add some code and docs to help folks who want to actually use this stuff:
* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.
This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...
* And of course, documentation describing how to do the above for quick
reference, including code samples.
All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:
http://kauth.linbsd.org/kauthwiki
NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:
- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.
(or if you feel you have to, contact me first)
This is still work in progress; It's far from being done, but now it'll
be a lot easier.
Relevant mailing list threads:
http://mail-index.netbsd.org/tech-security/2006/01/25/0011.htmlhttp://mail-index.netbsd.org/tech-security/2006/03/24/0001.htmlhttp://mail-index.netbsd.org/tech-security/2006/04/18/0000.htmlhttp://mail-index.netbsd.org/tech-security/2006/05/15/0000.htmlhttp://mail-index.netbsd.org/tech-security/2006/08/01/0000.htmlhttp://mail-index.netbsd.org/tech-security/2006/08/25/0000.html
Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stablizing kauth(9).
Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.
Happy birthday Randi! :)
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
for a code path that should never actually happen (fr_newauth() should only
be called for auth rules - i.e. when fin_fr != NULL. If it is possible to
call fr_newauth() with fin_fr == NULL then this change introduces a
regression compared to prior importing of 4.1.13.
> revision 1.104
> date: 2006/01/18 22:03:21; author: dhartmei; state: Exp; lines: +2 -2
> fix a bug in the fragment cache (used for 'scrub fragment crop/drop-ovl',
> but not 'fragment reassemble'), which can cause some fragments to get
> inserted into the cache twice, thereby violating an invariant, and panic-
> ing the system subsequently. ok deraadt@
was expecting it to be set, thus ignored it.
bin/29509 - because ipft_cookie wasn't reset to 0 before making the ioctl
call for each variable, only the first name to find was used, each successive
call just used the cookie.
CVn: ----------------------------------------------------------------------
8 bytes of data were dropped.
- If the packet is a fragment, return. There is no UDP header in this case.
- Don't set the FI_SHORT flag. Already tested in `frpr_short()'.
- Remove unneeded test `!fin->fin_off'.
Approved by: Christos Zoulas <christos@netbsd.org>
ok yamt@
> MFC:
> Fix by dhartmei@
>
> replace finer-grained spl locking in pfioctl() with a single broad lock
> around the entire body. this resolves the (misleading) panics in
> pf_tag_packet() during heavy ioctl operations (like when using authpf)
> that occur because softclock can interrupt ioctl on i386 since SMP.
> patch from camield@.
ok yamt@
> MFC:
> Fix by dhartmei@
>
> ICMP state entries use the ICMP ID as port for the unique state key. When
> checking for a usable key, construct the key in the same way. Otherwise,
> a colliding key might be missed or a state insertion might be refused even
> though it could be inserted. The second case triggers the endless loop
> fixed by 1.474, possibly allowing a NATed LAN client to lock up the kernel.
> Report and test data by Srebrenko Sehic.
MFC:
Fix by dhartmei@
IPv6 packets can contain headers (like options) before the TCP/UDP/ICMP6
header. pf finds the first TCP/UDP/ICMP6 header to filter by traversing
the header chain. In the case where headers are skipped, the protocol
checksum verification used the wrong length (included the skipped headers),
leading to incorrectly mismatching checksums. Such IPv6 packets with
headers were silently dropped. Reported by Bernhard Schmidt.
ok deraadt@ dhartmei@ mcbride@
MFC:
Fix by mcbride@
Initialise init_addr in pf_map_addr() in the PF_POOL_ROUNDROBIN,
prevents a possible endless loop in pf_get_sport() with 'static-port'
Reported by adm at celeritystorm dot com in FreeBSD PR74930, debugging
by dhartmei@
ok mcbride@ dhartmei@ deraadt@ henning@
allowing rules to be set to match only ipv4/ipv6. And so ipnat must be updated
to actually set this field correctly but to keep things working for old
versions of ipnat (that will set this to 0), make the ioctl handler "update"
the 0 to a 4 to keep things working when people just upgrade kernels. This
forces NAT rule matching to be limited to ipv4 only, here forward, fixing
kern/28662
MFC:
Fix by dhartmei@
fix a bug that leads to a crash when binat rules of the form
'binat from ... to ... -> (if)' are used, where the interface
is dynamic. reported by kos(at)bastard(dot)net, analyzed by
Pyun YongHyeon.
state is not applicable. The fix just reverts the new code that blocked
packets where fr_addstate() fails. This is not correct in all cases, but
blocking them is a bit drastic and breaks existing functionality. The proper
fix is to change fr_addstate() to return:
- state added
- adding state failed
- adding state is not applicable
and then filter packets only in the second case. I am leaving this for someone
else.
MFC:
Fix by dhartmei@
The flag to re-filter pf-generated packets was set wrong by synproxy
for ACKs. It should filter the ACK replayed to the server, instead of
of the one to the client.
MFC:
Fix by dhartmei@
For RST generated due to state mismatch during handshake, don't set
th_flags TH_ACK and leave th_ack 0, just like the RST generated by
the stack in this case. Fixes the Raptor workaround.