Commit Graph

84 Commits

Author SHA1 Message Date
joerg
8632294e2e Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.
2007-01-05 16:40:08 +00:00
joerg
eb04733c4e Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
2006-12-15 21:18:52 +00:00
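The init/free/check cycle described in eb04733c4e can be illustrated with a small userland sketch. Every toy_ name, the array-backed table, and the integer destinations are hypothetical stand-ins chosen for illustration; this is not the kernel's code, and the real struct route and struct rtentry carry many more fields.

```c
#include <stddef.h>

/* Toy stand-ins (assumptions); not the kernel's struct rtentry/route. */
struct toy_rtentry {
	int rt_refcnt;	/* number of holders of this entry */
	int rt_up;	/* nonzero while the route is usable */
	int rt_dst;	/* destination this entry routes to */
};

struct toy_route {
	int ro_dst;			/* destination to look up */
	struct toy_rtentry *ro_rt;	/* cached entry, or NULL */
};

/* Hypothetical routing table: a fixed array searched linearly. */
static struct toy_rtentry toy_table[] = {
	{ 0, 1, 10 },
	{ 0, 1, 20 },
};

/* Return a referenced entry for dst, or NULL if none is up. */
static struct toy_rtentry *
toy_lookup(int dst)
{
	size_t i;

	for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
		if (toy_table[i].rt_up && toy_table[i].rt_dst == dst) {
			toy_table[i].rt_refcnt++;
			return &toy_table[i];
		}
	}
	return NULL;
}

/* Look up ro_dst and store the referenced result in ro_rt. */
static void
toy_rtcache_init(struct toy_route *ro)
{
	ro->ro_rt = toy_lookup(ro->ro_dst);
}

/* Drop the cached reference, if a route was cached. */
static void
toy_rtcache_free(struct toy_route *ro)
{
	if (ro->ro_rt != NULL) {
		ro->ro_rt->rt_refcnt--;
		ro->ro_rt = NULL;
	}
}

/* Keep a cached route that is still up; otherwise retry the lookup. */
static void
toy_rtcache_check(struct toy_route *ro)
{
	if (ro->ro_rt != NULL && ro->ro_rt->rt_up)
		return;
	toy_rtcache_free(ro);
	toy_rtcache_init(ro);
}
```

After toy_rtcache_check() returns, a caller in this sketch only needs to test ro_rt against NULL, which mirrors the single check the commit message describes.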
dyoung
c308b1c661 Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route).  Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL.  Provide
in_rtcache() for adding a route to the chain.  Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches.  In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain.  In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
2006-12-09 05:33:04 +00:00
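The chaining scheme from c308b1c661 (track every cache whose ro_rt is non-NULL, then invalidate one or all of them) might look roughly like the following userland sketch. The toy_ names and the use of the LIST macros from <sys/queue.h> are assumptions made for illustration; the commit message does not say which list type the kernel uses.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Toy stand-ins (assumptions), not the kernel structures. */
struct toy_rtentry {
	int rt_refcnt;
};

struct toy_route {
	struct toy_rtentry *ro_rt;		/* cached route, or NULL */
	LIST_ENTRY(toy_route) ro_link;		/* chain of live caches */
};

/* Every cache with ro_rt != NULL is linked here. */
static LIST_HEAD(, toy_route) toy_caches = LIST_HEAD_INITIALIZER(toy_caches);

/* Notify the "domain" that ro now holds a cached route. */
static void
toy_rtcache(struct toy_route *ro)
{
	if (ro->ro_rt != NULL)
		LIST_INSERT_HEAD(&toy_caches, ro, ro_link);
}

/* Invalidate one cache: drop the reference, clear ro_rt, unlink. */
static void
toy_rtflush(struct toy_route *ro)
{
	if (ro->ro_rt == NULL)
		return;
	ro->ro_rt->rt_refcnt--;
	ro->ro_rt = NULL;
	LIST_REMOVE(ro, ro_link);
}

/* Invalidate every cache on the chain. */
static void
toy_rtflushall(void)
{
	struct toy_route *ro;

	while ((ro = LIST_FIRST(&toy_caches)) != NULL)
		toy_rtflush(ro);
}
```

Because a flush can happen at any time in this model, every user of a toy_route has to tolerate ro_rt turning to NULL, which is the same expectation the commit places on ipflow_fastforward() and the other cache users.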
joerg
b49bdf49d7 Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.
2006-12-07 19:37:08 +00:00
joerg
d87b42b41f Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.
2006-12-07 19:20:14 +00:00
dyoung
7e5a475027 Replace the temporary variable ndst with rt_key(rt). This will
simplify the application of RADIX_MPATH patches.

No functional change intended.
2006-12-04 00:56:44 +00:00
dyoung
8a5a3d2d66 Paranoid protection against use after free: in rtfree(), set rt_ifa
and rt_ifp to NULL.
2006-12-04 00:52:47 +00:00
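The defensive idea in 8a5a3d2d66 can be sketched in userland as follows. The toy_ types are hypothetical, and the sketch leaves the storage with the caller instead of returning it to an allocator, so that the cleared fields can be inspected afterwards.

```c
#include <stddef.h>

/* Toy stand-ins (assumptions), not the kernel structures. */
struct toy_ifnet {
	int if_index;
};

struct toy_ifaddr {
	int ifa_refcnt;
};

struct toy_rtentry {
	int rt_refcnt;
	struct toy_ifaddr *rt_ifa;
	struct toy_ifnet *rt_ifp;
};

/*
 * Drop one reference. On the last release, give up the ifaddr
 * reference and clear both pointers, so that a stale user of the
 * entry trips over NULL instead of following a dangling pointer.
 * A real kernel would hand rt back to its allocator at the end.
 */
static void
toy_rtfree(struct toy_rtentry *rt)
{
	if (--rt->rt_refcnt > 0)
		return;
	if (rt->rt_ifa != NULL)
		rt->rt_ifa->ifa_refcnt--;
	rt->rt_ifa = NULL;
	rt->rt_ifp = NULL;
}
```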
dyoung
b0520122af Cosmetic: remove extra empty line. 2006-12-04 00:48:59 +00:00
christos
168cd830d2 __unused removal on arguments; approved by core. 2006-11-16 01:32:37 +00:00
dyoung
e362cfc96f In rtalloc(), release our reference to the prior rtentry before
referencing a new rtentry.
2006-11-13 17:51:02 +00:00
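The ordering fixed in e362cfc96f, where the old reference is released before the new one is stored, can be sketched as below. The toy_ names are hypothetical, and the found parameter stands in for the result of a routing table lookup that the real rtalloc() would perform itself.

```c
#include <stddef.h>

/* Toy stand-ins (assumptions), not the kernel structures. */
struct toy_rtentry {
	int rt_refcnt;
};

struct toy_route {
	struct toy_rtentry *ro_rt;
};

/*
 * Replace the cached route with 'found' (possibly NULL). Releasing
 * the prior entry first means its reference cannot be leaked when
 * ro_rt is overwritten.
 */
static void
toy_rtalloc(struct toy_route *ro, struct toy_rtentry *found)
{
	if (ro->ro_rt != NULL) {
		ro->ro_rt->rt_refcnt--;
		ro->ro_rt = NULL;
	}
	if (found != NULL) {
		found->rt_refcnt++;
		ro->ro_rt = found;
	}
}
```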
dyoung
a25eaede91 Add a source-address selection policy mechanism to the kernel.
Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses.  Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

        1 Factor out some common code, producing rt_replace_ifa().

        2 Abbreviate a for-loop with TAILQ_FOREACH().

        3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
          IN_PRIVATE(), that are true for link-local unicast
          (169.254/16) and RFC1918 private addresses, respectively.
          Add the predicate IN_ANY_LOCAL() that is true for link-local
          unicast and multicast.

        4 Add IPv4-specific interface attach/detach routines,
          in_domifattach and in_domifdetach, which are built only
          under #ifdef IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
2006-11-13 05:13:38 +00:00
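The address predicates listed in a25eaede91 can be approximated in userland as follows. This is a sketch under stated assumptions, not the kernel macros: the toy_ names are hypothetical, addresses are taken in host byte order, and "link-local multicast" is read here as 224.0.0.0/24, which the commit message does not spell out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a host-order IPv4 address from four octets (helper for the sketch). */
#define TOY_ADDR(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Link-local unicast: 169.254/16. */
static bool
toy_in_linklocal(uint32_t addr)
{
	return (addr & 0xffff0000u) == TOY_ADDR(169, 254, 0, 0);
}

/* RFC 1918 private space: 10/8, 172.16/12, 192.168/16. */
static bool
toy_in_private(uint32_t addr)
{
	return (addr & 0xff000000u) == TOY_ADDR(10, 0, 0, 0) ||
	    (addr & 0xfff00000u) == TOY_ADDR(172, 16, 0, 0) ||
	    (addr & 0xffff0000u) == TOY_ADDR(192, 168, 0, 0);
}

/* Assumed meaning of link-local multicast: 224.0.0.0/24. */
static bool
toy_in_local_group(uint32_t addr)
{
	return (addr & 0xffffff00u) == TOY_ADDR(224, 0, 0, 0);
}

/* True for link-local unicast and for link-local multicast. */
static bool
toy_in_any_local(uint32_t addr)
{
	return toy_in_linklocal(addr) || toy_in_local_group(addr);
}
```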
christos
4d595fd7b1 - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
2006-10-12 01:30:41 +00:00
tls
8cc016b4bc Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or data structure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

	s=splfoo();
	/* frob data structure */
	splx(s);
	pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next.  "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them.  One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
2006-10-05 17:35:19 +00:00
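The change of pattern described in 8cc016b4bc, moving the pool operation inside the protected region, can be modelled in userland as below. Everything here is a hypothetical mock: toy_splnet(), toy_splx() and toy_pool_put() only imitate the shape of the kernel calls, and the interrupt priority level is just an integer.

```c
#include <assert.h>

#define TOY_IPL_NET 1

static int toy_ipl;		/* mock priority level, 0 = nothing blocked */
static int toy_pool_items;	/* mock count of items held by the pool */

/* Raise the mock level and return the previous one. */
static int
toy_splnet(void)
{
	int s = toy_ipl;

	if (toy_ipl < TOY_IPL_NET)
		toy_ipl = TOY_IPL_NET;
	return s;
}

/* Restore a previously saved level. */
static void
toy_splx(int s)
{
	toy_ipl = s;
}

/* Mock pool_put that insists on being called with the level raised. */
static void
toy_pool_put(void)
{
	assert(toy_ipl >= TOY_IPL_NET);
	toy_pool_items++;
}

/* Remove an element and return its storage, both under protection. */
static void
toy_release(int *queue_len)
{
	int s;

	s = toy_splnet();
	(*queue_len)--;		/* frob data structure */
	toy_pool_put();		/* now inside the protected region */
	toy_splx(s);
}
```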
dogcow
f2d329dca0 remove more vestiges of CCITT, LLC, HDLC, NS, and NSIP. 2006-09-07 02:40:31 +00:00
kardel
de4337ab21 merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
  time.tv_sec -> time_second
- struct timeval mono_time is gone
  mono_time.tv_sec -> time_uptime
- access to time via
	{get,}{micro,nano,bin}time()
	get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
  Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
  NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
2006-06-07 22:33:33 +00:00
christos
29a12667b7 Coverity CID 855: Add a KASSERT for null route from successful rtrequest. 2006-04-15 02:19:00 +00:00
christos
4bb7462638 PR/33231: Arnaud Degroote: Miscellaneous cleanups in the route code:
- use of 0 instead of NULL
- questionable macros
2006-04-10 19:06:37 +00:00
christos
95e1ffb156 merge ktrace-lwp. 2005-12-11 12:16:03 +00:00
christos
333e176687 - sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.
2005-05-29 21:22:52 +00:00
perry
f07677dd81 nuke trailing whitespace 2005-02-26 22:45:09 +00:00
matt
d341be30f4 Change initialization of domains to use link sets. Switch to using STAILQ.
Add a convenience macro DOMAIN_FOREACH to iterate through the domains.
2005-01-23 18:41:56 +00:00
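A convenience iterator in the spirit of DOMAIN_FOREACH from d341be30f4 can be built on STAILQ as sketched below. The toy_ names and the wrapper macro are hypothetical and are not the kernel's definitions; link sets are a linker feature and are not modelled, so domains are attached by an explicit call instead.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Toy stand-in (assumption) for struct domain. */
struct toy_domain {
	int dom_family;
	const char *dom_name;
	STAILQ_ENTRY(toy_domain) dom_link;
};

static STAILQ_HEAD(, toy_domain) toy_domains =
    STAILQ_HEAD_INITIALIZER(toy_domains);

/* Hypothetical wrapper in the style of DOMAIN_FOREACH. */
#define TOY_DOMAIN_FOREACH(dom) \
	STAILQ_FOREACH(dom, &toy_domains, dom_link)

/* Append a domain to the global list. */
static void
toy_domain_attach(struct toy_domain *dp)
{
	STAILQ_INSERT_TAIL(&toy_domains, dp, dom_link);
}

/* Find a domain by address family, or return NULL. */
static struct toy_domain *
toy_domain_find(int family)
{
	struct toy_domain *dp;

	TOY_DOMAIN_FOREACH(dp) {
		if (dp->dom_family == family)
			return dp;
	}
	return NULL;
}
```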
christos
a9703830cd Fix problem in previous commit; we need to create a new sockaddr. 2004-09-30 00:14:05 +00:00
christos
efff5f0097 PR/22849: Sean Boudreau: rtrequest() w/ RTM_DELETE not honouring netmask
as it does w/ RTM_ADD.
2004-09-29 21:19:33 +00:00
simonb
b5d0e6bf06 Initialise (most) pools from a link set instead of explicit calls
to pool_init.  Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
2004-04-25 16:42:40 +00:00
matt
7cf8938ddd ANSI-fy and some additional de-__P and constification. 2004-04-21 21:03:43 +00:00
matt
e3b919c754 Constify if.c radix.c and route.c (and fix related fallout). 2004-04-21 04:17:28 +00:00
agc
aad01611e7 Move UCB-licensed code from 4-clause to 3-clause licence.
Patches provided by Joel Baker in PR 22364, verified by myself.
2003-08-07 16:26:28 +00:00
fvdl
d5aece61d6 Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
2003-06-29 22:28:00 +00:00
darrenr
960df3c8d1 Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records.  The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
2003-06-28 14:20:43 +00:00
itojun
50a545a34b remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.
2002-11-12 02:10:13 +00:00
itojun
96910acf99 add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.
2002-11-12 01:37:30 +00:00
thorpej
83b3b86fd6 Fix a signed/unsigned comparison warning from GCC 3.3. 2002-08-26 01:42:28 +00:00
matt
2d83d27dfa Eliminate more commons. 2002-05-12 20:40:11 +00:00
thorpej
a180cee23b Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map).  Try to deal with this:

* Group all information about the backend allocator for a pool in a
  separate structure.  The pool references this structure, rather than
  the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
  to become available, but will still fail if it cannot allocate KVA
  space for the pages.  If this happens, carefully drain all pools using
  the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
  some pages, and use that information to make draining easier and more
  efficient.
* Get rid of PR_URGENT.  There was only one use of it, and it could be
  dealt with by the caller.

From art@openbsd.org.
2002-03-08 20:48:27 +00:00
lukem
34d65a3414 add RCSIDs 2001-11-12 23:49:33 +00:00
matt
b5e785f38d Switch to using queue access macros instead of referring to the member
fields explicitly.
2001-11-05 18:02:15 +00:00
itojun
3594efccf6 on RTM_DELETE, reduce refcnt on rt->rt_parent, to avoid leaks.
from IIJ seil team
2001-10-16 02:42:36 +00:00
itojun
dc1d7df811 do not initialize rmx_mtu on RTM_ADD.
on gateway change, copy rmx_mtu from gateway only under the following condition:
- current MTU is not locked
- current MTU was discovered via PMTUD

XXX if gateway has MTU == 0, current MTU is set to 0 and we are going to
rediscover PMTU again.  is it good or bad?
2001-07-26 05:47:37 +00:00
itojun
efe956a93f do not copy rmx_mtu on RTM_ADD/RESOLVE. the fragment was mistakenly
introduced on 1.25, from other *bsd via kame.  from thorpej
2001-07-25 07:13:44 +00:00
itojun
866fc79bb9 validate sa_len on equal() macro. without the change we may touch the content
of a2 beyond a2->sa_len mistakenly.  sync with kame
2001-07-20 18:52:18 +00:00
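The length validation described in 866fc79bb9 can be sketched as a function instead of a macro. The toy_sockaddr layout and the toy_sa_equal() name are hypothetical; the point is only that the lengths are compared before any bytes, so the comparison never reads past a2->sa_len.

```c
#include <stdbool.h>
#include <string.h>

/* Toy stand-in (assumption) for a length-prefixed sockaddr. */
struct toy_sockaddr {
	unsigned char sa_len;		/* total length in bytes */
	unsigned char sa_family;
	char sa_data[14];
};

/*
 * Two addresses are equal only if their lengths match and the first
 * sa_len bytes are identical. Without the length test, comparing
 * a1->sa_len bytes could run beyond the end of a shorter a2.
 */
static bool
toy_sa_equal(const struct toy_sockaddr *a1, const struct toy_sockaddr *a2)
{
	if (a1->sa_len != a2->sa_len)
		return false;
	return memcmp(a1, a2, a1->sa_len) == 0;
}
```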
thorpej
cbf41a143a bzero -> memset 2001-07-18 16:43:09 +00:00
itojun
e79a9123a3 use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).
2001-02-21 05:45:11 +00:00
itojun
f38fdf081e change non-intuitive function name. s/rtflushit/rtflushclone1/ 2001-01-27 11:07:59 +00:00
itojun
02adaaf197 cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.
2001-01-27 10:39:33 +00:00
itojun
fee00b1a78 mark cloned routes with RTF_CLONED. present it with netstat -r by "c".
let static routes overwrite cloned routes, as cloned routes can come back again
if necessary.  behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.
2001-01-27 04:49:31 +00:00
itojun
df9784d749 pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).
have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyway).

benefit: the following command now works.  previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *.  it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
2001-01-17 04:05:41 +00:00
itojun
93292b8aaa do not touch region after free 2000-12-11 07:52:48 +00:00
itojun
5eae50d991 update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
  route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
  route entries (there's upper limit, so bad guys cannot blow up our routing
  table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
2000-12-09 01:29:45 +00:00
augustss
c1ebd1929a Kill some more register declarations. 2000-03-30 09:45:33 +00:00
thorpej
fc96443d15 New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
  resource allocation.
- Insertion and removal of callouts is constant time, important as
  this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
2000-03-23 07:01:25 +00:00
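The two properties named in fc96443d15, caller-supplied handle storage and constant-time insertion and removal, can be illustrated with a small timing wheel. The commit message does not say how the kernel achieves constant time, so the wheel, the toy_ names, and the tick-driven dispatch are all assumptions made for this sketch. It also ignores callbacks that stop other pending callouts while the wheel is being processed.

```c
#include <stddef.h>
#include <sys/queue.h>

#define TOY_WHEEL_SIZE 8

/* Handle storage is owned by the client, so arming never allocates. */
struct toy_callout {
	LIST_ENTRY(toy_callout) c_link;
	unsigned long c_time;		/* absolute tick at which to fire */
	void (*c_func)(void *);
	void *c_arg;
	int c_pending;
};

/* Zero-initialized static heads are valid empty lists. */
static LIST_HEAD(toy_bucket, toy_callout) toy_wheel[TOY_WHEEL_SIZE];
static unsigned long toy_now;

/* O(1) removal: unlink from whichever bucket holds the callout. */
static void
toy_callout_stop(struct toy_callout *c)
{
	if (c->c_pending) {
		LIST_REMOVE(c, c_link);
		c->c_pending = 0;
	}
}

/* O(1) insertion: hash the expiry tick into a bucket. */
static void
toy_callout_reset(struct toy_callout *c, unsigned long ticks,
    void (*func)(void *), void *arg)
{
	toy_callout_stop(c);
	if (ticks == 0)
		ticks = 1;	/* earliest possible expiry is the next tick */
	c->c_time = toy_now + ticks;
	c->c_func = func;
	c->c_arg = arg;
	c->c_pending = 1;
	LIST_INSERT_HEAD(&toy_wheel[c->c_time % TOY_WHEEL_SIZE], c, c_link);
}

/* Advance time by one tick and fire the callouts that are due. */
static void
toy_tick(void)
{
	struct toy_callout *c, *next;

	toy_now++;
	for (c = LIST_FIRST(&toy_wheel[toy_now % TOY_WHEEL_SIZE]);
	    c != NULL; c = next) {
		next = LIST_NEXT(c, c_link);
		if (c->c_time != toy_now)
			continue;	/* due on a later revolution */
		LIST_REMOVE(c, c_link);
		c->c_pending = 0;
		c->c_func(c->c_arg);
	}
}

/* Example callback: increment the int that arg points to. */
static void
toy_count(void *arg)
{
	(*(int *)arg)++;
}
```

A client embeds a struct toy_callout in its own data, arms it with toy_callout_reset(), and cancels it with toy_callout_stop(); neither call can fail for lack of memory because no allocation takes place.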