Commit Graph

3180 Commits

Author SHA1 Message Date
ozaki-r 302ac4ae0e Make some rt_timer functions and variables static
No functional change.
2016-10-21 09:01:44 +00:00
ozaki-r e07d22aae6 Avoid temporal dangling reference 2016-10-21 03:04:33 +00:00
ozaki-r df2616c199 Remove unused rtcache_lookup_noclone 2016-10-18 09:43:20 +00:00
ozaki-r 3be3142886 Don't hold global locks if NET_MPSAFE is enabled
If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@
2016-10-18 07:30:30 +00:00
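The compile-time switch described above can be sketched in userland C. In this sketch a pthread mutex stands in for KERNEL_LOCK/softnet_lock. The names `big_lock`, `NET_LOCK_IF_NOT_MPSAFE` and `forward_packet` are hypothetical and are not the NetBSD macros. The point is that the default build keeps the old global serialization, and a `NET_MPSAFE` build compiles it away.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for KERNEL_LOCK/softnet_lock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static bool big_lock_held = false; /* for demonstration only */

#ifdef NET_MPSAFE
/* MP-safe build: the forwarding path relies on fine-grained locks only. */
#define NET_LOCK_IF_NOT_MPSAFE()   do { } while (0)
#define NET_UNLOCK_IF_NOT_MPSAFE() do { } while (0)
#else
/* Default build: serialize the path with the global lock, as before. */
#define NET_LOCK_IF_NOT_MPSAFE() \
    do { pthread_mutex_lock(&big_lock); big_lock_held = true; } while (0)
#define NET_UNLOCK_IF_NOT_MPSAFE() \
    do { big_lock_held = false; pthread_mutex_unlock(&big_lock); } while (0)
#endif

/* Returns whether the global lock was held while "forwarding". */
static bool
forward_packet(void)
{
    bool held;

    NET_LOCK_IF_NOT_MPSAFE();
    held = big_lock_held; /* ... forwarding work would happen here ... */
    NET_UNLOCK_IF_NOT_MPSAFE();
    return held;
}
```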
roy 103ec7fade Mark arprequest static and introduce arpannounce so that gratuitous
ARP requests are only sent from valid addresses.
2016-10-11 12:32:30 +00:00
joerg 3cc071a817 Since IFF_MULTICAST's value can't be represented as a signed short
without an implicit cast, make if_flags unsigned.
2016-10-08 17:40:12 +00:00
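IFF_MULTICAST is 0x8000 in the BSD `<net/if.h>`. On platforms with a 16-bit short, that value is one more than SHRT_MAX (32767), so storing it in a signed short is an implementation-defined conversion. The sketch below uses a hypothetical `struct ifnet_sketch` with an unsigned flag word, which the C standard guarantees can hold 0..65535, so every flag bit round-trips cleanly.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define IFF_UP        0x0001 /* value as in BSD <net/if.h> */
#define IFF_MULTICAST 0x8000 /* value as in BSD <net/if.h> */

/* Hypothetical miniature of the interface flag word after the change. */
struct ifnet_sketch {
    unsigned short if_flags;
};

/* unsigned short holds at least 0..65535, so 0x8000 always fits. */
_Static_assert(USHRT_MAX >= IFF_MULTICAST, "IFF_MULTICAST must fit");

static void
if_set_flags(struct ifnet_sketch *ifp, unsigned short flags)
{
    ifp->if_flags |= flags;
}

static bool
if_is_multicast(const struct ifnet_sketch *ifp)
{
    return (ifp->if_flags & IFF_MULTICAST) != 0;
}
```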
joerg fce2ad3141 Use uint8_t for opt as some of the values don't fit into the (positive)
range of a signed char.
2016-10-08 17:37:32 +00:00
ozaki-r 8f4376cb6f Fix race condition on ifqueue used by traditional netisr
If an underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

There is shared data between a Layer 2 routine and a Layer 3 routine,
namely the ifqueue, and IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it
are now racy.

To fix the race condition, use ifqueue#ifq_lock to protect the ifqueue
instead of splnet, which is now meaningless.

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@
2016-10-03 11:06:06 +00:00
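The fix can be sketched in userland C, with a pthread mutex playing the role of ifq_lock. The names `struct ifqueue_sketch`, `ifq_enqueue` and `ifq_dequeue` are hypothetical stand-ins for IF_ENQUEUE/IF_DEQUEUE, not the kernel macros. Every enqueue (Layer 2 side) and dequeue (Layer 3 side) takes the same mutex, so the head, tail and length updates can no longer interleave across CPUs. Raising the interrupt priority level with splnet only affects the local CPU, so it cannot provide this guarantee.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical packet and queue types standing in for mbuf/ifqueue. */
struct pkt {
    struct pkt *next;
    int id;
};

struct ifqueue_sketch {
    pthread_mutex_t lock;   /* plays the role of ifq_lock */
    struct pkt *head, *tail;
    int len, maxlen;
};

static void
ifq_init(struct ifqueue_sketch *q, int maxlen)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
    q->len = 0;
    q->maxlen = maxlen;
}

/* Layer 2 side: returns 0 on success, -1 if full (caller drops the packet). */
static int
ifq_enqueue(struct ifqueue_sketch *q, struct pkt *p)
{
    int error = 0;

    pthread_mutex_lock(&q->lock);
    if (q->len >= q->maxlen) {
        error = -1;
    } else {
        p->next = NULL;
        if (q->tail != NULL)
            q->tail->next = p;
        else
            q->head = p;
        q->tail = p;
        q->len++;
    }
    pthread_mutex_unlock(&q->lock);
    return error;
}

/* Layer 3 side: returns NULL when the queue is empty. */
static struct pkt *
ifq_dequeue(struct ifqueue_sketch *q)
{
    struct pkt *p;

    pthread_mutex_lock(&q->lock);
    p = q->head;
    if (p != NULL) {
        q->head = p->next;
        if (q->head == NULL)
            q->tail = NULL;
        q->len--;
    }
    pthread_mutex_unlock(&q->lock);
    return p;
}
```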
ozaki-r 14c6c81f32 Add missing return 2016-10-03 07:13:29 +00:00
christos 9c7db92f68 MFREE -> m_free 2016-10-02 14:16:02 +00:00
roy 9288933cf3 Set dstaddr in in_ifinit so that sppp consumers announce the correct
dstaddr in routing messages.
2016-09-29 15:04:17 +00:00
roy fb8ac61d3a Ensure we only call pfil_run_hooks if if_init succeeded.
While here, improve some logging.
2016-09-29 14:08:40 +00:00
roy 98b0d70fff Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.
2016-09-21 10:50:22 +00:00
roy 70c02d276f Drop hostIsNew from in_ifinit, let the function work out if the address
has changed.
Sync address flag setup with the IPv6 counterpart.
When scrubbing the address, or setting up the address fails, restore the
old address flags as well as the old address.
2016-09-16 14:17:23 +00:00
pgoyette 06402e0a42 Move kern_ctf.c into the dtrace_fbt module (the only place it is used)
rather than including it in kernels with KDTRACE_HOOKS defined.  Update
the dtrace_fbt module to depend on the zlib module.

Bump kernel version to avoid module mismatch.

Welcome to 7.99.38 !
2016-09-16 03:10:45 +00:00
christos 9faa331084 Always do the mbuf checks. The packet filters (npf) expect the mbuf to be
pulled-up. (Krists Krilovs)
2016-09-15 14:40:43 +00:00
knakahara feed793fff Checking the return value of kmem_alloc(size, KM_SLEEP) for NULL is no longer required.
kmem_alloc(size, KM_SLEEP) has already been fixed; that is, it never returns NULL.
see: sys/kern/subr_kmem.c:r1.62
2016-09-15 06:59:32 +00:00
roy 2a96904518 Call ifmedia_delete_instance() for safety. 2016-09-14 11:54:42 +00:00
roy 169f562155 Introduce IFM_GENERIC.
This allows use of the media interface, but without media as such.
Its sole purpose is to facilitate the reporting of the link status.
2016-09-14 11:43:08 +00:00
roy 64cd8217dd Add interface media for sppp consumers.
While there is no actual media to select,
the ioctl is used to query link status from userland.
2016-09-14 10:58:38 +00:00
joerg 12114a9bee Report link state changes for sppp consumers. The link is considered up
if the current phase is SPPP_PHASE_NETWORK, otherwise it is down. Useful
when using dhcpcd for DHCPv6 PD.
2016-09-13 19:51:12 +00:00
pgoyette 9a575d933d Move tun.c into the module's own directory, since it is specific to the
module subsystem.
2016-09-10 03:26:10 +00:00
pgoyette eb2b2a3e77 Add a dummy "tun" module, whose only job is to trigger an autoload of
required module "if_tun".  This allows access to /dev/tunN to autoload
the required interface module.

XXX There might be a better place/name for net/tun.c
2016-09-10 02:20:10 +00:00
christos 6324edf045 PR/51464: Shoichi YAMAGUCHI: chap authenticator of pppoe does not work 2016-09-09 12:41:14 +00:00
ozaki-r e50076cac8 Fix tun_enable
Before the rearrangement of ifaddr initializations (in.c,v 1.169),
when we called tun_enable via ioctl(SIOCINITIFADDR), an ifaddr
in question was inserted in the interface address list. However,
after the change the ifaddr isn't in the list at that point. So
we shouldn't rely on being able to find the ifaddr with
IFADDR_READER_FOREACH. Instead, simply use the ifaddr passed by
ioctl(SIOCINITIFADDR).
2016-09-07 10:27:44 +00:00
ozaki-r 86bbab733a Rename tuncreate to tun_enable
The new name is more appropriate.
2016-09-07 10:24:57 +00:00
ozaki-r 6dc1297521 Support tun devices on rump kernels 2016-09-05 02:25:37 +00:00
ozaki-r 586dc438d1 Fix typo in a comment 2016-09-05 01:57:54 +00:00
roy 5ffab45fad Split out sysctl_iflist into sysctl_iflist_if and sysctl_iflist_addr.
Setup a command and function pointer in one case statement
instead of having a secondary case statement within a loop.
This makes the code much easier to follow, and possibly to add more compat
in the future.

Don't panic when running an old binary without compat support.
2016-09-01 19:04:30 +00:00
knakahara 4ba8ad0bcb gif(4)'s if_output() is already MP-safe. It should enable IFEF_OUTPUT_MPSAFE. 2016-09-01 06:50:09 +00:00
ozaki-r 60ae5732ab KNF; replace white spaces with hard tabs
No functional change.
2016-08-29 03:31:59 +00:00
knakahara 7b53554dfc fix: failed to create sysctl entries for the module version of gif(4).
The affected sysctl entries are the following two:
    - net.inet.ip.gifttl
    - net.inet6.ip6.gifhlim
2016-08-18 11:44:22 +00:00
knakahara 950010ff93 eliminate stf(4)'s dependency on gif(4).
stf(4) depends not on gif(4) but on ip_encap.
2016-08-18 11:38:58 +00:00
maxv 3b447e1ea5 Memory leak, found by brainy; not tested, but obvious enough 2016-08-15 09:14:12 +00:00
christos bbc7b97ded remove MODULAR/COMPAT_40 ifdef. 2016-08-15 05:10:33 +00:00
christos a74f222e94 fix rump tests. 2016-08-14 11:03:21 +00:00
christos dc521af48a kill unknown sessions ifdef, link set for sysctl. 2016-08-11 15:16:07 +00:00
kre 58cdd27b4a Avoid init'ing lo0 twice ... which rump kernels do without this hack.
If rump gets fixed, this could be removed (though it is harmless in
any case.)

This should fix several more of the currently failing ATF tests.
2016-08-11 13:57:02 +00:00
kre d6b671c40b On the first day (that being the eighth day of the eighth month,) the
building was completed only to discover that within there lay havoc.

On the second day all just groaned and moaned, and it must be someone
else's problem.

On the third day, St. Martin stepped in and traced the culprit, which
provided inspiration, and a correction was made.

Forevermore all were agog at just how such a trivial thing could do
so much damage...


OK...   to be a little less vague.   The loopback interface is a truly
"special" thing, and rump knew that - and treated it very specially.
Unfortunately, when the loopback interface is changed, and rump does
not keep up, bad things happen.

This (overall) might, or might not, be the correct fix - but for now
it appears to work.   If someone, sometime, finds a better way to
deal with the issues of the loopback interface's true majesty, feel
free to revert this and do it another way.
2016-08-10 10:09:42 +00:00
knakahara 79536286e1 follow renaming ifmpls to mpls.
This fixes i386 ALL build.
2016-08-10 05:56:30 +00:00
kre 73176b8121 create++, destroy-- 2016-08-08 16:40:39 +00:00
pgoyette 822a1852a7 Typo (missing ampersand) 2016-08-08 09:51:39 +00:00
pgoyette 2bdbc91cbc Final part of fixing if_tap. The module needs to attach its cdevsw (and
detach it later).
2016-08-08 09:42:33 +00:00
pgoyette 1a2474cc13 Add the devsw_attach stuff, since the tap device can be accessed via
/dev/tap

This is a partial fix for the build.  The rump tap component will be
fixed shortly.
2016-08-08 09:23:13 +00:00
pgoyette b70b5f48f4 Partial fix - restore creation of our sysctl subtree for _MODULE
builds (it's already handled for built-in builds via registration
in a link-set).

XXX The build is still broken in rump...
2016-08-08 07:35:12 +00:00
roy 5f2c1f90c4 Fix compile without modules. 2016-08-08 07:23:27 +00:00
pgoyette 093a61346d Don't try to set-up our sysctl sub-tree if we're built-in - this will
happen automatically (via "registration" of the setup function in a
link-set), and if we're not a module, the SYSCTL_SETUP_PROTO() will
not have declared a function prototype!
2016-08-08 02:50:05 +00:00
christos 1d8e08d4c8 modularize some more drivers and merge the module glue 2016-08-07 17:38:33 +00:00
pgoyette 69aa6fadc2 For modular configurations, always build with PPPOE_TERM_UNKNOWN_SESSIONS
defined, and provide a sysctl variable for enabling/disabling the option.

Update man page accordingly.
2016-08-07 01:59:43 +00:00
pgoyette 5328fd5944 Modularize the pppoe driver 2016-08-06 23:46:30 +00:00
pgoyette abe8e5ebff Destroy the mutex when detaching ppp. Otherwise on a re-attach (ie,
module reload) we can end up with a panic "lock already initialized"
2016-08-06 22:54:34 +00:00
pgoyette 5dd5da5fa0 Catch up with the renaming of module ppp --> if_ppp and avoid warning
messages at boot (or module load) time.
2016-08-06 22:38:18 +00:00
pgoyette c075b7e43f Modularize the sppp_subr stuff so it can be shared by pppoe and lmc
drivers as they get modularized.
2016-08-06 22:03:45 +00:00
christos c20d3604bf make strip and slip modular, and cosmetic for ppp. 2016-08-06 12:48:23 +00:00
pgoyette 57989e45da Change the internal name of the module to match its external (file
system) name.  Otherwise "bad things" can happen, such as modload(8)
being able to load a second copy!
2016-08-06 12:42:40 +00:00
pgoyette e7e9717270 Modularize the ppp driver, and adjust dependencies of the compressor
modules.

For now, this is still included as a built-in module in GENERIC kernels.
2016-08-06 02:35:05 +00:00
pgoyette 87cc8eeb68 Actually commit the changes for making this into a loadable module. The
module infrastructure was committed earlier, but the "guts" of the commit
were somehow missed.
2016-08-05 08:56:36 +00:00
ozaki-r 9b97df78c1 CID 1364759: fix using uninitialized value 2016-08-05 00:52:02 +00:00
ozaki-r a403cbd4f5 Apply pserialize and psref to struct ifaddr and its variants
This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) softnet_lock already makes
them unnecessary and (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
2016-08-01 03:15:30 +00:00
ozaki-r 74fbff1628 Revert "Revert part of "Switch the address list of interfaces to pslist(9)" (r1.220)"
netstat now uses sysctl instead of kvm(3) to get address information from
the kernel. So we can avoid the issue introduced by the reverted commit
(PR kern/51325) by updating netstat with the latest source code.
2016-08-01 02:50:03 +00:00
alnsn db4395c55a Don't trigger BJ_ASSERT(false) on invalid BPF_Jxxx opcode in jmp_to_op().
This change helps survive AFL fuzzing without calling bpf_validate() first.

Also change alu_to_op() function to have a similar interface.
2016-07-29 20:29:38 +00:00
martin 1982ce327f PR kern/51371: avoid shifting negative values 2016-07-28 07:54:31 +00:00
rjs 0dd1bf859c Restore correct test for return value from aarpresolve(). 2016-07-25 23:46:09 +00:00
knakahara ef38d1c0f4 Reduce KERNEL_LOCK now that ifq_lock is used by default.
if_snd is now always protected by ifq_lock, so the KERNEL_LOCK in if_transmit(),
which serializes packet output processing, is no longer needed.
2016-07-22 07:13:56 +00:00
knakahara b14a26cee3 Toward NET_MPSAFE-on in future, if_snd uses if_snd->ifq_lock by default.
This reduces the confusing differences between NET_MPSAFE on and off.
2016-07-22 07:09:40 +00:00
ozaki-r 60f4a9a871 Make complex RTM_CHANGE code understandable
The recently added tests for route change should reduce the possibility of
regressions.

Reviewed by ryo@
2016-07-21 03:45:56 +00:00
ozaki-r 4f21a42704 Apply pserialize to some iterations of IP address lists 2016-07-20 07:37:51 +00:00
pgoyette 7c20c5d3bb Fix regression introduced in tests/net/bpf and tests/net/bpfilter
The rump code needs to call devsw_attach() in order to assign a dev_major
for bpf;  it then uses this to create rump's /dev/bpf node.  Unfortunately,
this leaves the devsw attached, so when the bpf module tries to initialize
itself, it gets an EEXIST error and fails.

So, once rump has figured out what the dev_major should be, call devsw_detach()
to remove the devsw.  Then, when the module initialization code calls
devsw_attach() it will succeed.
2016-07-19 02:47:45 +00:00
pgoyette b380080ebc Now that we're only calling devsw_attach() in the modular driver, it
is not ok for the driver/module to already exist.  So don't ignore
EEXIST.
2016-07-17 02:48:07 +00:00
pgoyette 3c6a976d2d Don't initialize variables that no longer exist in built-in module. 2016-07-17 01:16:30 +00:00
pgoyette 5233aa279b Don't try to call devsw_attach() for built-in driver code. 2016-07-17 01:03:46 +00:00
martin 17f84ba4fd Mark the rt_timer callout MPSAFE and move the first reset a few lines
down so that the workqueue is properly prepared (the latter being more
a cosmetic change). Ok: ozaki-r@
2016-07-15 09:25:47 +00:00
hannken da7d165fe0 rtcache_clear_rtentry: use LIST_FOREACH_SAFE as the element gets
removed from the list.
2016-07-13 09:56:20 +00:00
msaitoh 71fbb921c3 KNF. No functional change. 2016-07-11 11:31:49 +00:00
ozaki-r dca032f9f4 Run timers in workqueue
Timers (such as nd6_timer) typically free/destroy some data in callout
(softint). If we apply psz/psref to such data, we cannot free/destroy it
there because psz/psref synchronization cannot be used in softint.
So run timer callbacks as workqueue works (in normal LWP context).

Enqueuing the same work twice (i.e., calling workqueue_enqueue before
the previous work has been processed) isn't allowed. For nd6_timer and
rt_timer_timer, this doesn't happen because callout_reset is called only
from workqueue's work. OTOH, ip{,6}flow_slowtimo's callout can be called
before its work starts and completes because the callout is periodically
called regardless of completion of the work. To avoid such a situation,
add a flag for each protocol; the flag is set true when a work is
enqueued and set false after the work finished. workqueue_enqueue is
called only if the flag is false.

Proposed on tech-net and tech-kern.
2016-07-11 07:37:00 +00:00
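The per-protocol flag can be sketched in userland C11. This sketch assumes an `atomic_bool` with compare-and-swap; the commit message does not say how the kernel protects the flag, and the CAS is my choice. The names `slowtimo_work_pending`, `slowtimo_callout`, `slowtimo_work` and `enqueue_work` are hypothetical, and the counter stands in for workqueue_enqueue. A periodic callout that fires again before the work has finished finds the flag already set and does not enqueue a second time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* True while a work item is queued or running (hypothetical name). */
static atomic_bool slowtimo_work_pending = false;
static int enqueued_count = 0;  /* stands in for workqueue_enqueue() */

static void
enqueue_work(void)
{
    enqueued_count++;
}

/* Periodic callout: may fire again before the previous work finished. */
static void
slowtimo_callout(void)
{
    bool expected = false;

    /* Enqueue only if no work is pending; the CAS makes test-and-set atomic. */
    if (atomic_compare_exchange_strong(&slowtimo_work_pending, &expected, true))
        enqueue_work();
}

/* Work handler running in thread (LWP) context. */
static void
slowtimo_work(void)
{
    /* ... timer processing that may sleep, e.g. waiting on psref ... */
    atomic_store(&slowtimo_work_pending, false);
}
```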
ozaki-r 8d7855e3f6 Revert part of "Switch the address list of interfaces to pslist(9)" (r1.220)
Reverting the whole change set would needlessly mess up many files,
because the changes to them (except for if.h) are correct.

- Remove ifa_pslist_entry that breaks kvm(3) users (e.g., netstat -ia)
- Change IFADDR_{READER,WRITER}_* macros to use old IFADDR_* (or just NOP)
  for now

Fix PR kern/51325
2016-07-11 02:14:27 +00:00
ozaki-r 4133a8eca8 Replace macros to get an IP address with proper inline functions
The inline functions are more friendly for applying psz/psref;
they consist of only simple iterations.
2016-07-08 04:33:30 +00:00
ozaki-r 9e4c2bda8a Switch the address list of interfaces to pslist(9)
As usual, we leave the old list to avoid breaking kvm(3) users.
2016-07-07 09:32:01 +00:00
msaitoh 8bc54e5be6 KNF. Remove extra spaces. No functional change. 2016-07-07 06:55:38 +00:00
ozaki-r 350c782980 Switch the IPv4 address list to pslist(9)
Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.
2016-07-06 08:42:34 +00:00
ozaki-r e8fc43f394 Add and use pslist(9)-based hashtable for IPv4 addresses
Note that we leave the old hashtable to keep vmstat -H working.
2016-07-06 05:27:52 +00:00
knakahara 59de60c0af fix evbsh3 build 2016-07-05 07:42:51 +00:00
knakahara 7216411e28 Don't use IFQ_ENQUEUE/IFQ_DEQUEUE in the MP-ified interface without whole lock.
That causes per-flow reordering, as the following situation can occur:
    (1) CPU#A does IFQ_DEQUEUE
    (2) CPU#A sleeps by some reason
    (3) CPU#B does IFQ_DEQUEUE
2016-07-04 04:43:46 +00:00
knakahara 6284b35822 make gif(4) and ip_encap MP-ify 2016-07-04 04:40:13 +00:00
knakahara c544c867ee make encap_lock_{enter,exit} interruptible.
knakahara a6d7586724 fix: gif(4) receive side race
A panic occurred in rn_match() called by encap[46]_lookup(). The reason is that
gif(4) did not suspend receive packet processing even though it suspended
transmit packet processing while someone was doing a gif(4) ioctl.
2016-07-04 04:22:47 +00:00
knakahara b71542e5bc let gif(4) keep the softint(9) contract (2/2) : ip_encap side
The previous commit did not take care of encaptab. This commit fixes a race on
encaptab, which is used not only by gif(4).
2016-07-04 04:17:25 +00:00
knakahara d81cd78ed7 let gif(4) keep the softint(9) contract (1/2) : gif(4) side
To prevent softint_schedule() from being called after softint_disestablish(),
the following modifications are made:
    + ioctl (writing configuration) side
      - turn off the IFF_RUNNING flag before changing the configuration
      - wait for softint handler completion before changing the configuration
    + packet processing (reading configuration) side
      - if the IFF_RUNNING flag is off, do nothing
    + overall
      - add gif_list_lock_{enter,exit} to prevent the same configuration from
        being set on other gif(4) interfaces
2016-07-04 04:14:47 +00:00
ozaki-r 51f82533da Tweak p2p_rtrequest as well for ifaddr initialization change
We need to set rt->rt_ifp to lo0ifp if the route is RTF_LOCAL.

Fix PR kern/51301.
2016-07-04 01:36:06 +00:00
ozaki-r 17b4eb5edd Make sure to free all interface addresses in if_detach
Addresses of an interface (struct ifaddr) have a (reverse) pointer to their
interface object (ifa->ifa_ifp). If the addresses are reliably freed when
their interface is destroyed, the pointer is always valid and we don't
need the tweak of replacing the pointer with an if_index as was done for mbufs.

To guarantee this assumption, the following changes are required:
- Deactivate the interface at the very beginning of if_detach. This prevents
  in6_unlink_ifa from (wrongly) saving multicast addresses
- Invalidate rtcache(s) and clear a rtentry referencing an address on
  RTM_DELETE. rtcache(s) may delay freeing an address
- Replace callout_stop with callout_halt for DAD timers to ensure such
  timers are stopped in if_detach
2016-07-01 05:22:33 +00:00
ozaki-r b9853dec6e Add debug helper function for interface addresses
It checks whether all addresses of an interface being destroyed
are freed (no reference remains) at the end of if_detach.
2016-07-01 05:15:40 +00:00
ozaki-r 5abd4152b3 Get rid of duplicate prototype of ifafree 2016-06-30 09:44:58 +00:00
ozaki-r d4c71b34a8 Make sure that ifaddr is published after its initialization finished
Basically we should insert an item into a collection (say a list) after
the item's initialization has been completed, to avoid accessing an item
that is only halfway initialized. ifaddr (in{,6}_ifaddr) isn't handled
that way and needs to be fixed.

In order to do so, we need to tweak {arp,nd6}_rtrequest, which depend
on an ifaddr being inserted during its initialization; they explore the
interface's address list to determine whether rt_getkey(rt) of a given
rtentry is in the list, to know whether the route's interface should
be a loopback, which doesn't work after the change. To make it work,
first check the RTF_LOCAL flag that is set in rt_ifa_addlocal, which calls
{arp,nd6}_rtrequest eventually. Note that we still need the original
code for the case of removing and re-adding a local interface route.
2016-06-30 01:34:53 +00:00
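The "publish only after initialization" rule can be illustrated with C11 atomics. The writer fully initializes the element and then makes it reachable with a release store. Readers use an acquire load, so they can never observe a half-initialized entry. This is a hypothetical userland sketch (`struct addr_sketch`, `addr_publish`, `addr_lookup`) that assumes writers are serialized by some lock. In the kernel, pslist(9), which later commits in this log adopt for the address lists, serves this purpose.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical address entry; a stand-in for struct ifaddr. */
struct addr_sketch {
    unsigned int addr;
    int flags;
    struct addr_sketch *next;
};

/* List head shared between a writer and lock-free readers. */
static _Atomic(struct addr_sketch *) addr_list_head = NULL;

/* Writer (assumed serialized by a lock): initialize every field first,
 * then publish with release ordering. */
static void
addr_publish(struct addr_sketch *ia, unsigned int addr, int flags)
{
    ia->addr = addr;
    ia->flags = flags;
    ia->next = atomic_load_explicit(&addr_list_head, memory_order_relaxed);
    atomic_store_explicit(&addr_list_head, ia, memory_order_release);
}

/* Reader: the acquire load pairs with the writer's release store. */
static const struct addr_sketch *
addr_lookup(unsigned int addr)
{
    const struct addr_sketch *ia =
        atomic_load_explicit(&addr_list_head, memory_order_acquire);

    for (; ia != NULL; ia = ia->next)
        if (ia->addr == addr)
            return ia;
    return NULL;
}
```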
ozaki-r a577cf2aa0 Introduce if_is_deactivated
Checking ifp->if_output == if_nulloutput is too implicit.

No functional change.
2016-06-28 02:36:54 +00:00
ozaki-r ca4ea29d93 Add missing NULL checks for m_get_rcvif_psref 2016-06-28 02:02:56 +00:00
knakahara d5a6877789 fix spelling mistake pointed out by roy@n.o 2016-06-27 10:09:02 +00:00
knakahara c261887dfc gif(4) does not need link state changing interrupts 2016-06-27 09:06:56 +00:00
knakahara 48648fbd1f reduce link state changing softint if it is not required
ok by ozaki-r@n.o
2016-06-27 08:58:50 +00:00
knakahara 461bf703a5 eliminate unused softint for gif(4) Rx 2016-06-24 06:32:47 +00:00
knakahara ccd8e6e6b3 eliminate gif(4) Tx softint
- remove gif_si from struct gif_softc
- directly call gifintr() from gif_output()
- rename gifintr() to gif_start()
- remove Tx softint processing from gif_set_tunnel() and gif_delete_tunnel()
2016-06-24 04:38:12 +00:00
knakahara e41ad56b34 fix: locking about IFQ_ENQUEUE and ALTQ
- If NET_MPSAFE is not defined, IFQ_LOCK is a no-op. Currently, that means
  IFQ_ENQUEUE() in some paths such as bridge_enqueue() is wrongly called in
  parallel.
- If ALTQ is enabled, Tx processing should call if_transmit() (= IFQ_ENQUEUE
  + ifp->if_start()) instead of ifp->if_transmit() to call ALTQ_ENQUEUE()
  and ALTQ_DEQUEUE().
  Furthermore, ALTQ processing currently always requires KERNEL_LOCK.
2016-06-22 10:44:31 +00:00
ozaki-r 4b54d200aa Remove unnecessary NULL checks of ifa->ifa_addr
If it's NULL, it should be a bug. There are many IFADDR_FOREACH loops that
don't do a NULL check; if it could be NULL, they would have fired already.
2016-06-22 07:48:17 +00:00
ozaki-r 4badfc204a Make sure ifp returned from in6_select* functions is psref-ed
To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.
2016-06-21 10:25:27 +00:00
knakahara 36a04107c0 fix ATF net/carp failure 2016-06-21 03:54:04 +00:00
ozaki-r 43c5ab376f Replace ifp of ip_moptions and ip6_moptions with if_index
The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime differs from that of the ifnet, so an ifnet object can
disappear at any time while we reach it through them. Thus we need to look
up the ifnet object by if_index every time, for safety.
2016-06-21 03:28:27 +00:00
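The idea can be sketched with a fixed-size array standing in for ifindex2ifnet. The sketch ignores the pserialize/psref protection the kernel adds on top. The long-lived structure keeps only an index and re-resolves it on every use, so a destroyed interface shows up as a failed lookup instead of a dangling pointer. `struct moptions_sketch`, `if_byindex_sketch` and the other names are hypothetical, and the width of `if_index_t` here is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IFS 8

typedef unsigned short if_index_t;  /* assumed width; 0 means "none" */

struct ifnet_sketch {
    if_index_t if_index;
    char if_xname[16];
};

/* Stand-in for ifindex2ifnet: slot i holds the interface with index i. */
static struct ifnet_sketch *ifindex_table[MAX_IFS];

/* Hypothetical multicast options kept in a PCB: store the index, not ifp. */
struct moptions_sketch {
    if_index_t imo_if_index;
};

static void
if_attach_sketch(struct ifnet_sketch *ifp, if_index_t idx)
{
    ifp->if_index = idx;
    ifindex_table[idx] = ifp;
}

static void
if_detach_sketch(struct ifnet_sketch *ifp)
{
    ifindex_table[ifp->if_index] = NULL;
}

/* Resolve the index on every use; NULL means the interface is gone. */
static struct ifnet_sketch *
if_byindex_sketch(if_index_t idx)
{
    if (idx == 0 || idx >= MAX_IFS)
        return NULL;
    return ifindex_table[idx];
}
```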
ozaki-r f634cbd046 Introduce if_index_t 2016-06-21 03:07:54 +00:00
knakahara 10b439df13 fix: kern/51259 2016-06-20 22:59:36 +00:00
knakahara 69c0ff04b9 apply if_start_lock() to L2 callers which call ifp->if_start() of device drivers
knakahara 910b5ef147 introduce if_start_lock()
if_start_lock() calls ifp->if_start() holding KERNEL_LOCK if it is required.
2016-06-20 08:24:36 +00:00
knakahara edf75a0767 fix: i386 build failure 2016-06-20 08:18:59 +00:00
knakahara ded2d2ff82 fix: should not assert IFEF_OUTPUT_MPSAFE in bridge_output() 2016-06-20 08:14:41 +00:00
knakahara 0fb3bf480b tentative fix for ATF(net/if_bridge/t_bridge) 2016-06-20 07:23:56 +00:00
knakahara 16fd605766 make bridge_output MP-safe, so that bridge(4) can enable IFEF_OUTPUT_MPSAFE.
making MP-scalable is future work.
2016-06-20 07:06:06 +00:00
knakahara 58e1ba6e9c make ether_output() MP-safe, so that if_ether can enable IFEF_OUTPUT_MPSAFE.
making MP-scalable is future work.
2016-06-20 07:01:45 +00:00
knakahara 163d060d6a make looutput() MP-safe, so that lo(4) can enable IFEF_OUTPUT_MPSAFE.
making MP-scalable is future work.
2016-06-20 06:52:44 +00:00
knakahara 95fc145695 apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling). 2016-06-20 06:46:37 +00:00
ozaki-r cb0a0a2168 Do psref_target_destroy after purging packets
Because purging packets may try to send packets, which still requires psref.
2016-06-20 06:41:30 +00:00
knakahara 6eeb832178 introduce if_output_lock()
if_output_lock() calls ifp->if_output() holding KERNEL_LOCK if it is required.
2016-06-20 06:41:15 +00:00
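The dispatch rule can be sketched in userland C. Here a pthread mutex stands in for KERNEL_LOCK, and an MP-safe driver is assumed to advertise itself with an IFEF_OUTPUT_MPSAFE bit in if_extflags, as the neighbouring commits suggest. The bit value, `struct ifnet_sketch` and the function names are hypothetical. Drivers that have not declared MP-safety keep running under the big lock, and MP-safe ones are called directly.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define IFEF_OUTPUT_MPSAFE_SKETCH 0x0001  /* hypothetical bit value */

static pthread_mutex_t kernel_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static bool kernel_lock_held = false;

struct ifnet_sketch {
    int if_extflags;
    int (*if_output)(struct ifnet_sketch *);
};

/* Take the big lock only for drivers that have not declared MP-safety. */
static int
if_output_lock_sketch(struct ifnet_sketch *ifp)
{
    int error;

    if (ifp->if_extflags & IFEF_OUTPUT_MPSAFE_SKETCH)
        return ifp->if_output(ifp);

    pthread_mutex_lock(&kernel_lock_sketch);
    kernel_lock_held = true;
    error = ifp->if_output(ifp);
    kernel_lock_held = false;
    pthread_mutex_unlock(&kernel_lock_sketch);
    return error;
}

/* Example driver output routine: reports whether it ran under the lock. */
static int
dummy_output(struct ifnet_sketch *ifp)
{
    (void)ifp;
    return kernel_lock_held ? 1 : 0;
}
```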
knakahara 53dcbf18a2 introduce if_extflags (was if__pad1) 2016-06-20 06:35:05 +00:00
riastradh 5364e46502 Fix error branches of if_sdl_sysctl.
Can't release the psref if we didn't even find the interface!
2016-06-16 15:18:33 +00:00
ozaki-r f0423d34e6 Use if_get_byindex instead of if_byindex for MP-safety 2016-06-16 03:03:33 +00:00
ozaki-r e1135cd9b9 Use curlwp_bind and curlwp_bindx instead of open-coding LP_BOUND 2016-06-16 02:38:40 +00:00
ozaki-r fe6d427551 Avoid storing a pointer of an interface in a mbuf
Having a pointer to an interface in an mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead
we have to get the object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleepable critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is a transitional
measure, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance-sensitive paths;
however, the overhead is not serious: 2% down at worst.

Proposed on tech-kern and tech-net.
2016-06-10 13:31:43 +00:00
ozaki-r d938d837b3 Introduce m_set_rcvif and m_reset_rcvif
These APIs are used to set (or reset) the received interface of an mbuf.
They are the counterparts of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif operations and reduce the diff of
the upcoming change.

No functional change.
2016-06-10 13:27:10 +00:00
pgoyette 532241d269 Create separate modules for i2c_bitbang and bpf_filter so these files
can be included in kernels which need them without also duplicating
them in other modules.  Removes the duplicate symbols I found which
prevented loading i2c and bpf modules after having fixed PR 45125.
2016-06-07 01:06:27 +00:00
ozaki-r 0c85f1e532 Optimize if_get_byindex by adding __predict_true 2016-05-31 04:05:01 +00:00
knakahara d60d8acaf7 make some functions static. no functional change. 2016-05-31 03:52:40 +00:00
alnsn 44dbc048e9 Adapt to the new version of sljit@r313. 2016-05-29 17:20:22 +00:00
ozaki-r 9b5dfda043 Fix RT_IN_PRINT 2016-05-17 15:21:14 +00:00
ozaki-r cba53ba7a5 Tidy up route_output
Avoid jumping into the middle of a switch statement, use a function instead.
2016-05-17 12:58:21 +00:00
ozaki-r 2ccabb7fd7 Apply if_get and if_put to bridge(4) 2016-05-16 01:23:51 +00:00
ozaki-r fd97b511fe Replace ifnet_lock with if_get and if_put
ifnet_lock is a dedicated method to safely destroy an interface over running
ioctl operations. Replace it with a more generic mechanism using psref(9).
2016-05-16 01:16:24 +00:00
ozaki-r b59e9a736b Introduce if_get, if_get_byindex and if_put
The new API makes it possible to obtain an ifnet object protected by psref(9).
It is intended to be used where an obtained ifnet object is used over
sleepable operations.
2016-05-16 01:06:31 +00:00
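The get/put pattern can be sketched in userland C11. Per psref(9), references are tracked per CPU and psref_target_destroy waits for them to drain. This sketch collapses that into one atomic counter plus a "dying" flag, which is a simplification and not how psref is implemented. All names are hypothetical. With the default sequentially consistent atomics, a getter either sees the dying flag and backs off, or its reference is counted before the destroyer checks the counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct ifnet_sketch {
    atomic_int refcnt;   /* simplification of psref(9) tracking */
    atomic_bool dying;   /* set once detach has started */
};

static void
if_init_sketch(struct ifnet_sketch *ifp)
{
    atomic_init(&ifp->refcnt, 0);
    atomic_init(&ifp->dying, false);
}

/* Take a reference unless the interface is being destroyed. */
static struct ifnet_sketch *
if_get_sketch(struct ifnet_sketch *ifp)
{
    atomic_fetch_add(&ifp->refcnt, 1);
    if (atomic_load(&ifp->dying)) {
        atomic_fetch_sub(&ifp->refcnt, 1);
        return NULL;
    }
    return ifp;
}

static void
if_put_sketch(struct ifnet_sketch *ifp)
{
    atomic_fetch_sub(&ifp->refcnt, 1);
}

/* Detach side: block new references; the caller may free the object only
 * after this returns true (psref_target_destroy would sleep instead). */
static bool
if_refs_drained(struct ifnet_sketch *ifp)
{
    atomic_store(&ifp->dying, true);
    return atomic_load(&ifp->refcnt) == 0;
}
```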
ozaki-r 040205ae93 Protect ifnet list with psz and psref
The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
2016-05-12 02:24:16 +00:00
christos d7ac614b0a Don't increment the reference count only when it was 0...
From Jean-Jacques.Puig
2016-05-09 15:05:15 +00:00
roy 88be616fa6 Allow multicast/broadcast packets from a bridge member to other members.
Note this should just call bridge_broadcast when more locking issues are
resolved.
2016-05-04 18:59:55 +00:00
skrll 18ee45f035 Typo in comment 2016-05-02 08:03:23 +00:00
ozaki-r a931ad2746 Constify remaining rtentry of if_output (fix build) 2016-04-28 14:40:09 +00:00
knakahara 9f3a294e64 introduce new ifnet MP-scalable sending interface "if_transmit". 2016-04-28 01:37:17 +00:00
ozaki-r 2cf7873b92 Constify rtentry of if_output
We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psref-ing) of rtentries.
2016-04-28 00:16:56 +00:00
ozaki-r 3f11155830 Stop using rt_gwroute completely 2016-04-26 09:31:18 +00:00
ozaki-r 9e0f6c5e36 Stop using rt_gwroute on packet sending paths
rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue in doing this is preserving RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we got wrong:
- RTF_REJECT was checked for any routes in L2 output routines,
  but in ip_hresolv_output it is checked only when the route
  is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that the rt_gwroute checks hid the mistakes so that things appeared
to work (unexpectedly), and removing the rt_gwroute checks unveils the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care about is returning an errno; we need
to mimic looutput behavior. Originally the RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, the RTF_REJECT check is now done before looutput, so to keep
the original behavior we need to return the errno that looutput
would choose. The newly added rt_check_reject_route does such tweaks.
2016-04-26 09:30:01 +00:00
roy 18c890be0c Set rtm_pid = curproc->p_pid for a few more messages. 2016-04-25 15:43:49 +00:00
ozaki-r 0c74cec625 Check error of rt_setgate and rt_settag 2016-04-25 14:38:08 +00:00
ozaki-r 156a79f975 Don't rt_setkey twice 2016-04-25 14:30:42 +00:00
ozaki-r c429bd747c Fix errno on rt_setgate error
I bet it's not EDQUOT (Disc quota exceeded).
2016-04-25 10:55:01 +00:00
christos 89a51b8a0a CID 1358673: dead code 2016-04-24 18:08:40 +00:00
christos 005582cf7d CID 1210544: Tainted scalar 2016-04-24 17:56:31 +00:00
christos e4c50db432 CID 980345: missing breaks 2016-04-24 17:32:06 +00:00
christos 107dc46d92 CID 980057, 980058, use strlcpy() 2016-04-24 16:59:15 +00:00
martin 2b65c0c0c6 Add missing breaks (cosmetic change only) 2016-04-23 12:15:38 +00:00
roy aabc63fcaa Change used from int to bool.
If used, abort the loop because we think we're already at the end.
2016-04-22 00:25:42 +00:00
christos a56964c222 /32 and /128 are valid netmasks. 2016-04-20 15:46:08 +00:00
knakahara b76ec0b083 IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller 2016-04-20 09:01:03 +00:00
knakahara 40b1061c07 IFQ_ENQUEUE refactor (2/3) : eliminate pktattr argument from altq implementation 2016-04-20 08:58:48 +00:00
knakahara 6190bb13a7 IFQ_ENQUEUE refactor (1/3) : add altq_pktattr fields to m_pkthdr
Reviewed by joerg@n.o and tls@n.o, thanks.
2016-04-20 08:56:32 +00:00
ozaki-r cc3dd2e07b Apply psref(9) to bridge(4)
Note that there is an issue that ioctls for an interface and a destruction
of the interface can run in parallel and it causes race conditions on
bridge as well (it rarely happens). The issue will be addressed in the
interface common code (if.c).
2016-04-19 07:10:22 +00:00
ozaki-r d81e97fa58 Remove BRIDGE_MPSAFE switch and enable MP-safe code by default
We need to enable it by default because bridge_input now runs
in softint, but bridge_input w/o BRIDGE_MPSAFE was designed to
run in hardware interrupt context.

Note that some racy code remains in bridge_output; it will be
solved in the upcoming change (applying psref(9)).
2016-04-19 07:03:12 +00:00
ozaki-r c49e748c57 Hide PPPoE variables from if_ethersubr.c
This improves modularity of if_pppoe.

From s-yamaguchi@IIJ
2016-04-15 01:31:29 +00:00
ozaki-r 4f0eb37aac ddb: rename show arptab to show routes
The show arptab command of ddb is now inappropriate because it actually
dumps routes, but arp entries aren't routes anymore. So rename it to
show routes and move the code from if_arp.c to route.c.

ok christos@
2016-04-13 00:47:01 +00:00
ozaki-r 6a7ab186c9 Don't use radix tree API directly 2016-04-11 09:21:18 +00:00
ozaki-r 1fcfead163 Remove out-dated comments and unnecessary splsoftnet for pool_{get,put} 2016-04-11 08:26:33 +00:00
ozaki-r ba236d2e0a Fix usage of pslist(9)
Pointed out by riastradh@.
2016-04-11 05:40:47 +00:00
ozaki-r 9129220b60 Move #include <sys/pslist.h> inside #ifdef _KERNEL for building brconfig 2016-04-11 03:46:47 +00:00
ozaki-r 814cd05c8b Use pslist(9) in bridge(4)
This adds missing memory barriers to list operations for pserialize.
2016-04-11 02:04:14 +00:00
christos d5e7bf8bbf - remove printf
- fix indent
2016-04-08 12:01:22 +00:00
christos 5d7cee0467 Use sockaddr_dl_init 2016-04-07 21:41:02 +00:00
christos c586dc1f9c remove useless cast. 2016-04-07 04:04:47 +00:00
christos b988d754df - tidy up error messages
- add a length argument to arpresolve()
- add KASSERT for overflow
2016-04-07 03:22:15 +00:00
christos cba63d3064 Don't create an RTM_MISS message for every route allocation.
GC unused code and variables.
2016-04-07 03:09:56 +00:00
christos ee5f11c12c pretty-print link addresses. 2016-04-06 18:04:58 +00:00
christos d5ee3894c1 Don't interpret routing requests by interface index as arp entry additions! 2016-04-06 17:34:33 +00:00
ozaki-r 59c50f3fa9 Fill rtm_addrs properly
This fixes arp(8) unexpectedly showing "(weird)" for every entry on
some archs (only 32bit?).

Confirmed on evbarm by ryo@ and i386 by me.
2016-04-06 08:45:46 +00:00
ozaki-r 25d196eae4 Fill sdl with sockaddr_dl_init
And add an assertion of if_addrlen and ll_addr.

From christos@
2016-04-06 07:59:26 +00:00
pgoyette b05cba24c7 Add modular dependency on zlib module. 2016-04-05 23:44:05 +00:00
pgoyette fb73b1d308 Update dependency: zlib is only needed for the swcrypto device, not for
any other component of opencrypto.
2016-04-05 22:51:01 +00:00
ozaki-r b7639842cb Unbreak build of kernels without INET 2016-04-05 10:03:33 +00:00
ozaki-r 09973b35ac Separate nexthop caches from the routing table
By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist,
renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
  - It has the same value as RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- The gateway value of an interface address route is now not
  an L2 address but "link#N", like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

Details of the behavior changes can be found in the diffs under tests/.

Proposed on tech-net and tech-kern:
  http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
2016-04-04 07:37:07 +00:00
ozaki-r 2a8dd88007 Don't request returning an rtentry if it is not used 2016-04-01 09:52:39 +00:00
ozaki-r 8b4ba7b737 Remove unnecessary RTTIMER_CALLOUT macro
rttimer#rtt_func is never NULL.
2016-04-01 09:00:27 +00:00
ozaki-r 2d846e45e6 Make some global variables static 2016-04-01 02:00:14 +00:00
ozaki-r fb96f2a80f Remove unused global bridge list
Pointed out by riastradh@
2016-03-28 04:38:04 +00:00
ozaki-r 3084fea4ef Constify rt_newmsg's arguments 2016-03-24 06:18:27 +00:00
knakahara 969d82c4f7 add a drop count, which is the sum of the drop counts of struct if_percpuq's per-CPU queues.
ok by ozaki-r@n.o
2016-03-23 07:05:28 +00:00
ozaki-r f1d17afed8 Fix LIST_FOREACH argument 2016-03-23 05:44:01 +00:00
ozaki-r 85320fb21e Use LIST_FOREACH instead of LIST_FOREACH_SAFE
No need to use *_SAFE because we don't remove any items in the loop.
2016-03-23 04:56:21 +00:00
mrg 66d72dce7d minimal changes necessary to link into an INET6-less kernel. 2016-03-18 10:09:46 +00:00
ozaki-r 5472fb5b21 Add missing percpu_putref to error path 2016-03-07 01:41:55 +00:00
knakahara e80f101289 To eliminate gif_softc_list linear search, add extra argument to encapsw.pr_ctlinput(). 2016-02-26 07:35:17 +00:00
roy 72b9424275 Implement a queue for if_link_state_change() calls to fix a race condition
introduced in the prior patch.

The queue has capacity to store 8 link state changes, if it overflows then
the oldest state change is lost, but the oldest DOWN state change is
preserved to ensure any subsequent UP state changes reflect properly.

Because there are only 3 states to queue, the queue itself is implemented
by storing 2-bit numbers in a bigger one.
To increase the size of the queue, just use a bigger integer type as the
backing store.
2016-02-19 20:05:43 +00:00
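The 2-bit packing described in the commit above can be sketched in user-space C. This is an illustration under assumptions, not the kernel code: all names are hypothetical, value 0 is assumed to mark an empty slot (which is why three states fit in two bits), and the overflow rule (drop the oldest entry unless it is DOWN, in which case drop the next-oldest) is one reading of "the oldest DOWN state change is preserved".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: 0 marks an empty slot, leaving 3 values for states. */
enum { LQ_EMPTY = 0, LQ_UNKNOWN = 1, LQ_DOWN = 2, LQ_UP = 3 };

#define LQ_BITS   2u
#define LQ_MASK   0x3u
#define LQ_SLOTS  ((unsigned)(sizeof(uint16_t) * 8 / LQ_BITS))  /* 8 slots */

/* Slot 0 (lowest bits) is the oldest entry; entries are contiguous. */
typedef uint16_t link_queue_t;

static unsigned
lq_get(link_queue_t q, unsigned i)
{
    return ((unsigned)q >> (i * LQ_BITS)) & LQ_MASK;
}

static link_queue_t
lq_set(link_queue_t q, unsigned i, unsigned v)
{
    unsigned shift = i * LQ_BITS;
    unsigned x = (unsigned)q & ~(LQ_MASK << shift);
    return (link_queue_t)(x | ((v & LQ_MASK) << shift));
}

/* Remove slot i and shift all newer entries down by one slot. */
static link_queue_t
lq_remove(link_queue_t q, unsigned i)
{
    unsigned low = (unsigned)q & ((1u << (i * LQ_BITS)) - 1);
    unsigned high = (unsigned)q >> ((i + 1) * LQ_BITS);
    return (link_queue_t)(low | (high << (i * LQ_BITS)));
}

static unsigned
lq_len(link_queue_t q)
{
    unsigned n = 0;
    while (n < LQ_SLOTS && lq_get(q, n) != LQ_EMPTY)
        n++;
    return n;
}

static link_queue_t
lq_enqueue(link_queue_t q, unsigned state)
{
    assert(state != LQ_EMPTY && state <= LQ_MASK);
    unsigned n = lq_len(q);
    if (n == LQ_SLOTS) {
        /* Full: drop the oldest entry, but never drop a DOWN at the head. */
        q = lq_remove(q, lq_get(q, 0) == LQ_DOWN ? 1 : 0);
        n--;
    }
    return lq_set(q, n, state);
}

/* Pop the oldest entry into *state (LQ_EMPTY if the queue is empty). */
static link_queue_t
lq_dequeue(link_queue_t q, unsigned *state)
{
    *state = lq_get(q, 0);
    return *state == LQ_EMPTY ? q : lq_remove(q, 0);
}
```

Changing link_queue_t to uint32_t doubles the number of slots without touching the logic, since LQ_SLOTS is derived from sizeof(uint16_t); in the sketch that constant would need to follow the typedef too.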
ozaki-r 1926a64c9e Remove workaround for GATEWAY
The workaround was introduced because lltable/llentry uses rwlock
but it may be executed in hardware interrupt due to fast forward.
Now we don't run fast forward in hardware interrupt anymore, so
we can remove the workaround.
2016-02-16 01:31:26 +00:00
ozaki-r 297068212d Run if_link_state_change in softint
if_link_state_change can execute the network stack, which is expected
not to run in hardware interrupt context (at least for now); however, network drivers
may call it in hardware interrupt. Avoid that by introducing a new
softint for if_link_state_change.

The original patch is provided by mlelstv@ and tweaked a bit by me.

Should fix PR kern/50602.
2016-02-15 08:08:04 +00:00
ozaki-r b7a310ca27 Simplify bridge(4)
Thanks to introducing softint-based if_input, the entire bridge code now
never runs in hardware interrupt context. So we can simplify the code.

- Remove spin mutexes
  - They were needed because some code of bridge could run in
    hardware interrupt context
  - We now need only an adaptive mutex for each shared object
    (a member list and a forwarding table)
- Remove pktqueue
  - bridge_input is already in softint, using another softint
    (for bridge_forward) is useless
  - Packet distribution should be done at device drivers
2016-02-15 01:11:41 +00:00
ozaki-r 057a6a480f Don't share struct work, instead have one per softc
Pointed out by riastradh@
2016-02-10 06:30:23 +00:00
ozaki-r 28e7d22e93 Fix build 2016-02-09 14:43:16 +00:00
ozaki-r 9c4cd06355 Introduce softint-based if_input
This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
2016-02-09 08:32:07 +00:00
christos 654889b793 Do less work under the kernel lock, otherwise dhcpcd aborting causes us
to deadlock.
2016-02-01 16:32:28 +00:00
ozaki-r a8e10f3452 Tidy up
- KNF
- Remove obsolete ifdefs for other OSes
- Remove unnecessary else block

No functional change.
2016-01-28 04:37:01 +00:00
knakahara 2656d88eb3 fix my wrong modification 2016-01-28 00:28:11 +00:00
knakahara b546d5277b implement encapsw instead of protosw and uniform prototype.
suggested and advised by riastradh@n.o, thanks.

BTW, it seems in_stf_input() had bugs...
2016-01-26 05:58:05 +00:00
riastradh e588d95c25 Back out previous change to introduce struct encapsw.
This change was intended, but Nakahara-san had already made a better
one locally!  So I'll let him commit that one, and I'll try not to
step on anyone's toes again.
2016-01-22 23:27:12 +00:00
riastradh 87bc652e3d Don't abuse struct protosw for ip_encap -- introduce struct encapsw.
Mostly mechanical change to replace it, culling some now-needless
boilerplate around all the users.

This does not substantively change the ip_encap API or eliminate
abuse of sketchy pointer casts -- that will come later, and will be
easier now that it is not tangled up with struct protosw.
2016-01-22 05:15:10 +00:00
riastradh 7c7b1739c8 Revert previous: ran cvs commit when I meant cvs diff. Sorry!
Hit up-arrow one too few times.
2016-01-21 15:41:29 +00:00
riastradh b41d562bd0 Give proper prototype to ip_output. 2016-01-21 15:27:48 +00:00
riastradh 65a8f527af Eliminate struct protosw::pr_output.
You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument.  Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
2016-01-20 21:43:59 +00:00
knakahara d7b9bb29c0 Refactor protosw codes in gif(4). No functional change.
- remove unnecessary include
    - reduce scopes
2016-01-18 06:08:26 +00:00
knakahara 1c5d304e9c eliminate ip_input.c and ip6_input.c dependency on gif(4) 2016-01-08 03:55:39 +00:00
ozaki-r db10df1ae6 Fix the destruction of the afdata lock
Pointed out by mlelstv@
2016-01-04 09:08:38 +00:00
knakahara 143a395d31 Revert extra waiting code.
PR kern/50522 is actually fixed by sys/kern/kern_softint.c:r1.42, so the waiting
code in if_gif.c is not required.
2016-01-04 07:50:08 +00:00
alnsn 40bda2ee5c Replace the nsaveds() function with #define NSAVEDS 3. No functional change.
Patch from Michael McConville.
2015-12-29 21:49:58 +00:00
ozaki-r a532303129 Tweak return value handling
rtrequest1 always returns an rtentry on success.
2015-12-22 01:59:21 +00:00
mlelstv 30470de52f make DDB print ipv6 addresses too 2015-12-17 12:17:13 +00:00
mlelstv 378c085bac handle delayed cksums also for ipv6 2015-12-17 12:16:21 +00:00
ozaki-r 66d9895f20 Fix memory leak of llentry#la_opaque
llentry#la_opaque which is for token ring is allocated in arp.c
and freed in arp.c when freeing llentry. However, llentry can be
freed from other places, e.g., lltable_free. In such cases,
la_opaque is never freed.

To fix that, add a new callback (lle_ll_free) to llentry and
register a destruction function for la_opaque to it. On freeing an
llentry, we can reliably free la_opaque via the callback.
2015-12-17 02:38:33 +00:00
christos 43eac92e53 don't free mbuf twice.
XXX: pullup 7.
2015-12-16 23:14:42 +00:00
ozaki-r 213b8d3cc6 Fix token_rif extractions from llentry 2015-12-16 05:44:59 +00:00
knakahara a00e94f4ff PR kern/50522: gif(4) ioctl causes panic while someone is using the gif(4) interface.
It is required to wait for softint completion on other CPUs before disestablishing
the softint handler.
2015-12-11 07:59:14 +00:00
knakahara ef0c59f955 revert KASSERT. It should use 'if' instead of KASSERT.
see the updated (later than r1.18) kmem(9) man page.
2015-12-11 04:29:24 +00:00
knakahara 5b880df5ec kmem_zalloc(, KM_SLEEP) must not return NULL. 2015-12-10 08:11:03 +00:00
knakahara eaf1fb5902 add NULL check 2015-12-10 01:20:12 +00:00
knakahara 849e83fa25 gif(4) uses kmem_alloc APIs instead of malloc. 2015-12-09 05:56:24 +00:00
knakahara 5e4601c62c Refactor gif_set_tunnel(). No functional change. 2015-12-09 03:33:32 +00:00
knakahara 118f179f3d Improve gif_set_tunnel() rollback code. 2015-12-09 03:31:28 +00:00
knakahara c705cd3ca5 gif(4): Infinite recursion calls prevention code works again now.
The prevention code hasn't worked since gif(4) was changed
to use softint(9). To make it work again, gif_output uses
m_tag(9) like FreeBSD and OpenBSD.

I tested with the following commands.
====================
# ifconfig gif0 create
# ifconfig gif0 tunnel 10.1.1.1  10.1.1.2
# ifconfig gif0 inet 192.168.100.1 192.168.100.100

# ifconfig gif1 create
# ifconfig gif1 tunnel 192.168.100.1 192.168.100.100
# ifconfig gif1 inet 192.168.101.1 192.168.101.101

# ifconfig gif2 create
# ifconfig gif2 tunnel 192.168.101.1 192.168.101.101
# ifconfig gif2 inet 192.168.102.1 192.168.102.102

# ping -w 1 -c 1 192.168.102.102
# dmesg | tail -n 1
gif0: recursively called too many times(2)
====================
2015-12-04 02:26:11 +00:00
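The nesting check can be illustrated with a counter that travels with the packet, which is what attaching an m_tag(9) to the mbuf provides. In the sketch below a plain struct field stands in for the tag, and toy_tunnel_output() and max_nesting are hypothetical names; the real limit and log output of gif(4) may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Stand-in for an mbuf carrying a nesting-count tag. */
struct toy_packet {
    int has_tag;    /* tag attached yet? */
    int nest;       /* how many tunnel outputs have seen this packet */
};

/*
 * Called on each tunnel output. The count survives even when each
 * encapsulation step runs in a separate softint, because it lives on
 * the packet rather than on the call stack.
 */
static int
toy_tunnel_output(struct toy_packet *m, const char *ifname, int max_nesting)
{
    if (!m->has_tag) {
        m->has_tag = 1;
        m->nest = 1;
    } else if (++m->nest > max_nesting) {
        printf("%s: recursively called too many times(%d)\n",
            ifname, m->nest);
        return EIO;     /* drop instead of encapsulating again */
    }
    /* ... encapsulate and pass the packet to the next output routine ... */
    return 0;
}
```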
knakahara 48ec8fb3d6 LIST_REMOVE should be done before clearing members of the list element. 2015-12-03 03:03:58 +00:00
knakahara 44351c996d remove extra encap_detach().
encap_detach() is already done in gif_delete_tunnel()->in{,6}_gif_detach().
2015-12-03 02:50:49 +00:00
ozaki-r f373fa78e6 Fix build dependency of if_llatbl.c
if_llatbl.c is required if inet or inet6 is enabled. Depending on ether
doesn't suit the NDP case.
2015-11-26 01:41:20 +00:00
ozaki-r ecd5b23eef Use lltable/llentry for NDP
lltable and llentry were introduced to replace ARP cache data structure
for further restructuring of the routing table: L2 nexthop cache
separation. This change replaces the NDP cache data structure
(llinfo_nd6) with them as well as ARP.

One noticeable change concerns the neighbor cache GC mechanism that was
introduced to prevent IPv6 DoS attacks. net.inet6.ip6.neighborgcthresh
was the max number of caches that we store in the system. After
introducing lltable/llentry, the value is applied on a per-interface
basis because lltable/llentry stores neighbor caches in each interface
separately. And the change brings one degradation: the old GC mechanism
dropped excess entries based on LRU, while the new implementation drops
entries in order from the beginning of lltable (a hash table + linked
lists). It could be improved in the future.

Added functions in in6.c come from FreeBSD (as of r286629) and are
tweaked for NetBSD.

Proposed on tech-kern and tech-net.
2015-11-25 06:21:26 +00:00
ozaki-r a84874a1a0 Remove an ifnet object from the global list before destructing it 2015-11-20 08:10:36 +00:00
christos 88b3ee5eb5 Add handling of VLAN packets in if_bridge where the parent interface supports
them (Jean-Jacques.Puig@espci.fr). Factor out the vlan_mtu enabling and
disabling code.
2015-11-19 16:23:54 +00:00
knakahara fd06f20054 fix CID 980463 2015-11-11 03:57:57 +00:00
knakahara e96c855269 fix panic after "ifconfig gifX tunnel src dst" failed for the reason of address pair duplication.
e.g.
    ====================
    # ifconfig gif0 create
    # ifconfig gif0 tunnel 192.168.0.1 192.168.0.2
    # ifconfig gif0 inet 172.16.0.1/24 172.16.0.2
    # route add 10.1.0.0/24 172.16.0.1

    # ifconfig gif1 create
    # ifconfig gif1 tunnel 192.168.0.1 192.168.0.3

    # ifconfig gif0 tunnel 192.168.0.1 192.168.0.3
    ifconfig: SIOCSLIFPHYADDR: Can't assign requested address # expected
    # ping 10.1.0.1
    (panic)
    ====================
2015-11-11 02:57:17 +00:00
christos 04542e546d correct mistake in previous 2015-11-10 18:22:46 +00:00
christos fa11598f4c CID 980463: Provide common error path for rollback. Remove extra check for
success.
2015-11-10 17:59:37 +00:00
ozaki-r e7339922fb Improve lock traces and add reference traces 2015-11-05 06:50:51 +00:00
christos 805ab1e001 Simplify even further and fix non-modular kernels:
We cannot use the init at attach() trick, because other npf ext modules
will load before the attach function is called on non modular kernels.
2015-10-29 15:19:43 +00:00
christos c0dba4ec09 remove bogus KASSERT, there are error paths that don't satisfy this.
XXX: should improve error reporting to userland.
2015-10-28 01:54:10 +00:00
christos 32f4b28226 modules don't define MODULAR. 2015-10-27 19:58:09 +00:00
christos afd95c9bf1 simplify (and fix) logic. 2015-10-27 19:31:55 +00:00
maxv 2f8be1878d Harmless alloc inconsistency; make sure the exact same argument is given to
kmem_alloc/kmem_free. Found by Brainy.
2015-10-20 14:46:45 +00:00
ozaki-r e4a5751875 Stop using softnet_lock (fix possible deadlock)
Using softnet_lock for mutual exclusion between lltable_free and
arptimer was wrong and could cause a deadlock between
them: lltable_free waits for arptimer completion by calling
callout_halt with softnet_lock, which arptimer also takes; however,
lltable_free also holds llentry's lock, which arptimer also takes,
so arptimer can never obtain that lock and neither side can make
progress. We have to pass llentry's lock to
callout_halt instead.
2015-10-20 07:35:15 +00:00
martin d041befd31 Ifdef npf_init() the same way as all its callers are protected. 2015-10-19 09:28:24 +00:00
christos a6022a4b9e Fix the code so that it works in all 3 cases: non-modular, modular/builtin,
modular/filesystem. In the non-modular case we initialize through attach.
In the modular/builtin case we define the module to be class misc so it
attaches late (after percpu is initialized) since driver modules attach
too early.  In the modular/filesystem case we define it to be a driver
module since we autoload it via /dev/npf open.
2015-10-19 00:29:57 +00:00
jmcneill f0bb3f7042 Defer initialization of built-in npf module until other pseudo-devices
are initialized. MODULE_CLASS_DRIVER modules are now initialized before
autoconfiguration starts, but npf_init has a dependency on percpu(9) which
doesn't work until CPUs have attached (at least on ARM).
2015-10-18 20:39:53 +00:00
christos 635094c1f5 needs to be driver, otherwise it will not load! 2015-10-18 18:48:01 +00:00
jmcneill 4e97921379 mark this MODULE_CLASS_MISC as npf_init cannot run when builtin driver modules are initialized 2015-10-17 13:53:40 +00:00
christos d522fec9f5 PR/49386: Ryota Ozaki: Add a mutex for bpf creation/removal to avoid races.
Add M_CANFAIL to malloc.
2015-10-14 19:40:09 +00:00
rjs 8c2654abca Add core networking support for SCTP. 2015-10-13 21:28:34 +00:00
roy 222d6fab6a arpresolve() now returns 0 on success otherwise an error code.
Callers of arpresolve() now pass the error code back to their caller,
masking out EWOULDBLOCK.

This allows applications such as ping(8) to display a suitable error
condition.
2015-10-13 12:33:07 +00:00
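The caller-side convention can be sketched as a small helper. The helper name is hypothetical; the assumption, consistent with the commit text, is that EWOULDBLOCK means the packet was held while address resolution is pending, which is not a failure from the sender's point of view.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: translate an arpresolve()-style result into what
 * an output routine returns to its caller.
 *   0           -> resolved; the frame can be sent
 *   EWOULDBLOCK -> packet held until resolution completes; not an error
 *   other       -> real failure (e.g. EHOSTDOWN), passed up so that
 *                  tools such as ping(8) can report it
 */
static int
toy_output_error(int resolve_error)
{
    if (resolve_error == EWOULDBLOCK)
        return 0;
    return resolve_error;
}
```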
ozaki-r 6a74db0c04 Fix LLE_TRY_UPGRADE when GATEWAY
It's expected to return a value.
2015-10-09 01:50:09 +00:00
roy e600c51d95 Remove rt_ifa_localrequest().
In its place, use rtrequest1() inside rt_ifa_addlocal() and
rtdeletemsg() inside rt_ifa_remlocal().

This removes the need for INET/INET6 specific code and allows
greater control over the creation of the local address route.
2015-10-07 09:44:26 +00:00
ozaki-r ef5da9a970 Enqueue frames to a curcpu's pktqueue
Currently RX can run on a CPU other than CPU#0, so always enqueuing
to a pktqueue of CPU#0 makes no sense. Let's use a curcpu's pktqueue,
although bridge_forward softint doesn't run in parallel without
NET_MPSAFE.

This is a temporary solution. We need a fundamental solution.
2015-10-07 08:48:04 +00:00
ozaki-r 0e7ec84c8c Fix typo 2015-10-02 03:08:26 +00:00
ozaki-r 99284d7cc5 Make GATEWAY (fastforward) work again
With GATEWAY (fastforward), the whole forwarding processing runs in
hardware interrupt context. So we cannot use rwlock for lltable and
llentry in that case.

This change replaces rwlock with mutex(IPL_NET) for lltable and llentry
when GATEWAY is enabled. We need to tweak locking only around rtree
in lltable_free. Other than that, what we need to do is to change macros
for locks.

I hope fastforward runs in softint some day in the future...
2015-09-30 07:12:32 +00:00
ozaki-r ad91e721ff Remove extra opt_gateway.h 2015-09-30 06:25:59 +00:00
ozaki-r fc47734756 Tweak mutex_enter(softnet_lock) position
The previous code took locks in the following order:
- LLE_WLOCKs
- mutex_enter(softnet_lock)
- LLE_WUNLOCKs
- mutex_exit(softnet_lock)

This fix moves mutex_enter(softnet_lock) before LLE_WLOCKs.
2015-09-28 07:55:26 +00:00
ozaki-r 30818f7132 Fix race condition on la_rt between lltable_free and other places touching la_rt
We must always hold softnet_lock when touching la_rt. And we have to
use callout_halt with softnet_lock instead of callout_stop for
la_timer (arptimer) because arptimer holds softnet_lock inside it.

This fix may solve a kernel panic christos@ encountered.
2015-09-09 01:26:50 +00:00
dholland 1fbab01a93 More on PR 41200: headers that declare ioctls should include sys/ioccom.h.
This covers (I think) all the MI headers outside of external/ (and dist/).
2015-09-06 06:00:59 +00:00
dholland 0be2106b75 Uses _IOR/_IOW/etc. and thus needs sys/ioccom.h. PR 41200 2015-09-05 20:01:21 +00:00
ozaki-r 5392bf8aeb Add refcnt constraint checks for debugging
It's useful to know where the constraint is violated (by extra rtfree).
It's enabled only if DEBUG because it's heavy (O(n)).
2015-09-03 02:04:31 +00:00
ozaki-r 54c4f3b688 Do rt_refcnt++ when setting an rtentry to another rtentry's rt_gwroute
And also call rtfree when dereferencing an rtentry from rt_gwroute.
2015-09-02 11:35:11 +00:00
pooka 1d2b607cca #if __NetBSD__ -> #if defined(__NetBSD__) 2015-08-31 12:57:45 +00:00
ozaki-r 8997ac8f09 Replace ARP cache (llinfo) with lltable/llentry
Highlights of the change are:
- Use llentry instead of llinfo to manage ARP caches
  - ARP specific data are stored in the hashed list
    of an interface instead of the global list (llinfo_arp)
- Fine-grain locking on llentry
- arptimer (callout) per ARP cache
  - the global timer callout with the big locks can be
    removed (though softnet_lock is still required for now)
- net.inet.arp.prune is now obsoleted
  - it was the interval of the global timer callout
- net.inet.arp.refresh is now obsoleted
  - it was a parameter that prevents expiration of active caches
  - Removed to simplify the timer logic, but we may be able to
    restore the feature if really needed

Proposed on tech-kern and tech-net.
2015-08-31 08:05:20 +00:00
ozaki-r 879526da38 Hook up lltable/llentry with the kernel (and rumpkernel)
It is built and initialized on bootup, but there is no user for now.

Most of the code added to in.c is imported from FreeBSD, as is lltable/llentry.
2015-08-31 08:02:44 +00:00
ozaki-r 85e26631cc Import lltable/llentry from FreeBSD
lltable/llentry are new L2 nexthop cache data structures that
store caches in each interface (struct ifnet). It is imported
to replace the current ARP cache implementation that uses the
global list with the big kernel lock, and provide fine-grain
locking for cache operations. It is also planned to replace
NDP caches.

The code is based on FreeBSD's lltable/llentry as of r286629
and tweaked for NetBSD.
2015-08-31 07:56:58 +00:00
ozaki-r 3aedc74443 Make rt_refcnt take into account rt_timer 2015-08-31 06:25:15 +00:00
rjs 34d5c6e6a9 Don't set M_PROTO1 in mbuf flags.
This was left over from the old usage of gif(4) with bridges.
2015-08-28 14:23:18 +00:00
pooka 1c4a50f192 sprinkle _KERNEL_OPT 2015-08-24 22:21:26 +00:00
ozaki-r 8a0c9bd6b5 Add an assertion; if rtcache has an rtentry, its refcnt must be > 0 2015-08-24 04:44:54 +00:00
christos e7ae23fd9e include "ioconf.h" to get the 'void <driver>attach(int count);' prototype. 2015-08-20 14:40:16 +00:00
uebayasi 17ee3e05f5 Honor pseudo attach decl generated by config(1). 2015-08-20 11:18:53 +00:00
ozaki-r c1f0857176 Remove extra rt_refcnt++ in rtalloc1
rtrequest has already done it. So we don't need to do it once more.

This fixes a regression in ARP cache expiration where an expired
cache entry didn't disappear.
2015-08-13 10:14:26 +00:00
ozaki-r e12cf6b309 Move rtfree to a common place
This change also plugs a missing rtfree on an error path.
2015-08-13 07:59:05 +00:00
ozaki-r 972f005299 Tidy up header inclusions 2015-08-12 02:20:31 +00:00
ozaki-r 55140c1926 Use time_uptime instead of time_second to avoid time leaps
Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap, e.g., via settimeofday(2), according to time_second(9). We
should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
2015-08-07 08:11:33 +00:00
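The uptime-to-wall-clock conversion mentioned above amounts to shifting a timestamp by the current offset between the two clocks. The sketch below passes both clock readings in explicitly so it can run in user space; the function names are hypothetical, and treating 0 as "no expiry" is an assumption.

```c
#include <assert.h>
#include <time.h>

/*
 * now_wall and now_mono stand in for time_second and time_uptime read
 * at the same instant; their difference is the current wall-clock offset.
 */
static time_t
toy_mono_to_wall(time_t t, time_t now_wall, time_t now_mono)
{
    if (t == 0)                 /* assumed "no expiry" marker */
        return 0;
    return t - now_mono + now_wall;
}

static time_t
toy_wall_to_mono(time_t t, time_t now_wall, time_t now_mono)
{
    if (t == 0)
        return 0;
    return t - now_wall + now_mono;
}

/* Expiry checks stay on the monotonic clock, so settimeofday(2) can't skew them. */
static int
toy_expired(time_t expire_mono, time_t now_mono)
{
    return expire_mono != 0 && now_mono >= expire_mono;
}
```

Only the values handed to or received from userland are converted; internal bookkeeping never leaves the monotonic clock.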
ozaki-r 18566c8cca Fix PR 48104
So far, a bridge cannot receive frames via a member interface when the frames
come from another member interface. So when we assign an IP address to
a member interface, hosts connected to another member interface cannot
ping the IP address. That behavior isn't expected. See PR 48104 for
more realistic examples of this issue.

The change does:
- drop M_PROMISC before ether_input, which allows a bridge member interface
  to receive a frame coming from another bridge member interface
- receive broadcast/multicast frames via all bridge member interfaces,
  which is required to receive IPv6 multicast packets destined to a
  multicast group belonging to a bridge member interface that is different
  from a packet arrival interface

roy@ helped testing of the fix, thanks!
2015-07-23 10:52:34 +00:00
ozaki-r 9eae87d0c8 Reform use of rt_refcnt
rt_refcnt of rtentry was misused, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
  of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
2015-07-17 02:21:08 +00:00
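The reference-counting rules listed above can be illustrated with a toy one-entry table. Everything here is hypothetical (toy_rtalloc/toy_rtfree are not the kernel's rtalloc1/rtfree); the point is the discipline: lookups return a referenced entry, callers release it only through the free function, and the entry is freed when the last reference, including the table's own, goes away. As the commit notes, this protects the entry from being freed, not from being modified.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_rtentry {
    int refcnt;          /* references held by the table AND by callers */
    int dest;            /* stand-in for the destination key */
};

static int toy_freed;                   /* entries actually released */
static struct toy_rtentry *toy_table;   /* one-slot "routing table" */

static void
toy_table_insert(int dest)
{
    toy_table = malloc(sizeof(*toy_table));
    assert(toy_table != NULL);
    toy_table->dest = dest;
    toy_table->refcnt = 1;   /* reference held by the table itself */
}

/* Rule: a function returning an entry bumps refcnt; the caller must toy_rtfree() it. */
static struct toy_rtentry *
toy_rtalloc(int dest)
{
    struct toy_rtentry *rt = toy_table;
    if (rt == NULL || rt->dest != dest)
        return NULL;
    rt->refcnt++;
    return rt;
}

/* Rule: never decrement refcnt by hand, and never touch rt after this call. */
static void
toy_rtfree(struct toy_rtentry *rt)
{
    assert(rt->refcnt > 0);
    if (--rt->refcnt == 0) {
        free(rt);
        toy_freed++;
    }
}

/* Unlinking from the table drops only the table's reference. */
static void
toy_table_delete(void)
{
    struct toy_rtentry *rt = toy_table;
    toy_table = NULL;
    if (rt != NULL)
        toy_rtfree(rt);
}
```

A caller that still holds an entry after it is deleted from the table can keep using it safely until its own toy_rtfree(), which is the local-variable case the commit says rt_refcnt previously did not count.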
ozaki-r f2abd6a2e3 Move rt_gwroute operation out of stripoutput
We should do it in ip_hresolv_needed.
2015-07-14 08:44:59 +00:00
ozaki-r cd4fff4016 Remove unnecessary if_type setting
if_type is set as IFT_SLIP below.
2015-07-14 08:15:41 +00:00
ozaki-r 0317b9d373 KNF 2015-07-14 08:06:44 +00:00
rmind 810dfeba15 npfkern: eliminate INACTIVE_ID and use 0 for unregistered interfaces. 2015-07-12 23:51:53 +00:00
ozaki-r f81368b844 Use ip_hresolv_output for if_token as well
I thought we could not apply ip_hresolv_output to if_token because
rt0 seemed to be needed by arpresolve in token_output. However,
rt0 is actually not used by arpresolve in NetBSD (see obsolete
ARPRESOLVE macro).
2015-07-01 03:39:36 +00:00
ozaki-r ca923dc320 Remove ifnet_addrs
We can assume that ifnet_addrs[ifp->if_index] is always the same as
ifp->if_dl, so we can replace ifnet_addrs[ifp->if_index] with ifp->if_dl
and remove ifnet_addrs entirely.

ok martin@
2015-06-29 09:40:36 +00:00
roy b4462420d3 Guard against the possibility that there is no ready address. 2015-06-08 08:21:49 +00:00
rmind 1662d4f47c - npfctl: fix the confusion in the parser (0/0 case with no other filter).
- Always populate the error dictionary, not only for DEBUG/DIAGNOSTIC.
2015-06-08 01:00:43 +00:00
ozaki-r 6ea8c2e666 Pull out route lookups from L2 output routines
Route lookups for routes of RTF_GATEWAY were done in L2 output
routines such as ether_output, but they should be done in L3
i.e., before L2 output routines. This change places the lookups
between L3 output routines (say ip_output) and the L2 output
routines.

The change is based on dyoung's patch submitted in the thread:
https://mail-index.netbsd.org/tech-net/2013/02/01/msg003847.html
Detailed investigations of the issue by dyoung can be found
there.

Note that the change introduces a workaround for MPLS. ether_output
knew that it needs to fill the ethertype of a frame as MPLS,
based on a tag of an original route (rtentry), but now we don't
pass it to ether_output. So we have to convey that in another way.
We use mtag to do so for now, which introduces some overhead.
We should fix it somehow in the future.

Discussed on tech-kern and tech-net.
2015-06-04 09:19:59 +00:00
martin de7496de88 Include <sys/socketvar.h> for softnet_lock. 2015-06-03 13:06:26 +00:00
hsuenaga 99a3c13a27 Obtain softnet_lock before entering IP networking stack from gif software
interrupt.
2015-06-03 02:17:51 +00:00
matt ef434b48cb Modify the BRDGGIFS and BRDGRTS cmds to be more COMPAT_NETBSD32 friendly.
(XXX whitespace)
2015-06-01 06:14:43 +00:00
roy 555a592508 Back out prior
gimpy1@ we don't #include driver .h in netbsd32
2015-06-01 00:15:07 +00:00
roy 643289594b Revert prior change, optionally include PPPOE and SPPP support again.
Fix compat_netbsd32 module building by enforcing both.
2015-05-31 23:40:19 +00:00
roy eeb5f0236a Revert prior as it's no longer needed. 2015-05-31 23:01:04 +00:00
roy 9f3fcd35ec Allow sppp to be #if NSPPP > 0 2015-05-31 22:09:38 +00:00
joerg adac2d746a Improve wording. 2015-05-30 19:14:46 +00:00
ozaki-r 37deb1d6e1 Remove leftover DECNET-related stuff
No objection on tech-kern and tech-net.
2015-05-25 08:31:34 +00:00
ozaki-r b71bd7bda7 Remove leftover IPX-related stuff
No objection on tech-kern and tech-net.
2015-05-25 08:29:01 +00:00
ozaki-r b8199900dc Remove leftover use of AF_NS and NS option
Unnecessary NETISR_NS is also removed.
2015-05-20 09:17:17 +00:00
martin c5617ba863 Implement SIOCIFGCLONERS for netbsd32, so ifconfig -C works. 2015-05-18 06:38:59 +00:00
rtr fd12cf39ee make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from buf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
2015-05-02 17:18:03 +00:00
roy 505639d2f3 Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
2015-05-02 14:41:32 +00:00