2555 Commits

Author SHA1 Message Date
knakahara
b71542e5bc let gif(4) promise softint(9) contract (2/2) : ip_encap side
The previous commit does not take care of encaptab. This commit fixes a race
on encaptab, which is used not only by gif(4).
2016-07-04 04:17:25 +00:00
knakahara
d81cd78ed7 let gif(4) promise softint(9) contract (1/2) : gif(4) side
To prevent calling softint_schedule() after called softint_disestablish(),
the following modifications are added
    + ioctl (writing configuration) side
      - off IFF_RUNNING flag before changing configuration
      - wait softint handler completion before changing configuration
    + packet processing (reading configuration) side
      - if IFF_RUNNING flag is on, do nothing
    + in whole
      - add gif_list_lock_{enter,exit} to prevent the same configuration is
        set to other gif(4) interfaces
2016-07-04 04:14:47 +00:00
ozaki-r
17b4eb5edd Make sure to free all interface addresses in if_detach
Addresses of an interface (struct ifaddr) have a (reverse) pointer of an
interface object (ifa->ifa_ifp). If the addresses are surely freed when
their interface is destroyed, the pointer is always valid and we don't
need a tweak of replacing the pointer to if_index like mbuf.

To make that assumption hold, the following changes are required:
- Deactivate the interface near the beginning of if_detach. This prevents
  in6_unlink_ifa from saving multicast addresses (wrongly)
- Invalidate rtcache(s) and clear a rtentry referencing an address on
  RTM_DELETE. rtcache(s) may delay freeing an address
- Replace callout_stop with callout_halt of DAD timers to ensure stopping
  such timers in if_detach
2016-07-01 05:22:33 +00:00
ozaki-r
65634f4cd9 Tidy up goto labels
No functional change.
2016-06-30 06:56:27 +00:00
ozaki-r
3fe6488e55 Fix error paths
Some error paths did m_put_rcvif_psref twice.
2016-06-30 06:48:58 +00:00
ozaki-r
d4c71b34a8 Make sure that ifaddr is published after its initialization finished
Basically we should insert an item to a collection (say a list) after
item's initialization has been completed to avoid accessing an item
that is initialized halfway. ifaddr (in{,6}_ifaddr) isn't processed
like so and needs to be fixed.

In order to do so, we need to tweak {arp,nd6}_rtrequest that depend
on that an ifaddr is inserted during its initialization; they explore
interface's address list to determine that rt_getkey(rt) of a given
rtentry is in the list to know whether the route's interface should
be a loopback, which doesn't work after the change. To make it work,
first check RTF_LOCAL flag that is set in rt_ifa_addlocal that calls
{arp,nd6}_rtrequest eventually. Note that we still need the original
code for the case to remove and re-add a local interface route.
2016-06-30 01:34:53 +00:00
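
The "initialize first, publish last" rule from the commit above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct addr_entry`, `addr_entry_publish` and `addr_list_first` are hypothetical names, not the kernel's ifaddr code, and the sketch assumes a single writer and uses C11 atomics instead of the kernel's own primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an address entry; not the kernel's struct ifaddr. */
struct addr_entry {
	int family;
	unsigned flags;
	struct addr_entry *next;
};

/* Head of a singly linked list that readers may traverse concurrently. */
static _Atomic(struct addr_entry *) addr_list_head = NULL;

/* Fill in every field first, then make the entry reachable (single writer assumed). */
static void
addr_entry_publish(struct addr_entry *e, int family, unsigned flags)
{
	assert(e != NULL);
	e->family = family;
	e->flags = flags;
	e->next = atomic_load_explicit(&addr_list_head, memory_order_relaxed);
	/* Release ordering: the stores above become visible before the pointer does. */
	atomic_store_explicit(&addr_list_head, e, memory_order_release);
}

/* Readers pair the release store with an acquire load. */
static struct addr_entry *
addr_list_first(void)
{
	return atomic_load_explicit(&addr_list_head, memory_order_acquire);
}
```

A reader that obtains an entry through `addr_list_first()` never observes it half-initialized, which is the property the commit message asks for.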
ozaki-r
ca4ea29d93 Add missing NULL checks for m_get_rcvif_psref 2016-06-28 02:02:56 +00:00
ozaki-r
e1b6735f05 Fix typo in a comment 2016-06-23 06:40:48 +00:00
ozaki-r
43c5ab376f Replace ifp of ip_moptions and ip6_moptions with if_index
The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime differs from that of an ifnet, so an ifnet object can
disappear at any time while we reach it via them. Thus, to be safe, we need
to look up an ifnet object by if_index every time.
2016-06-21 03:28:27 +00:00
knakahara
bf1a57d3d3 fix: i386/ALL build failure 2016-06-20 08:08:13 +00:00
knakahara
95fc145695 apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling). 2016-06-20 06:46:37 +00:00
ozaki-r
e1135cd9b9 Use curlwp_bind and curlwp_bindx instead of open-coding LP_BOUND 2016-06-16 02:38:40 +00:00
knakahara
a6f4292e65 eliminate unnecessary splnet 2016-06-13 08:37:15 +00:00
knakahara
e4ff09f05d MP-ify fastforward to support GATEWAY kernel option.
I add "ipflow_lock" mutex in ip_flow.c and "ip6flow_lock" mutex in ip6_flow.c
to protect all data in each file. Of course, this is not MP-scalable. However,
it is sufficient as tentative workaround. We should make it scalable somehow
in the future.

ok by ozaki-r@n.o.
2016-06-13 08:34:23 +00:00
knakahara
14ea9af5f7 make ipflow_reap() static function. 2016-06-13 08:29:55 +00:00
knakahara
f2808ade1a remove unnecessary splnet before pool_{get,put} 2016-06-13 08:04:44 +00:00
ozaki-r
fe6d427551 Avoid storing a pointer of an interface in a mbuf
Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of an
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places where are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
2016-06-10 13:31:43 +00:00
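
The idea of storing an index instead of a pointer can be sketched as follows. All names here (`struct iface`, `struct packet`, `iface_table`, `packet_get_rcvif`) are hypothetical stand-ins, not the m_{get,put}_rcvif API itself, and the sketch deliberately omits the psref(9)/pserialize(9) protection that keeps the object alive in the real code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IFACES 8	/* hypothetical table size */

struct iface { const char *name; };		/* stand-in for struct ifnet */
struct packet { unsigned rcvif_index; };	/* stand-in for an mbuf packet header */

/* Stand-in for the interface collection (ifindex2ifnet in the commit message). */
static struct iface *iface_table[MAX_IFACES];

/*
 * Resolve the receiving interface at the time of use.
 * Returns NULL if the interface has been destroyed since the packet arrived.
 */
static struct iface *
packet_get_rcvif(const struct packet *p)
{
	assert(p != NULL);
	if (p->rcvif_index >= MAX_IFACES)
		return NULL;
	return iface_table[p->rcvif_index];
}
```

Because the lookup can fail, callers have to handle a NULL result, which is what the later "Add missing NULL checks for m_get_rcvif_psref" commit addresses.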
ozaki-r
d938d837b3 Introduce m_set_rcvif and m_reset_rcvif
The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
2016-06-10 13:27:10 +00:00
christos
fdea3219c6 make hostzerobroadcast default to "no". 2016-05-27 16:44:15 +00:00
rjs
afd529313e Use const for arguments to sctp_is_same_scope(). 2016-05-22 23:04:27 +00:00
rjs
b65559a564 Remove rtcache reference to route before freeing the containing struct. 2016-05-22 22:18:41 +00:00
ozaki-r
1acd48af54 Get rid of unnecessary assignment 2016-05-17 09:00:24 +00:00
ozaki-r
040205ae93 Protect ifnet list with psz and psref
The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
2016-05-12 02:24:16 +00:00
ozaki-r
8e8364ddca Fix compilation for ppc 2016-05-09 07:02:10 +00:00
christos
902487a7f3 fix compilation for ppc. 2016-05-04 15:42:32 +00:00
ozaki-r
2cf7873b92 Constify rtentry of if_output
We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
2016-04-28 00:16:56 +00:00
rjs
991b8746b6 Fix build when IPSEC enabled. 2016-04-26 11:02:57 +00:00
ozaki-r
9e0f6c5e36 Stop using rt_gwroute on packet sending paths
rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
  but in ip_hresolv_output it is checked only when the route
  is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.
2016-04-26 09:30:01 +00:00
ozaki-r
a79dfa5db0 Sweep unnecessary route.h inclusions 2016-04-26 08:44:44 +00:00
rjs
505ea9765f Fix build when IPSEC enabled. 2016-04-25 21:21:02 +00:00
ozaki-r
0c74cec625 Check error of rt_setgate and rt_settag 2016-04-25 14:38:08 +00:00
ozaki-r
5fd142cec8 Fix error path 2016-04-19 09:36:35 +00:00
ozaki-r
54748dcad2 Separate MPLS-related routines from ip_hresolv_output
No functional changes.
2016-04-19 09:29:54 +00:00
ozaki-r
07d863c903 Constify rtentry of arpresolve
We don't need to (rather shouldn't) modify rtentry in there.
2016-04-19 04:13:56 +00:00
ozaki-r
805fe96546 Fix panic on receiving an ARP request
The panic happened if an ARP request has a spa (i.e., IP address) whose
ARP entry already exists in the table as a static ARP entry.
2016-04-18 02:24:42 +00:00
ozaki-r
4ace575dc7 Get rid of meaningless RTF_UP check from ip_hresolv_output
The check is meaningless because
- An obtained rtentry is ensured that it's always RTF_UP by rtcache,
  rtalloc1 and rtlookup. If the rtentry isn't changed (i.e., RTF_UP gets
  dropped) during processing, the check should be unnecessary
- Even if not, i.e., an obtained rtentry can be changed during processing,
  checking only at the point doesn't help; the rtentry can be changed after
  the check

Instead we have to ensure that RTF_UP isn't dropped if someone is using it
somehow. Note that we already ensure that a rtentry being used isn't freed
by rt_refcnt.

Proposed on tech-kern and tech-net.
2016-04-18 01:28:06 +00:00
rjs
b4a446b522 Remove stray debug printf(). 2016-04-14 18:36:56 +00:00
ozaki-r
4f0eb37aac ddb: rename show arptab to show routes
show arptab command of ddb is now inappropriate because it actually dumps
routes but arp entries aren't routes anymore. So rename it to show routes
and move the code from if_arp.c to route.c.

ok christos@
2016-04-13 00:47:01 +00:00
ozaki-r
322b6a238d Sweep unnecessary radix.h inclusions 2016-04-11 08:56:16 +00:00
christos
b988d754df - tidy up error messages
- add a length argument to arpresolve()
- add KASSERT for overflow
2016-04-07 03:22:15 +00:00
ozaki-r
09973b35ac Separate nexthop caches from the routing table
By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
  - It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
  http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
2016-04-04 07:37:07 +00:00
mlelstv
78f913b0b2 Replace generic queue macros with IFNET/IFADDR macros. 2016-04-03 09:57:40 +00:00
ozaki-r
35b18fbb1d Remove unnecessary casts and do s/0/NULL/ for rtrequest 2016-04-01 09:16:02 +00:00
christos
6228dc517a PR/50899: David Binderman: optimize memset 2016-03-06 19:46:05 +00:00
knakahara
9b7918b3ee remove unnecessary declarations and fix KNF
Thanks to riastradh@
2016-02-29 01:29:15 +00:00
knakahara
e80f101289 To eliminate gif_softc_list linear search, add extra argument to encapsw.pr_ctlinput(). 2016-02-26 07:35:17 +00:00
ozaki-r
a143583fe0 Use callout_halt instead of callout_stop 2016-02-25 06:00:01 +00:00
rtr
0a0528fd0a Fix building of IPv4-Mapped IPv6 addresses.
As discussed on tech-net@ use in6_sin_2_v4mapsin6() to build mapped
addresses.
2016-02-15 19:00:42 +00:00
rtr
e2a3307b85 Reduce code duplication.
Split creation of IPv4-Mapped IPv6 addresses into its own function
and use it.

No functional change intended.  As posted to tech-net@
2016-02-15 14:59:03 +00:00
rtr
f5c6d9772a remove duplicated #include of <netinet/in.h> 2016-02-14 23:47:57 +00:00
ozaki-r
9c4cd06355 Introduce softint-based if_input
This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
2016-02-09 08:32:07 +00:00
knakahara
51f4870974 eliminate variable argument in encapsw 2016-01-26 06:00:10 +00:00
knakahara
b546d5277b implement encapsw instead of protosw and uniform prototype.
suggested and advised by riastradh@n.o, thanks.

BTW, It seems in_stf_input() had bugs...
2016-01-26 05:58:05 +00:00
ozaki-r
07e20941bc Remove unnecessary LLE_REMREF
The code around it was copied from arptimer, but LLE_REMREF
is unnecessary because it is needed only for arptimer that
is called after LLE_ADDREF.

This is a possible fix for PR#50548, PR#50702 and PR#50704.
2016-01-25 10:15:38 +00:00
riastradh
fa50b451d4 Those were local changes not meant to be part of the revert. SORRY! 2016-01-23 14:48:55 +00:00
christos
e1c6072fc4 fix compilation 2016-01-23 02:58:13 +00:00
riastradh
e588d95c25 Back out previous change to introduce struct encapsw.
This change was intended, but Nakahara-san had already made a better
one locally!  So I'll let him commit that one, and I'll try not to
step on anyone's toes again.
2016-01-22 23:27:12 +00:00
riastradh
87bc652e3d Don't abuse struct protosw for ip_encap -- introduce struct encapsw.
Mostly mechanical change to replace it, culling some now-needless
boilerplate around all the users.

This does not substantively change the ip_encap API or eliminate
abuse of sketchy pointer casts -- that will come later, and will be
easier now that it is not tangled up with struct protosw.
2016-01-22 05:15:10 +00:00
riastradh
7c7b1739c8 Revert previous: ran cvs commit when I meant cvs diff. Sorry!
Hit up-arrow one too few times.
2016-01-21 15:41:29 +00:00
riastradh
b41d562bd0 Give proper prototype to ip_output. 2016-01-21 15:27:48 +00:00
riastradh
f8b0ac1cb4 Give proper prototype to ip_output. 2016-01-20 22:12:22 +00:00
riastradh
6439c4109a Give proper prototype to rip_output. 2016-01-20 22:02:54 +00:00
riastradh
2880d69957 Give proper prototype to udp_output. 2016-01-20 22:01:18 +00:00
riastradh
65a8f527af Eliminate struct protosw::pr_output.
You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument.  Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
2016-01-20 21:43:59 +00:00
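
The conversion from a variadic protocol-switch slot to an explicit callback argument can be sketched like this. `struct msg`, `output_fn`, `raw_send_sketch` and `count_output` are hypothetical names chosen for illustration; the real raw_send takes different arguments.

```c
#include <assert.h>
#include <stddef.h>

struct msg { int len; };	/* hypothetical stand-in for an mbuf chain */

/* One fixed, type-checked signature replaces the variadic slot. */
typedef int (*output_fn)(struct msg *, void *);

/* The sender receives its output routine explicitly instead of finding it in a table. */
static int
raw_send_sketch(struct msg *m, void *ctx, output_fn output)
{
	assert(output != NULL);
	return (*output)(m, ctx);
}

/* Example callback: accumulate the message length into an int supplied via ctx. */
static int
count_output(struct msg *m, void *ctx)
{
	*(int *)ctx += m->len;
	return 0;
}
```

With a fixed prototype the compiler can check every call site, which a variadic `pr_output` cannot offer.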
knakahara
2692d86ef7 remove unused variable. 2016-01-20 05:58:49 +00:00
knakahara
d7b9bb29c0 Refactor protosw codes in gif(4). No functional change.
- remove unnecessary include
    - reduce scopes
2016-01-18 06:08:26 +00:00
christos
bd37d539ab PR/50670: David Binderman: Tidy up debugging printfs to avoid if else confusion. 2016-01-17 15:08:10 +00:00
knakahara
1c5d304e9c eliminate ip_input.c and ip6_input.c dependency on gif(4) 2016-01-08 03:55:39 +00:00
ozaki-r
d52244fae3 Make revarprequest static 2016-01-05 05:37:06 +00:00
knakahara
6d50f36d54 use satosin{,6} macros instead of casts. 2015-12-25 06:47:56 +00:00
ozaki-r
66d9895f20 Fix memory leak of llentry#la_opaque
llentry#la_opaque which is for token ring is allocated in arp.c
and freed in arp.c when freeing llentry. However, llentry can be
freed from other places, e.g., lltable_free. In such cases,
la_opaque is never freed.

To fix that, add a new callback (lle_ll_free) to llentry and
register a destruction function of la_opaque to it. On freeing a
llentry, we can surely free la_opaque via the callback.
2015-12-17 02:38:33 +00:00
ozaki-r
213b8d3cc6 Fix token_rif extractions from llentry 2015-12-16 05:44:59 +00:00
christos
7273e27bf7 PR/50529: David Binderman: Remove double sizeof 2015-12-13 18:58:13 +00:00
christos
f2d1d0f2f7 PR/50528: David Binderman: remove sizeof(sizeof(x)) 2015-12-13 18:53:57 +00:00
knakahara
a00e94f4ff PR kern/50522: gif(4) ioctl causes panic while someone is using the gif(4) interface.
It is required to wait other CPU's softint completion before disestablishing
the softint handler.
2015-12-11 07:59:14 +00:00
ozaki-r
871888c540 Introduce arp_settimer
No functional change.
2015-12-11 01:15:00 +00:00
knakahara
0072297ac8 ip_encap uses kmem_alloc APIs instead of malloc. 2015-12-09 06:00:51 +00:00
ozaki-r
cefec21119 Get rid of a big block in in_arpinput
No functional change.
2015-11-30 06:45:38 +00:00
ozaki-r
f373fa78e6 Fix build dependency of if_llatbl.c
if_llatbl.c is required if inet or inet6 is enabled. Depending on ether
doesn't suit for NDP case.
2015-11-26 01:41:20 +00:00
ozaki-r
53e3e4714d Restore softnet_lock and KERNEL_LOCK for rtrequest and rtfree
We still need them for rt operations.
2015-11-19 03:03:04 +00:00
ozaki-r
17001ea619 Add missing rtfree 2015-11-16 05:39:39 +00:00
ozaki-r
e72fec577e Fix db_print_llinfo
rt_llinfo is now struct llentry.
2015-11-06 08:55:49 +00:00
ozaki-r
60defe31a6 Fix inappropriate rt_flags check
It assumed that either RTF_CLONED or RTF_CLONING must be set; however,
the assumption doesn't hold for userland programs that create a route
via RTM_ADD.

This fixes an issue that running rarpd causes the following kernel panic
reported by nonaka@:
  panic: kernel diagnostic assertion "(la->la_flags & LLE_STATIC) == 0"
  failed: file "/usr/src/sys/netinet/if_arp.c", line 1339
2015-11-06 08:38:43 +00:00
ozaki-r
847c251da6 Stop callout in arp_rtrequest(RTM_DELETE)
This change fixes arptimer panic after removing an interface
(say by drvctl -d), which is reported by Takahiro Hayashi.

This change also fixes llentry's reference counting; we have
to take into account rtentry#rt_llinfo as well as arptimer.
2015-10-20 07:46:59 +00:00
ozaki-r
e4a5751875 Stop using softnet_lock (fix possible deadlock)
Using softnet_lock for mutual exclusion between lltable_free and
arptimer was wrong and had an issue causing a deadlock between
them;  lltable_free waits arptimer completion by calling
callout_halt with softnet_lock that is held in arptimer, however
lltable_free also holds llentry's lock that is also held in
arptimer so arptimer never obtain the lock and both never go
forward eventually.  We have to pass llentry's lock to
callout_halt instead.
2015-10-20 07:35:15 +00:00
roy
a2d314543b In the event of an error within arpresolve(), delete the cloned route
otherwise it would never be deleted.
2015-10-14 11:22:55 +00:00
roy
b0f4622d81 Save and clear the la route while we have a write lock 2015-10-14 11:17:57 +00:00
rjs
8c2654abca Add core networking support for SCTP. 2015-10-13 21:28:34 +00:00
roy
222d6fab6a arpresolve() now returns 0 on success otherwise an error code.
Callers of arpresolve() now pass the error code back to their caller,
masking out EWOULDBLOCK.

This allows applications such as ping(8) to display a suitable error
condition.
2015-10-13 12:33:07 +00:00
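
How a caller might pass the resolver's result upward while masking EWOULDBLOCK can be sketched as below. `output_after_resolve` is a hypothetical helper, not a kernel function, and the comment on why EWOULDBLOCK is hidden is an assumption about the intent rather than something stated in the commit.

```c
#include <assert.h>
#include <errno.h>

/*
 * resolve_error follows the convention from the commit message:
 * 0 on success, otherwise an errno value.
 */
static int
output_after_resolve(int resolve_error)
{
	assert(resolve_error >= 0);	/* errno values are non-negative */
	if (resolve_error == EWOULDBLOCK)
		return 0;	/* assumed: resolution is still pending, not a failure */
	return resolve_error;	/* e.g. EHOSTUNREACH reaches the application */
}
```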
roy
9ba2bef003 Move the NOARP check up a bit so that it works when an la is created
but hasn't been resolved yet.
Fixes PR kern/17611.
2015-10-13 11:13:37 +00:00
roy
c47c3c3042 Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.
2015-10-13 09:46:42 +00:00
roy
b61ebcc9c7 Simplify la handling in arpresolve() by asking arplookup() not to create
a la. If a la is needed arpresolve() will then create it or mark the
current la as writable.
2015-10-13 09:33:35 +00:00
roy
620387577c Create a temporary define involving IFF_STATICARP if we have it
instead of just testing for __FreeBSD__.
No functional change.

ok: ozaki-r@
2015-10-08 08:17:37 +00:00
ozaki-r
98c468dd77 Create an llentry after fixing an interface to store
In case of RTF_LOCAL routes, we change an output interface
of a route from original one to lo0ifp. An llentry also
has to be stored to lo0ifp in such cases.

Problem reported by roy@
2015-10-07 00:33:27 +00:00
ozaki-r
a7ed97a295 Fix arplookup logic
It should first lookup and then create an entry if not found (and if
creation is requested).
2015-10-05 08:17:31 +00:00
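
The "look up first, create only if not found and only if requested" order can be sketched with a tiny fixed-size table. The table, `struct entry` and `entry_lookup` are hypothetical; the real arplookup works on lltable/llentry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 4	/* hypothetical capacity */

struct entry { unsigned addr; bool used; };
static struct entry table[TABLE_SIZE];

/* Return the existing entry for addr; create one only if absent and create is set. */
static struct entry *
entry_lookup(unsigned addr, bool create)
{
	struct entry *free_slot = NULL;

	for (size_t i = 0; i < TABLE_SIZE; i++) {
		if (table[i].used && table[i].addr == addr)
			return &table[i];		/* found: never create a duplicate */
		if (!table[i].used && free_slot == NULL)
			free_slot = &table[i];	/* remember the first free slot */
	}
	if (!create || free_slot == NULL)
		return NULL;
	assert(!free_slot->used);
	free_slot->addr = addr;
	free_slot->used = true;
	return free_slot;
}
```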
skrll
e22ecac88e Make this compile again 2015-09-21 13:32:26 +00:00
roy
f3b0c038a1 If, for whatever reason, a local interface route is removed and then
re-added, mark it as a local route.

While here, if changing the route to go via the loopback interface
remove any inherited MTU value.
2015-09-11 10:33:32 +00:00
ozaki-r
4c03bf20c9 Remove wrong KASSERT in arptfree
la_rt can be NULL because arptimer that calls arptfree doesn't always
free llentry so llentry can remain with la_rt == NULL. So we instead
check whether la_rt is NULL or not and do arptfree if not.

This fixes PR kern/50184 (confirmed by martin@) and
PR kern/50186 (maybe).
2015-09-09 01:24:01 +00:00
ozaki-r
c2ed920b63 Revert v1.176 for further proper fix 2015-09-09 01:22:28 +00:00
ozaki-r
cac0e9c370 Refactor tcp_mtudisc
No functional change.
2015-09-07 01:56:50 +00:00
ozaki-r
6c5982b876 CID 1322880: remove unnecessary m != NULL checks 2015-09-07 01:18:27 +00:00
ozaki-r
d5ad433b2c CID 1322878: simplify log output flow 2015-09-07 01:17:37 +00:00
ozaki-r
54c4f3b688 Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute
And also do rtfree when deref a rtentry from rt_gwroute.
2015-09-02 11:35:11 +00:00
christos
00158cce6d XXX: Disable KASSERT for now since locking is broken for interface removals. 2015-09-02 09:28:13 +00:00
ozaki-r
13b8e486ae Fix building kernels w/o ether 2015-08-31 16:46:14 +00:00
ozaki-r
ac75483513 Fix building kernels w/o DIAGNOSTIC 2015-08-31 09:21:55 +00:00
ozaki-r
7dc37e542b Remove obsolete global variables and sysctl MIBs 2015-08-31 08:06:30 +00:00
ozaki-r
8997ac8f09 Replace ARP cache (llinfo) with lltable/llentry
Highlights of the change are:
- Use llentry instead of llinfo to manage ARP caches
  - ARP specific data are stored in the hashed list
    of an interface instead of the global list (llinfo_arp)
- Fine-grain locking on llentry
- arptimer (callout) per ARP cache
  - the global timer callout with the big locks can be
    removed (though softnet_lock is still required for now)
- net.inet.arp.prune is now obsoleted
  - it was the interval of the global timer callout
- net.inet.arp.refresh is now obsoleted
  - it was a parameter that prevents expiration of active caches
  - Removed to simplify the timer logic, but we may be able to
    restore the feature if really needed

Proposed on tech-kern and tech-net.
2015-08-31 08:05:20 +00:00
ozaki-r
879526da38 Hook up lltable/llentry with the kernel (and rumpkernel)
It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.
2015-08-31 08:02:44 +00:00
ozaki-r
3aedc74443 Make rt_refcnt take into account rt_timer 2015-08-31 06:25:15 +00:00
pooka
1c4a50f192 sprinkle _KERNEL_OPT 2015-08-24 22:21:26 +00:00
christos
e7ae23fd9e include "ioconf.h" to get the 'void <driver>attach(int count);' prototype. 2015-08-20 14:40:16 +00:00
ozaki-r
f818671bf4 Move insane goto label 2015-08-12 07:13:14 +00:00
ozaki-r
55140c1926 Use time_uptime instead of time_second to avoid time leaps
Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
2015-08-07 08:11:33 +00:00
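
The conversion between uptime-based and wall-clock-based timestamps mentioned above amounts to shifting by the current difference between the two clocks. A minimal sketch, assuming the caller samples both clocks at the same moment; the function names are hypothetical and are not the kernel's helpers:

```c
#include <assert.h>
#include <time.h>

/* Convert a timestamp kept in uptime seconds to wall-clock seconds. */
static time_t
uptime_to_wallclock(time_t t_uptime, time_t now_uptime, time_t now_wall)
{
	assert(t_uptime >= 0 && now_uptime >= 0);	/* uptime counts up from boot */
	return t_uptime - now_uptime + now_wall;
}

/* Convert a wall-clock timestamp supplied by userland back to uptime seconds. */
static time_t
wallclock_to_uptime(time_t t_wall, time_t now_uptime, time_t now_wall)
{
	assert(now_uptime >= 0);
	return t_wall - now_wall + now_uptime;
}
```

For example, a cache entry expiring at uptime 1200 s, observed at uptime 1000 s, is reported to userland as "now + 200 s" on the wall clock. A later settimeofday(2) changes only what is reported, not the moment of expiry.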
matt
49cb8763aa If we are sending a window probe and there's unacked data in the socket, make
sure at least the persist timer is running.
2015-07-24 04:33:50 +00:00
matt
6de0fc0ff8 Make sure that snd_win doesn't go negative. 2015-07-24 04:31:20 +00:00
ozaki-r
9eae87d0c8 Reform use of rt_refcnt
rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
  of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
2015-07-17 02:21:08 +00:00
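
The reference-counting discipline listed above (take a reference before use, always drop it through one release function) can be sketched as follows. `struct route`, `route_hold` and `route_release` are hypothetical; the real rtfree has additional conditions, and the sketch only marks the object as freed instead of releasing memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct route { int refcnt; bool freed; };	/* stand-in for struct rtentry */

/* Functions that hand out a route increment its count on behalf of the caller. */
static struct route *
route_hold(struct route *rt)
{
	assert(rt != NULL && !rt->freed);
	rt->refcnt++;
	return rt;
}

/* The only place the count is decremented; callers never write refcnt-- themselves. */
static void
route_release(struct route *rt)
{
	assert(rt != NULL && rt->refcnt > 0);
	if (--rt->refcnt == 0)
		rt->freed = true;	/* a real implementation would free the memory here */
}
```

As the commit message notes, this protects against the object being freed but not against it being modified while in use.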
ozaki-r
fcda92b6be Remove unused arguments and the associated code from nd6_nud_hint()
from OpenBSD
2015-07-15 09:20:18 +00:00
ozaki-r
bd4fe18031 Make global variables static 2015-07-15 08:49:15 +00:00
ozaki-r
f2abd6a2e3 Move rt_gwroute operation out of stripoutput
We should do it in ip_hresolv_needed.
2015-07-14 08:44:59 +00:00
ozaki-r
f81368b844 Use ip_hresolv_output for if_token as well
I thought we cannot apply ip_hresolv_output to if_token because
rt0 looked being needed by arpresolve in token_output. However,
rt0 is actually not used by arpresolve in NetBSD (see obsolete
ARPRESOLVE macro).
2015-07-01 03:39:36 +00:00
roy
938235dc97 errno -> error, spotted by the hawk skrll 2015-06-08 08:19:20 +00:00
roy
17aa5fe87d It's possible we could not have any ready addresses. 2015-06-08 08:02:43 +00:00
roy
be5b0a3a89 Don't set errno. Thanks to skrll@ 2015-06-08 07:59:54 +00:00
ozaki-r
6ea8c2e666 Pull out route lookups from L2 output routines
Route lookups for routes of RTF_GATEWAY were done in L2 output
routines such as ether_output, but they should be done in L3
i.e., before L2 output routines. This change places the lookups
between L3 output routines (say ip_output) and the L2 output
routines.

The change is based on dyoung's patch submitted in the thread:
https://mail-index.netbsd.org/tech-net/2013/02/01/msg003847.html
You can find out detailed investigations by dyoung about the
issue in there.

Note that the change introduces a workaround for MPLS. ether_output
knew that it needs to fill the ethertype of a frame as MPLS,
based on a tag of an original route (rtentry), but now we don't
pass it to ether_output. So we have to tell that in another way.
We use mtag to do so for now, which introduces some overhead.
We should fix it somehow in the future.

Discussed on tech-kern and tech-net.
2015-06-04 09:19:59 +00:00
rtr
e7083d7a4b remove transitional functions in{,6}_pcbconnect_m() that were used in
converting protocol user requests to accept sockaddr instead of mbufs.

remove tcp_input copy in to mbuf from sockaddr and just copy to sockaddr
to make it possible for the transitional functions to go away.

no version bump since these functions only existed for a short time and
were commented as adapters (they appeared in 7.99.15).
2015-05-24 15:43:45 +00:00
ozaki-r
423491c235 Replace NARC with NARCNET to follow the renaming in 2007
Hmm, is anyone using this?
2015-05-22 07:44:46 +00:00
ozaki-r
b41c75c271 Use LIST_FOREACH{,_SAFE}
The first loop doesn't remove any items in it, so we can use
LIST_FOREACH instead of LIST_FOREACH_SAFE.
2015-05-21 09:29:51 +00:00
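
Why a loop that removes items needs the "safe" form while a read-only loop does not can be shown with a hand-rolled list. This is a plain-C analogue under stated assumptions, not the queue(3) macros themselves; `struct node`, `list_sum` and `list_drop_matching` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct node { int val; struct node *next; };

/* Read-only traversal: nothing is unlinked, so following n->next directly is fine. */
static int
list_sum(const struct node *head)
{
	int sum = 0;

	for (const struct node *n = head; n != NULL; n = n->next)
		sum += n->val;
	return sum;
}

/* Removing traversal: save the successor before the current node may be unlinked. */
static struct node *
list_drop_matching(struct node *head, int val)
{
	struct node **linkp = &head;
	struct node *n, *next;

	for (n = head; n != NULL; n = next) {
		assert(*linkp == n);	/* linkp always addresses the link that points at n */
		next = n->next;		/* saved first, as the _SAFE variant does */
		if (n->val == val) {
			*linkp = next;	/* unlink n */
			n->next = NULL;	/* n->next is no longer usable for iteration */
		} else {
			linkp = &n->next;
		}
	}
	return head;
}
```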
ozaki-r
442b227d9f Use NULL instead of 0 for pointers 2015-05-21 09:27:10 +00:00
ozaki-r
2c0e34375a Make arp_init, in_revarpinput and revarprequest static 2015-05-21 09:26:18 +00:00
kefren
f3bd20e96c Use RUN_ONCE to initialize iss secret. Suggested by riastradh@ 2015-05-19 17:33:43 +00:00
roy
f45d868787 Separate ARP handling DAD from inet.
This is done by signalling the intent to try tentative addresses
and then clearing the intent once the address is setup.
When the ARP handler is installed (arp_ifinit) then it adds
dad start and stop functions to the address which are used instead
of calling ARP directly.
2015-05-16 12:12:46 +00:00
kefren
56d130b58b Don't overexpose tcp_iss_secret and don't bother compute it unless
RFC1948 compliance is activated
2015-05-16 10:09:20 +00:00
kefren
a6fab82126 Don't put segment on the wire if security request can't be fulfilled 2015-05-16 01:15:34 +00:00
kefren
110f4b05db Don't try to do PCB lookup for bad checksummed segments
Fixes PR/43510 and PR/48452
2015-05-15 18:03:45 +00:00
christos
37fd390ec4 if no address was found, don't check if it is tentative (hi Roy) 2015-05-09 18:47:26 +00:00
christos
28383371f1 assign sin only when it is needed 2015-05-09 18:46:25 +00:00
roy
bdb2ef03d5 If we don't have ARP, don't set IN_IFF_TENTATIVE. 2015-05-05 08:52:51 +00:00
justin
f5df4fc799 Rename delay variable as it shadows a global on arm. 2015-05-03 10:44:04 +00:00
joerg
5cad40c933 Fix !ARP build. 2015-05-02 20:22:12 +00:00
rtr
fd12cf39ee make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
2015-05-02 17:18:03 +00:00
roy
866e96fa79 Appease gcc. 2015-05-02 15:22:03 +00:00
roy
505639d2f3 Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
2015-05-02 14:41:32 +00:00
christos
ffe2b84e28 Apply Revision 220794 from FreeBSD to avoid dup ACKs:
When checking to see if a window update should be sent to the remote peer,
don't force a window update if the window would not actually grow due to
window scaling.  Specifically, if the window scaling factor is larger than
2 * MSS, then after the local reader has drained 2 * MSS bytes from the
socket, a window update can end up advertising the same window.  If this
happens, the supposed window update actually ends up being a duplicate ACK.
This can result in an excessive number of duplicate ACKs when using a
higher maximum socket buffer size.

Pointed out by Ricky Charlet, in tech-net.
2015-04-27 16:50:17 +00:00
ozaki-r
5f21075b8f Add missing error checks on rtcache_setdst
It can fail with ENOMEM.
2015-04-27 10:14:44 +00:00
ozaki-r
2373b55abc Introduce in6_selecthlim_rt to consolidate an idiom for rt->rt_ifp
It consolidates a scattered routine:
(rt = rtcache_validate(&in6p->in6p_route)) != NULL ? rt->rt_ifp : NULL
2015-04-27 02:59:44 +00:00
rtr
d2aa9dd71f remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@
2015-04-26 21:40:48 +00:00
rtr
89539c0d5f return EINVAL if sin{,6}_len != sizeof(sockaddr_in{,6}) respectively in
in{,6}_pcbconnect().

checking just m->m_len isn't enough because there are various places that
assume sa_len has been properly populated.
2015-04-26 16:45:50 +00:00
rtr
69b4af1034 make rip_connect_pcb take sockaddr_in * instead of mbuf *
make rip_connect_pcb static since it appears to be used only in raw_ip.c

moves m_len check to callers which is a small duplication of code
that will go away when the callers are converted to receive sockaddr *.
2015-04-25 15:19:54 +00:00
rtr
eddf3af3c6 make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
2015-04-24 22:32:37 +00:00
ozaki-r
06f4ab5ebf Use KASSERT instead of if & panic
rt can be NULL only on a programming error (and we are sure it cannot be for
now), so we can use KASSERT here (i.e., check only if DIAGNOSTIC).
2015-04-24 03:20:41 +00:00
ozaki-r
840cc553d7 Replace 0 with NULL for pointer variables 2015-04-24 02:56:51 +00:00
ozaki-r
2af3302ac0 KNF 2015-04-24 00:48:47 +00:00
ozaki-r
d18817a22a Remove non-USE_RADIX case and USE_RADIX switch
It seems that we have been using ip_encap only with USE_RADIX
for many years. Let's remove the unused non-USE_RADIX case.

No objection on tech-kern and tech-net.

Double-checked by knakahara@
2015-04-20 07:34:48 +00:00
ozaki-r
cefb9995f4 Remove garbage undef 2015-04-16 06:50:16 +00:00
riastradh
691129c8c5 KASSERT x then y, not x && y, to give more specific errors. 2015-04-15 13:02:16 +00:00
ozaki-r
e952648134 Use LIST_FOREACH_SAFE
We have to use LIST_FOREACH_SAFE because LIST_REMOVE is used
inside the loop through encap_remove.
2015-04-15 08:47:28 +00:00
ozaki-r
73c17c4a13 Replace DIAGNOSTIC & panic with KASSERT/KASSERTMSG 2015-04-15 03:38:50 +00:00
ozaki-r
dcfb08075f Add $NetBSD$ at the top of the file 2015-04-15 03:32:23 +00:00
riastradh
556fc62b15 cprng_strong(kern_cprng, ...) never blocks, pass 0 for flags.
FASYNC was wrong anyway!  It's FNONBLOCK.
2015-04-13 15:51:00 +00:00
rtr
80ea8ccc7c * update dccp_bind for struct mbuf * to struct sockaddr * parameter change
* pass NULL instead of casting 0 to a pointer when calling in_pcbbind()
2015-04-04 04:33:38 +00:00
rtr
a2ba5e69ab * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
  instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
  sys_bind; sockaddr_big is of sufficient size and alignment to
  accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
  http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
2015-04-03 20:01:07 +00:00
ozaki-r
9817ed1a76 Don't grab KERNEL_LOCK during if_output when NET_MPSAFE
The change makes L3 MP-safe work easy. At this point
we deal with only IP forwarding.

No functional change when NET_MPSAFE isn't enabled.
2015-04-03 07:55:18 +00:00
ozaki-r
71b1eb47ca Remove unnecessary opt_ipsec.h inclusions 2015-03-31 08:47:01 +00:00
ozaki-r
7f0bd664ae Add missing ifdef IPSEC 2015-03-31 08:44:43 +00:00
ozaki-r
50468f9be7 Tidy up the regular path of ip_forward
No functional change is intended.
2015-03-26 04:05:58 +00:00
roy
a37502b2b6 Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.
2015-03-23 18:33:17 +00:00
rtr
8699b912c3 Move code that is conditional on options INET6 into #ifdef INET6.
* Re-organize some variable declarations to limit #ifdef's.
* Move INET and INET6 code into respective switch cases to simplify
  #ifdef INET6.

No intended functional change.
2015-03-14 02:08:16 +00:00
roy
5170946304 Don't add local routes for the any address or p2p addresses where the address matches the destination. 2015-02-26 12:58:36 +00:00
roy
42900924fd Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.
2015-02-26 09:54:46 +00:00
christos
31fb02278a PR/49676: Ryo Shimizu: ICMP_STATINC() buffer overflows
XXX: pullup-7
2015-02-18 17:00:15 +00:00
he
de7f57fda9 Change the new counter variables in struct tcpcb to uint32_t, as
per christos' comments.
2015-02-14 22:09:53 +00:00
he
1d14d02249 Port over the TCP_INFO socket option from FreeBSD, originally from
the Linux 2.6 TCP API.  This permits the caller to query certain information
about a TCP connection, and is used by pkgsrc's net/iperf3 test program
if available.

This extends struct tcpcb with three fields to count retransmits,
out-of-sequence receives and zero window announcements, and will
therefore warrant a kernel revision bump (done separately).
2015-02-14 12:57:52 +00:00
rjs
652788239c Add DCCP protocol support from KAME. 2015-02-10 19:11:52 +00:00
christos
f89df58b37 use the new printing code. 2014-12-02 20:25:47 +00:00
christos
dfbbb8d8b5 add routines to print in_addr and sockaddr_in (in_print and sin_print) 2014-12-02 19:35:27 +00:00
christos
da48f144c9 Don't pass junk in sin_family and sin_len for SIOCGIFNETMASK, and explain why.
XXX: pullup 7?
2014-12-01 17:07:43 +00:00
christos
40d7a68275 Only check that the offset < sizeof(struct ip) if nxt != 0, i.e. in the
tcp and udp cases. From kre.
XXX: pullup 7
2014-11-30 18:15:41 +00:00
ozaki-r
fb797ebb35 Call looutput while holding KERNEL_LOCK
This fixes diagnostic assertion "KERNEL_LOCKED_P()" in if_loop.c.

PR kern/49410
2014-11-26 10:18:37 +00:00
seanb
ae36e3e5b1 Really make SO_REUSEPORT and SO_REUSEADDR equivalent for multicast
sockets.  From FreeBSD.
2014-11-25 19:09:13 +00:00
seanb
56c6664a5c Clean up any dangling ifp references in (struct in6pcb *)->in6p_v4moptions
(v4 multicast options off v4 mapped v6 socket) on interface destruction.  The
code to clean this up in a true v4 socket was moved to its own function
which is now also called in the corresponding place for v6 sockets on
interface destruction.
2014-11-25 15:04:37 +00:00
christos
cb8dda3c0e Add sysctl to selectively log arp packets from unknown network. (Adrien URBAN). 2014-11-13 16:11:18 +00:00
maxv
fcc99ce60e Do not uselessly include <sys/malloc.h>. 2014-11-10 18:46:33 +00:00
christos
828d274251 Avoid stack overflow when SACK and TCP_SIGNATURE are both present. Thanks
to Jonathan Looney for pointing this out.
2014-10-25 15:07:13 +00:00
hikaru
62fa1e32f7 Fix wrong condition for checking TSO capability.
ipsec_used is not a necessary condition.
IPsec outbound policy will not be checked when ipsec_used is false.
2014-10-21 13:44:47 +00:00
snj
f0a7346d21 src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
2014-10-18 08:33:23 +00:00
christos
01fcb35dc2 document that we depend on the option numbers matching. 2014-10-12 19:02:18 +00:00
christos
4f85a755f8 Refactor the multicast membership code so that we can handle v4 mapped
addresses using the v6 membership ioctls.
2014-10-12 19:00:21 +00:00
christos
07d9441357 expose multicast option functions which are used by the v6 code now. 2014-10-11 21:12:51 +00:00
rmind
436f757159 Eliminate IFAREF() and IFAFREE() macros in favour of functions. 2014-09-09 20:16:12 +00:00
joerg
8403248f23 Always use cprng_fast32, even during initialisation. No point in using
random(9).
2014-09-08 17:40:02 +00:00
rmind
2082db2d3c in_pcbdetach: move ip_freemoptions() under softnet_lock for now (this will
be changed back once other IP paths become MP-safe).  Same for IPv6 routine.

This partially reverts 1.150 of in_pcb.c and 1.127 of in6_pcb.c changes.
2014-09-07 00:50:56 +00:00
matt
a63dc570e9 Don't use C++ keyword (template) as variable. 2014-09-05 06:04:43 +00:00
matt
6c3d985231 Don't use C++ keywords (class, template) as variables 2014-09-05 06:03:51 +00:00
matt
8f413cecf4 Deanonymize structure for llinfo_arp. 2014-09-05 06:02:11 +00:00
rtr
8cf67cc6d5 split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

  - always KASSERT(solocked(so)) even if not implemented
    (for PRU_CONNECT2 only)

  - replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
    pr_connect2()

  - replace calls to pr_generic() with req = PRU_PURGEIF with calls to
    pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@
2014-08-09 05:33:00 +00:00
rtr
822872eada split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

  - always KASSERT(solocked(so)) even if not implemented

  - replace calls to pr_generic() with req = PRU_RCVD with calls to
    pr_rcvd()
2014-08-08 03:05:44 +00:00
rtr
651e5bd3f8 split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

   xxx_send(struct socket *, struct mbuf *, struct mbuf *,
       struct mbuf *, struct lwp *)

  - always KASSERT(solocked(so)) even if not implemented

  - replace calls to pr_generic() with req = PRU_SEND with calls to
    pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

  - l2cap_send() -> l2cap_send_pcb()
  - sco_send() -> sco_send_pcb()
  - rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind
2014-08-05 07:55:31 +00:00
rtr
8e80ae3c97 get_tcppcb() is nearly always called upon entry to usrreqs so
KASSERT(solocked(so)) inside it and remove the redundant KASSERT
everywhere we are using tcp_getpcb()
2014-08-05 07:10:41 +00:00
rtr
ce6a5ff64f revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
2014-08-05 05:24:26 +00:00