Commit Graph

329 Commits

Author SHA1 Message Date
ozaki-r fe6d427551 Avoid storing a pointer of an interface in a mbuf
Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of an
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places where are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
2016-06-10 13:31:43 +00:00
riastradh 7c7b1739c8 Revert previous: ran cvs commit when I meant cvs diff. Sorry!
Hit up-arrow one too few times.
2016-01-21 15:41:29 +00:00
riastradh b41d562bd0 Give proper prototype to ip_output. 2016-01-21 15:27:48 +00:00
knakahara 1c5d304e9c eliminate ip_input.c and ip6_input.c dependency on gif(4) 2016-01-08 03:55:39 +00:00
roy c47c3c3042 Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.
2015-10-13 09:46:42 +00:00
pooka 1c4a50f192 sprinkle _KERNEL_OPT 2015-08-24 22:21:26 +00:00
ozaki-r 55140c1926 Use time_uptime instead of time_second to avoid time leaps
Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
netowrk. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
2015-08-07 08:11:33 +00:00
joerg 5cad40c933 Fix !ARP build. 2015-05-02 20:22:12 +00:00
roy 505639d2f3 Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETATCHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
2015-05-02 14:41:32 +00:00
ozaki-r 50468f9be7 Tidy up the regular path of ip_forward
No functional change is intended.
2015-03-26 04:05:58 +00:00
ozaki-r e05f40117a Add 3rd argument to pktq_create to pass sc
It will be used to pass bridge sc for bridge_forward softint.

ok rmind@
2014-06-16 00:33:39 +00:00
rmind 60d350cf6d - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
2014-06-05 23:48:16 +00:00
christos 5d61e6c015 Introduce 2 new variables: ipsec_enabled and ipsec_used.
Ipsec enabled is controlled by sysctl and determines if is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
2014-05-30 01:39:03 +00:00
rmind d49a7f6aed Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros.  In the long term, the lookup path ought
to be optimised.
2014-05-29 23:02:48 +00:00
christos d235317f37 CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.
2014-05-28 19:19:33 +00:00
rmind d998acdb58 ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable. 2014-05-23 23:38:48 +00:00
rmind 28e5c2a160 Make ip_forward() static, there is no need to expose it. 2014-05-23 19:35:24 +00:00
rmind 79cab7f409 Make ip_input() static, there is no need to expose it. 2014-05-23 19:27:48 +00:00
rmind f499e20dfc - Add in_init() and move some functions, variables and sysctls into in.c
where they belong to.  Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.
2014-05-22 22:01:12 +00:00
liamjfoy e45b308c2a Remove ipflow_prune and replace with ipflow_reap. ok rmind@ 2014-03-19 08:27:21 +00:00
pooka 4f6fb3bf35 Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
2014-02-25 18:30:08 +00:00
rmind f04a92b1d6 - Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system).  Make the structures
  opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
2013-06-29 21:06:57 +00:00
christos e8ffb7feee flip src/dst 2013-06-27 20:17:36 +00:00
christos 3829533e7f implement IP_PKTINFO and IP_RECVPKTINFO. 2013-06-27 19:38:16 +00:00
rmind 7cb08cfdd0 Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward().  Tested by christos@.
2013-06-08 13:50:22 +00:00
christos 27fe772ddc IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
2013-06-05 19:01:26 +00:00
christos cbf1f72b20 Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.
2012-11-29 02:07:20 +00:00
christos 84f52095ad rename rfc6056 -> portalgo, requested by yamt 2012-06-25 15:28:38 +00:00
christos 40114b997c PR/46602: Move the rfc6056 port randomization to the IP layer. 2012-06-22 14:54:34 +00:00
dsl e21a34c25e Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
  the item itself.
In the places where the caller specifies a function and a structure
  address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
  sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
  AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
  fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsiged decimals - not set yet.
2012-06-02 21:36:41 +00:00
drochner 364a06bb29 remove KAME IPSEC, replaced by FAST_IPSEC 2012-03-22 20:34:37 +00:00
liamjfoy 24612de5fe check against NULL 2012-01-09 14:31:21 +00:00
drochner 23e5beaef1 rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
2011-12-19 11:59:56 +00:00
plunky 7f3d4048d7 NULL does not need a cast 2011-08-31 18:31:02 +00:00
dyoung ac162b774b *_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag.  Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
2011-05-03 17:44:30 +00:00
dyoung b34b1e2f1f In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.
Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen.  Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.
2011-04-14 20:32:04 +00:00
matt 2c1217a227 Back out rev that shouldn't have been committed. 2010-12-13 14:18:50 +00:00
matt ebb2d31714 Add routines to calculate a checkesum if the driver concludes that the
h/w can't do it.
2010-12-11 22:37:46 +00:00
rmind c40af51a1a ip_randomid: make mechanism MP-safe and more modular.
OK matt@
2010-11-05 01:35:57 +00:00
rmind aa7dc4aa25 ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.
2010-11-05 00:21:51 +00:00
rmind 2f196e2fd9 Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@
2010-07-19 14:09:44 +00:00
rmind bcc65ff09f Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete).  No functional changes to the actual mechanism.

OK matt@
2010-07-13 22:16:10 +00:00
rmind 419f3b11a1 ip_input: move lookup for fragment queue a little bit further. OK matt@. 2010-07-09 18:42:46 +00:00
tls 04c7bc4215 As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test.  Can't
possibly be worse.
2010-04-01 01:23:32 +00:00
tls 4e65861033 Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only).  Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs.  Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem.  Temporary fix suggested by matt@.
2010-03-31 07:31:15 +00:00
pooka 11281f01a0 Replace a large number of link set based sysctl node creations with
calls from subsystem constructors.  Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
2009-09-16 15:23:04 +00:00
minskim 5731aa1460 Delete trailing whitespace. 2009-07-17 18:09:25 +00:00
minskim ca28940e0e Add the IP_RECVTTL option support.
If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram.  The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.
2009-07-16 04:09:51 +00:00
tsutsui d779b85d3e Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
2009-04-18 14:58:02 +00:00
elad 2d1c968399 Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

	http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.
2009-04-15 20:44:24 +00:00