Commit Graph

1986 Commits

Author SHA1 Message Date
yamt a2939d499b make ipfr_lock IPL_VM as ip_reass_drain is called in interrupts via
the drain hook for mbuf pools.
2010-10-07 03:15:49 +00:00
enami daf969e420 Don't free memory still in use. Fixes nfs root problem reported
by Christoph Egger on source-changes-d.
2010-10-06 07:39:37 +00:00
rmind ff74682fb4 Re-structure IPv4 reassembly code to make it more MP-friendly and simplify
some code fragments while here.  Also, use pool_cache(9) and mutex(9).

IPv4 reassembly mechanism is MP-safe now.
2010-10-03 19:44:47 +00:00
bad 6b557ece78 Defopt the rest of the Ipfilter options and tunables.
Per discussion with darrenr@ a year ago.
2010-10-02 20:07:39 +00:00
rmind 574e8cee41 Use own IPv4 reassembly queue entry structure and leave struct ipqent only
for TCP.  Now both struct ipfr_qent, struct ipfr_queue and hashed fragment
queue are abstracted and no longer public.
2010-08-25 00:05:14 +00:00
pooka abddd18860 ahem, min -> max in previous 2010-08-11 11:06:42 +00:00
pooka 57ec5229b9 Use kpause() instead of DELAY() and sleep a minimum of 1 tick.
This is possible now since softints have a thread context.  It's
also not a very frequent code path.  Addresses ABI issue with delay
(kern/40505).

I'm not entire sure what this delay is meant to accomplish, though.
2010-08-11 09:36:44 +00:00
pooka 4fa346679c Include opt_inet since this checks INET/INET6 2010-08-10 21:46:12 +00:00
rmind 7b5ee09e0b Revert previous change of making struct ipqent invisible to userland. 2010-07-19 19:16:45 +00:00
rmind 2f196e2fd9 Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@
2010-07-19 14:09:44 +00:00
rmind bcc65ff09f Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete).  No functional changes to the actual mechanism.

OK matt@
2010-07-13 22:16:10 +00:00
rmind 419f3b11a1 ip_input: move lookup for fragment queue a little bit further. OK matt@. 2010-07-09 18:42:46 +00:00
kefren 8f87b4e7b8 manually adjust m_data and m_len so it can later be prepended with a
struct ip in case that a cluster is used. icmp len panic is not valid for
cluster case.

Fixes PR/43548
2010-07-02 07:02:00 +00:00
kefren 826653c190 Add MPLS support, proposed on tech-net@ a couple of days ago
Welcome to 5.99.33
2010-06-26 14:24:27 +00:00
bouyer adad9c5471 Make sure SYN_CACHE_TIMER_ARM() has been run before calling syn_cache_put()
as it will reschedule the timer.  Fixes PR kern/43318.
2010-05-26 17:38:29 +00:00
oki cd671d9067 Backout rev.1.137. It causes troubles, see PR kern/43294.
We needs more discussion/a more general solution.
2010-05-15 05:02:46 +00:00
bouyer c638cbeac1 syn_cache_put(): defer all pool_put() to the callout. Reschedule
the callout if needed so frees are not delayed too much.
syn_cache_timer(): we can't call syn_cache_put() here any more,
so move code deleted from syn_cache_put() here.

Avoid KASSERT() in kern_timeout.c because pool_put() is called from
ipintr context, as reported in
http://mail-index.netbsd.org/tech-kern/2010/03/19/msg007762.html
Thanks to Andrew Doran and Mindaugas Rasiukevicius for help and review.
2010-04-21 20:40:16 +00:00
darrenr a69ca40523 fix spelling mistake: netient -> netinet 2010-04-17 22:00:33 +00:00
darrenr 539655a401 add IPFILTER_COMPAT to kernel config options recognised for IPFilter 2010-04-17 21:44:05 +00:00
rmind b278cb5138 tcp_input: set ECE flag even if CWR flag is active.
Submitted by Richard Scheffenegger in PR/43150.
2010-04-16 03:13:03 +00:00
joerg 58e867556f Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
2010-04-05 07:19:28 +00:00
tls 4e0229021b Oops. Fix LOCKDEBUG panic -- and spurious calls to tcp_output()! -- in
previous.  Be careful with that {}, Eugene.
2010-04-01 14:31:51 +00:00
tls 04c7bc4215 As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test.  Can't
possibly be worse.
2010-04-01 01:23:32 +00:00
tls 994b02bdbe After discussion with ad@: it appears that KERNEL_LOCK also protects
the driver output path (that is, ifp->if_output()).  In the case of
entry through the socket code, we are fine, because pru_usrreq takes
KERNEL_LOCK.  However, there are a few other ways to cause output
which require protection:

	1) direct calls to tcp_output() in tcp_input()
	2) fast-forwarding code (ip_flow) -- protected elsewise
	   against itself by the softnet lock.
	3) *Possibly* the ARP code.  I have currently persuaded
	   myself that it is safe because of how it's called.
	4) Possibly the ICMP code.

This change addresses #1 and #2.
2010-04-01 00:24:41 +00:00
tls 4e65861033 Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only).  Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs.  Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem.  Temporary fix suggested by matt@.
2010-03-31 07:31:15 +00:00
oki 4c7318c4d2 Fixed a number of race conditions in the case of receiving ipv4 packet.
found by iij seil team.
2010-03-12 13:33:19 +00:00
pooka 54b3dc4108 tcp sockbuf autoscaling was initially added turned off because it
was experimental.  People (including myself) have been running with
it turned on for eons now, so flip the default to enabled.
2010-01-26 18:09:07 +00:00
pooka 6e52e33956 ipfilter depends on bpf_filter, not bpfilter (since the year 2000). 2010-01-24 14:25:57 +00:00
pooka b014350f7f Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client.  This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached.  However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff.  ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
2010-01-19 22:08:16 +00:00
elad 1d8d325447 Get the uid from the socket's credentials. 2009-12-30 06:59:32 +00:00
dyoung 802b1236af Remove superfluous cast of a pointer to void *.
Compare a pointer with NULL, not 0.

No functional change intended.
2009-12-09 00:45:25 +00:00
christos adf7e47145 PR/42243: Yasuoka Masahiko: Add "net.inet.icmp.bmcastecho" sysctl support,
to disable icmp replies to the broadcast address.
2009-12-07 18:47:24 +00:00
dyoung 04489f616a Initialize/compare pointers with NULL instead of 0. 2009-12-07 18:38:55 +00:00
christos dd8534acfe ar_tha() can return NULL; treat this as an error. 2009-11-20 02:14:56 +00:00
christos 6cd198d078 Handle RFC 5227 ARP probes properly, don't drop 0.0.0.0 source packets
silently. (Patrik Lahti <plahti at qnx dot com>)
2009-11-03 00:57:42 +00:00
christos dbfa0db489 add enough info to let rtadvd compile with route-info. 2009-10-31 22:32:17 +00:00
rmind 993cb03302 Drop 3rd and 4th clauses from David Young's license.
Reviewed and approved by dyoung@ (copyright holder).
2009-10-19 23:19:37 +00:00
pooka 11281f01a0 Replace a large number of link set based sysctl node creations with
calls from subsystem constructors.  Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
2009-09-16 15:23:04 +00:00
degroote 2d48ac808c Import pfsync support from OpenBSD 4.2
Pfsync interface exposes change in the pf(4) over a pseudo-interface, and can
be used to synchronise different pf.

This work was part of my 2009 GSoC

No objection on tech-net@
2009-09-14 10:36:48 +00:00
pooka fbd53556dc Wipe out the last vestiges of POOL_INIT with one swift stroke. In
most cases, use a proper constructor.  For proplib, give a local
equivalent of POOL_INIT for the kernel object implementation.  This
way the code structure can be preserved, and a local link set is
not hazardous anyway (unless proplib is split to several modules,
but that'll be the day).

tested by booting a kernel in qemu and compile-testing i386/ALL
2009-09-13 18:45:10 +00:00
dyoung c5d5f7697a Make ifconfig(8) set and display preference numbers for IPv6
addresses.  Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.

In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr.  Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.

Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.

In support of the changes above,

1 Add a method to struct domain for "externalizing" a sockaddr, and
  provide an implementation for IPv6.  Expect more work in this area: it
  may be more proper to say that the IPv6 implementation "internalizes"
  a sockaddr.  Add sockaddr_externalize().

2 Add a subroutine, sofamily(), that returns a struct socket's address
  family or AF_UNSPEC.

3 Make a lot of IPv4-specific code generic, and move it from
  sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
  ifconfig(8).
2009-09-11 22:06:29 +00:00
darran ddd44491c6 Make tcp msl (max segment life) tunable via sysctl net.inet.tcp.msl.
Okayed by tls@.
2009-09-09 22:41:28 +00:00
tls fd671f648a Add a direction argument to socket upcalls, so they can tell why they've
been called when, for example, they're waiting for space to write.  From
Ritesh Agrawal at Coyote Point.
2009-09-02 14:56:57 +00:00
dyoung ce7dbb45a0 Stop the admin from creating nodes under net.inet.ip.interfaces or
net.inet.ip.interfaces.<ifname>.
2009-08-30 02:03:58 +00:00
dyoung 6c7a849f95 Don't require the gateway address to have room for both an interface
name and address.  Room for an address will do.  This should fix
a regression in 'arp -s ...' on interfaces such as xennet0 with
unusually long names.

I will request a pull-up to netbsd-5.
2009-08-12 22:16:15 +00:00
minskim 39e3066b15 Enable IP_MINTTL option for SOCK_DGRAM sockets. 2009-07-19 23:17:33 +00:00
minskim 2708c3c1b9 Check the minimum ttl only when pcb is available. 2009-07-18 23:09:53 +00:00
minskim d0a9c36e4a Add the IP_MINTTL socket option.
The IP_MINTTL option may be used on SOCK_STREAM sockets to discard
packets with a TTL lower than the option value.  This can be used to
implement the Generalized TTL Security Mechanism (GTSM) according to
RFC 3682.

OK'ed by christos@.
2009-07-17 22:02:54 +00:00
minskim 5731aa1460 Delete trailing whitespace. 2009-07-17 18:09:25 +00:00
minskim ca28940e0e Add the IP_RECVTTL option support.
If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram.  The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.
2009-07-16 04:09:51 +00:00