Commit Graph

228 Commits

Author SHA1 Message Date
riastradh d1579b2d70 Rename min/max -> uimin/uimax for better honesty.
These functions are defined on unsigned int.  The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER!  Some subsystems have

	#define min(a, b)	((a) < (b) ? (a) : (b))
	#define max(a, b)	((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX.  Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate.  But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all.  (Who knows, maybe in some cases integer
truncation is actually intended!)
2018-09-03 16:29:22 +00:00
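
The truncation hazard behind the min/max -> uimin/uimax rename can be sketched in a few lines of C. This is an illustration only, not the libkern source: the body of uimin below is an assumption based on the statement that the function is defined on unsigned int, u64min and min_via_uimin are hypothetical helpers added for comparison, and the expected values assume a 32-bit unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of a helper defined on unsigned int, as the commit describes. */
static unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Hypothetical width-preserving variant, shown only for comparison. */
static uint64_t
u64min(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * What a caller gets when 64-bit values are passed to uimin.  The casts
 * make explicit the conversion the compiler would otherwise do silently.
 */
static uint64_t
min_via_uimin(uint64_t a, uint64_t b)
{
	return uimin((unsigned int)a, (unsigned int)b);
}

/* Assumes unsigned int is 32 bits wide. */
static void
uimin_truncation_example(void)
{
	const uint64_t len = UINT64_C(0x100000001);	/* 2^32 + 1 */
	const uint64_t limit = 4096;

	/* The high bit of len is lost, so the "minimum" is 1, not 4096. */
	assert(min_via_uimin(len, limit) == 1);
	assert(u64min(len, limit) == 4096);
}
```

With the old generic name, a call like the one in min_via_uimin looked harmless; the uimin name at least makes the operand width visible at the call site.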
msaitoh c33f30648e Initialize some members in a mbuf which is on stack. 2018-07-25 07:55:44 +00:00
msaitoh 3cd62456f9 Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

 This change also fixes a bug where the direction was misinterpreted in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
2018-06-26 06:47:57 +00:00
msaitoh c6a4b11da8 Removal of bpf_tap(). 2018-06-25 03:22:14 +00:00
ozaki-r 4f93a8de81 Protect packet input routines with KERNEL_LOCK and splsoftnet
if_input, i.e., ether_input and friends, now runs in softint without any
protection.  That is fine for ether_input itself because it is already MP-safe;
however, subsequent routines called from it, such as carp_input and agr_input,
are not MP-safe.  Protect if_input with KERNEL_LOCK.

if_input can also be called from a normal LWP context.  In that case we need to
prevent interrupts (softint) from running, using splsoftnet, to protect
non-MP-safe code (e.g., carp_input and agr_input).

Pointed out by mlelstv@
2018-05-14 02:55:03 +00:00
ozaki-r 6a3a8456d3 Abandon unnecessary softint
The softint was introduced to defer fownsignal that was called in bpf_wakeup to
softint at v1.139, but now bpf_wakeup always runs in softint so we don't need
the softint anymore.
2018-01-25 02:45:02 +00:00
ozaki-r 580fb70bf3 Make softint and callout MP-safe 2017-12-15 07:29:11 +00:00
ozaki-r e0d574e4f8 Fix panic in callout_halt (fix typo)
Reported by wiz@
2017-12-12 06:26:57 +00:00
christos ea05286d92 add fo_name so we can identify the fileops in a simple way. 2017-11-30 20:25:54 +00:00
ozaki-r cead3b8854 Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces copy-and-paste code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify the remaining
KERNEL_LOCK and/or softnet_lock uses that are held even if NET_MPSAFE is defined.

No functional change
2017-11-17 07:37:12 +00:00
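
The idea of hiding the NET_MPSAFE switch behind macros can be sketched as follows. This is a minimal user-space illustration, not the committed code: the DEMO_* macro names are hypothetical, and KERNEL_LOCK and KERNEL_UNLOCK_ONE are replaced by counting stand-ins so the effect can be observed without a kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the real big lock: they only count the nesting depth. */
static int kernel_lock_depth;
#define KERNEL_LOCK(n, l)	do { (void)(l); kernel_lock_depth += (n); } while (0)
#define KERNEL_UNLOCK_ONE(l)	do { (void)(l); kernel_lock_depth--; } while (0)

/* Hypothetical wrappers: the #ifdef lives here once, not at every caller. */
#ifdef NET_MPSAFE
#define DEMO_KERNEL_LOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#define DEMO_KERNEL_UNLOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#define DEMO_EXPECTED_DEPTH			0
#else
#define DEMO_KERNEL_LOCK_UNLESS_NET_MPSAFE()	KERNEL_LOCK(1, NULL)
#define DEMO_KERNEL_UNLOCK_UNLESS_NET_MPSAFE()	KERNEL_UNLOCK_ONE(NULL)
#define DEMO_EXPECTED_DEPTH			1
#endif

/* A caller no longer needs its own "#ifndef NET_MPSAFE" block. */
static int
demo_input(void)
{
	int depth_inside;

	DEMO_KERNEL_LOCK_UNLESS_NET_MPSAFE();
	depth_inside = kernel_lock_depth;	/* protected section */
	DEMO_KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
	return depth_inside;
}

static void
net_mpsafe_macro_example(void)
{
	assert(demo_input() == DEMO_EXPECTED_DEPTH);
	assert(kernel_lock_depth == 0);	/* lock is always released */
}
```

Because every remaining lock site goes through one macro name, a search for that name lists exactly the places that still take the lock.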
maya 18b796d442 Use C99 initializer for filterops
Mostly done with spatch with touchups for indentation

@@
expression a;
identifier b,c,d;
identifier p;
@@
const struct filterops p =
- 	{ a, b, c, d
+ 	{
+ 	.f_isfd = a,
+ 	.f_attach = b,
+ 	.f_detach = c,
+ 	.f_event = d,
};
2017-10-25 08:12:37 +00:00
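
The effect of the semantic patch above can be illustrated with a small stand-alone example. The field names come from the patch itself; the field types, the demo_* functions and the two demo variables are assumptions made for this sketch and are not taken from the kernel headers.

```c
#include <assert.h>

struct knote;	/* opaque in this sketch */

/* Field names as in the patch; the types are assumed for illustration. */
struct filterops {
	int	f_isfd;
	int	(*f_attach)(struct knote *);
	void	(*f_detach)(struct knote *);
	int	(*f_event)(struct knote *, long);
};

static int
demo_attach(struct knote *kn)
{
	(void)kn;
	return 0;
}

static void
demo_detach(struct knote *kn)
{
	(void)kn;
}

static int
demo_event(struct knote *kn, long hint)
{
	(void)kn;
	return hint != 0;
}

/* Before: positional initializer, correct only if the field order is stable. */
static const struct filterops demo_positional =
	{ 1, demo_attach, demo_detach, demo_event };

/* After: C99 designated initializer, independent of field order. */
static const struct filterops demo_designated = {
	.f_isfd = 1,
	.f_attach = demo_attach,
	.f_detach = demo_detach,
	.f_event = demo_event,
};

static void
filterops_initializer_example(void)
{
	/* Both forms produce the same contents for this struct layout. */
	assert(demo_positional.f_isfd == demo_designated.f_isfd);
	assert(demo_positional.f_attach == demo_designated.f_attach);
	assert(demo_positional.f_detach == demo_designated.f_detach);
	assert(demo_positional.f_event == demo_designated.f_event);
}
```

The designated form keeps working if a field is later added or reordered, which is presumably the motivation for the conversion.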
ozaki-r 7107584815 Turn on D_MPSAFE flag of bpf_cdevsw that is already MP-safe
Pointed out by k-goda@IIJ
2017-10-19 01:57:15 +00:00
ozaki-r b7100b390b Reinit a pslist entry before inserting it to a pslist again
Fix PR kern/51984
Tested by nonaka@
2017-02-20 03:08:38 +00:00
christos 290ad1f61f typo 2017-02-19 13:58:42 +00:00
ozaki-r e7ec750b6d Update comments to reflect bpf MP-ification 2017-02-13 03:44:45 +00:00
ozaki-r f66c9ca3fd Make bpf MP-safe
With this change, bpf_mtap can run without any locks as long as its bpf filter
doesn't match a target packet. Pushing data to a bpf buffer still needs
a lock. Removing that lock requires big changes and is future work.

Another known issue is that we need to retain some obsolete variables to
avoid breaking kvm(3) users such as netstat and fstat. One problem for
MP-ification is that in order to keep the statistic counters of bpf_d we need
to use atomic operations for them. Once we retire the kvm(3) users, we
should make the counters per-CPU and remove the atomic operations.
2017-02-09 09:30:26 +00:00
ozaki-r c66c595b80 Reduce return points 2017-02-01 08:18:33 +00:00
ozaki-r 23e72bfbc4 Kill tsleep/wakeup and use cv 2017-02-01 08:16:42 +00:00
ozaki-r bbe8ead203 Make bpf_gstats percpu 2017-02-01 08:15:15 +00:00
ozaki-r 2fec859db2 Use pslist(9) instead of queue(9) for psz/psref
As usual some member variables of struct bpf_d and bpf_if remain to avoid
breaking kvm(3) users (netstat and fstat).
2017-02-01 08:13:45 +00:00
ozaki-r b76d85bbe1 Use kmem(9) instead of malloc/free 2017-02-01 08:07:27 +00:00
ozaki-r ddd60175a6 Make global variables static 2017-02-01 08:06:01 +00:00
ozaki-r 87e988a7d8 Use bpf_ops for bpf_mtap_softint
By doing so we don't need to care whether a kernel enables bpfilter or not.
2017-01-25 01:04:23 +00:00
ozaki-r 9674e2224b Defer bpf_mtap in Rx interrupt context to softint
bpf_mtap of some drivers is still called in hardware interrupt context.
We want to run it in softint, as bpf_mtap of most drivers already does
(see if_percpuq_softint and if_input).

To this end, the bpf_mtap_softint mechanism is implemented; it defers
bpf_mtap processing to a dedicated softint for a target driver.
By using the mechanism, we can move bpf_mtap processing to softint
without changing target drivers much, although it adds some overhead
on CPU and memory. Once target drivers are changed to be softint-based,
we should return to normal bpf_mtap.

Proposed on tech-kern and tech-net
2017-01-24 09:05:27 +00:00
ozaki-r e9b008839d Make bpf_setf static 2017-01-23 10:17:36 +00:00
pgoyette 7c20c5d3bb Fix regression introduced in tests/net/bpf and tests/net/bpfilter
The rump code needs to call devsw_attach() in order to assign a dev_major
for bpf;  it then uses this to create rump's /dev/bpf node.  Unfortunately,
this leaves the devsw attached, so when the bpf module tries to initialize
itself, it gets an EEXIST error and fails.

So, once rump has figured what the dev_major should be, call devsw_detach()
to remove the devsw.  Then, when the module initialization code calls
devsw_attach() it will succeed.
2016-07-19 02:47:45 +00:00
pgoyette b380080ebc Now that we're only calling devsw_attach() in the modular driver, it
is not ok for the driver/module to already exist.  So don't ignore
EEXIST.
2016-07-17 02:48:07 +00:00
pgoyette 3c6a976d2d Don't initialize variables that no longer exist in built-in module. 2016-07-17 01:16:30 +00:00
pgoyette 5233aa279b Don't try to call devsw_attach() for built-in driver code. 2016-07-17 01:03:46 +00:00
knakahara 95fc145695 apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling). 2016-06-20 06:46:37 +00:00
ozaki-r fe6d427551 Avoid storing a pointer of an interface in a mbuf
Having a pointer to an interface in a mbuf isn't safe if we remove the big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead
we have to get the object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref, which use psref(9)
for sleep-able critical sections, and m_{get,put}_rcvif, which use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and exists for the
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
2016-06-10 13:31:43 +00:00
ozaki-r d938d837b3 Introduce m_set_rcvif and m_reset_rcvif
The API is used to set (or reset) the received interface of a mbuf.
These functions are the counterparts of m_get_rcvif, which will come in
another commit; they hide the internals of rcvif operations and reduce the
diff of the upcoming change.

No functional change.
2016-06-10 13:27:10 +00:00
pgoyette 532241d269 Create separate modules for i2c_bitbang and bpf_filter so these files
can be included in kernels which need them without also duplicating
them in other modules.  Removes the duplicate symbols I found which
prevented loading i2c and bpf modules after having fixed PR 45125.
2016-06-07 01:06:27 +00:00
ozaki-r 9c4cd06355 Introduce softint-based if_input
This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input, which is called from
each device driver (and ieee80211_input), to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
2016-02-09 08:32:07 +00:00
christos 654889b793 Do less work under the kernel lock, otherwise dhcpcd aborting causes us
to deadlock.
2016-02-01 16:32:28 +00:00
christos 43eac92e53 don't free mbuf twice.
XXX: pullup 7.
2015-12-16 23:14:42 +00:00
christos d522fec9f5 PR/49386: Ryota Ozaki: Add a mutex for bpf creation/removal to avoid races.
Add M_CANFAIL to malloc.
2015-10-14 19:40:09 +00:00
joerg adac2d746a Improve wording. 2015-05-30 19:14:46 +00:00
ozaki-r 9116f11456 Remove unnecessary variable bc 2014-12-29 13:38:13 +00:00
rmind b891d5cdc7 PR/49190: bpf_deliver: set scratch memory store in bpf_args_t. 2014-09-13 17:18:45 +00:00
matt 45b1ec740d Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.
2014-09-05 09:20:59 +00:00
ozaki-r 5b4238682e Use NULL instead of 0 for pointers 2014-08-07 03:40:21 +00:00
alnsn b67137b2bd Enable net.bpf.jit only if MODULAR and BPFJIT. Tweak a warning about postponed
jit activation.
2014-07-28 07:32:46 +00:00
dholland f9228f4225 Add d_discard to all struct cdevsw instances I could find.
All have been set to "nodiscard"; some should get a real implementation.
2014-07-25 08:10:31 +00:00
christos 0e34796007 initialize args the same way we do in filter. 2014-07-10 15:32:09 +00:00
alnsn 19fed70d36 Implement copfuncs and external memory in bpfjit. 2014-06-24 10:53:30 +00:00
dholland a68f9396b6 Change (mostly mechanically) every cdevsw/bdevsw I can find to use
designated initializers.

I have not built every extant kernel so I have probably broken at
least one build; however I've also found and fixed some wrong
cdevsw/bdevsw entries so even if so I think we come out ahead.
2014-03-16 05:20:22 +00:00
pooka 4f6fb3bf35 Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
2014-02-25 18:30:08 +00:00
christos c16aecd187 It is silly to kill the system when an interface failed to clear promiscuous
mode. Some return EINVAL when they are dying, but others like USB return EIO.
Downgrade to a DIAGNOSTIC printf. Same should be done for the malloc/NOWAIT,
but this is rarely hit.
2013-12-05 15:55:35 +00:00
rmind 5bd8916144 bpf_deliver: convert to bpf_filter_ext(). 2013-11-16 01:13:52 +00:00