Commit Graph

63 Commits

Author SHA1 Message Date
ad e1c190d346 sched_preempted(): always clear LP_TELEPORT. 2020-03-08 15:00:31 +00:00
ad 5f741a4fe2 sched_takecpu(): for vfork(), when looking at curcpu's runqueue consider
the maximum priority waiting to run rather than the count of LWPs.
2020-01-25 15:09:54 +00:00
ad e4b63a3e95 sched_bestcpu(): break out of the loop earlier. 2020-01-18 13:53:50 +00:00
ad 2861694849 sched_catchlwp(): fix an inverted test that could have caused performance
degradation.
2020-01-17 20:27:28 +00:00
ad 20469ddbac - Fix an inverted test that could have prevented LWPs running on helper
CPUs from teleporting somewhere better during preempt().

- Fix a comment.
2020-01-13 11:53:24 +00:00
ad e78f9b4fde A final set of scheduler tweaks:
- Try hard to keep vfork() parent and child on the same CPU until execve(),
  failing that, on the same core, but in all other cases scatter new LWPs
  among the different CPU packages, round robin, to try and get the best out
  of the available cache and bus bandwidth.

- Remove attempts at balancing.  Replace with a rate-limited skim of other
  CPUs' run queues in sched_idle(), starting in the current package and
  moving outwards.  Add a sysctl tunable to change the interval.

- Make the cacheht_time tunable take a milliseconds value.

- It's possible to configure things such that there's no CPU allowed to run
  an LWP.  Defeat this by always having a default:

Reported-by: syzbot+46968944dd9359ab93bc@syzkaller.appspotmail.com
Reported-by: syzbot+7f750a4cc230d1e831f9@syzkaller.appspotmail.com
Reported-by: syzbot+88d7675158f5cb4684db@syzkaller.appspotmail.com
Reported-by: syzbot+d409c2338150e9a8ae1e@syzkaller.appspotmail.com
Reported-by: syzbot+e152dc5bff188f67358a@syzkaller.appspotmail.com
2020-01-12 22:03:22 +00:00
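As a rough sketch of the rate-limiting idea above (illustrative names only;
mstohz() and the hz tick rate are the standard NetBSD facilities, while
skim_interval_ms and the caller-supplied tick values are hypothetical):

    #include <sys/param.h>      /* mstohz() */
    #include <sys/kernel.h>     /* hz */
    #include <sys/types.h>

    /* Sysctl-style tunable, expressed in milliseconds. */
    static int skim_interval_ms = 10;

    /*
     * Return true when enough ticks have passed since the last skim,
     * and record the new timestamp.  The interval is clamped so the
     * skim can never run more than once per clock tick.
     */
    static bool
    skim_due(int *last_skim_ticks, int now_ticks)
    {
            int interval = mstohz(skim_interval_ms);

            if (interval < 1)
                    interval = 1;
            if (now_ticks - *last_skim_ticks < interval)
                    return false;
            *last_skim_ticks = now_ticks;
            return true;
    }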
ad c5b060977a - Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
  threads to help out.  In particular, during preempt() if we're using SMT,
  try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems.  This
  mainly entails rearranging one of the CPU lists so it makes sense in all
  configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
  where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
  Extend the SMT awareness to try and handle that situation too (keep fast
  CPUs busy, use slow CPUs as helpers).
2020-01-09 16:35:03 +00:00
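A made-up ranking sketch of the preference order described above (all three
predicates are hypothetical stand-ins for the topology checks the scheduler
actually performs in sched_takecpu()/sched_bestcpu()):

    #include <stdbool.h>

    struct cpu;                                 /* opaque placeholder */
    bool cpu_idle_p(const struct cpu *);        /* hypothetical */
    bool cpu_smt_primary_p(const struct cpu *); /* hypothetical */
    bool cpu_slow_p(const struct cpu *);        /* hypothetical */

    /* Lower rank is better when comparing candidate CPUs for a new LWP. */
    static int
    cpu_rank(const struct cpu *c)
    {
            if (cpu_idle_p(c) && cpu_smt_primary_p(c) && !cpu_slow_p(c))
                    return 0;       /* idle, fast, first hyperthread */
            if (cpu_idle_p(c) && !cpu_slow_p(c))
                    return 1;       /* idle fast SMT "helper" thread */
            if (cpu_idle_p(c))
                    return 2;       /* idle but slow CPU */
            return 3;               /* busy CPU: last resort */
    }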
ad 2ddceed1d9 Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
  calling cpu_switchto().  It's not safe to let other actors mess with the
  LWP (in particular l->l_cpu) while it's still context switching.  This
  removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
  it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead.  Everything
  is in cache anyway so it wasn't buying much by trying to avoid saving old
  state.  This means cpu_switchto() will never be called with prevlwp ==
  NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.
2020-01-08 17:38:41 +00:00
ad dc6d146464 - Another tweak for SMT: if a new LWP is starting life, try to make it run
on a different CPU in the same CPU core as the parent, because both parent
  and child share lots of state.  (I want to come back later and do
  something different for _lwp_create() and maybe execve().)

- Remove the runqueue evcnts, which are racy and impose a penalty for very
  little payoff.

- Break out of the loop in sched_takecpu() as soon as we have a CPU that can
  run the LWP.  There's no need to look at all CPUs.

- Clear SPCF_IDLE in sched_enqueue() so we know sooner that the CPU is not idle.
2020-01-05 20:26:56 +00:00
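An early-exit scan of that kind might look roughly like the sketch below;
cpu_lookup() and ncpu are real kernel interfaces, while lwp_runnable_on()
is a hypothetical predicate standing in for the eligibility checks:

    #include <sys/cpu.h>
    #include <sys/lwp.h>

    bool lwp_runnable_on(struct lwp *, struct cpu_info *);  /* hypothetical */

    static struct cpu_info *
    pick_first_eligible(struct lwp *l)
    {
            struct cpu_info *ci;
            int i;

            for (i = 0; i < ncpu; i++) {
                    ci = cpu_lookup(i);
                    if (ci != NULL && lwp_runnable_on(l, ci))
                            return ci;      /* stop: no need to scan the rest */
            }
            return NULL;
    }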
ad 401aa4758b A couple of scheduler tweaks which benchmark well for me:
- Add some simple SMT awareness.  Try to keep as many different cores loaded
  up with jobs as possible before we start to make use of SMT.  Have SMT
  "secondaries" function more as helpers to their respective primaries.
  This isn't enforced; it's an effort at herding/encouraging things to go in
  the right direction (for one because we support processor sets and those
  can be configured any way that you like).  Seen at work with "top -1".

- Don't allow sched_balance() to run any faster than the clock interrupt,
  because it causes terrible cache contention.  Need to look into this in
  more detail because it's still not ideal.
2020-01-04 22:46:01 +00:00
ad dece39714a - Add some more failsafes to the CPU topology stuff, and build a 3rd
circular list of peer CPUs in other packages, so we might scroll through
  them in the scheduler when looking to distribute or steal jobs.

- Fold the run queue data structure into spc_schedstate.  Makes kern_runq.c
  a far more pleasant place to work.

- Remove the code in sched_nextlwp() that tries to steal jobs from other
  CPUs.  It's not needed, because we do the very same thing in the idle LWP
  anyway.  Outside the VM system this was one of the main causes of L3
  cache misses I saw during builds.  On my machine, this change yields a
  60%-70% drop in time on the "hackbench" benchmark (there's clearly a bit
  more going on here, but basically being less aggressive helps).
2019-12-03 22:28:41 +00:00
ad 57eb66c673 Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
  the IPI bitmask and ci_want_resched.
2019-12-01 15:34:44 +00:00
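A minimal sketch of the cache-line separation being described (struct and
field names are made up; __aligned() and COHERENCY_UNIT are the standard
NetBSD spellings for the alignment attribute and cache-line size):

    #include <sys/param.h>      /* COHERENCY_UNIT */
    #include <sys/types.h>

    struct percpu_hot {
            /* Written only by the owning CPU. */
            volatile int      ph_want_resched __aligned(COHERENCY_UNIT);

            /* Written by remote CPUs posting IPIs. */
            volatile uint32_t ph_ipi_pending __aligned(COHERENCY_UNIT);
    };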
ad 036b61e0aa PR port-sparc/54718 (sparc install hangs since recent scheduler changes)
- sched_tick: cpu_need_resched is no longer the correct thing to do here.
  All we need to do is OR the request into the local ci_want_resched.

- sched_resched_cpu: we need to set RESCHED_UPREEMPT even on softint LWPs,
  especially in the !__HAVE_FAST_SOFTINTS case, because the LWP with the
  LP_INTR flag could be running via softint_overlay() - i.e. it has been
  temporarily borrowed from a user process, and it needs to notice the
  resched after it has stopped running softints.
2019-12-01 13:20:42 +00:00
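The ci_want_resched part of this can be sketched as follows; atomic_or_uint()
is the real sys/atomic.h primitive, while the helper and its arguments are
illustrative:

    #include <sys/atomic.h>

    /*
     * Accumulate a reschedule request by OR-ing the request bits into a
     * per-CPU word.  OR is idempotent, so concurrent requesters cannot
     * lose each other's bits.
     */
    static inline void
    note_resched(volatile unsigned int *want, unsigned int flags)
    {
            atomic_or_uint(want, flags);
    }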
ad 79ec83719f Don't try to IPI other CPUs early on. Fixes a crash on sparc64. Thanks
to martin@ for diagnosing.
2019-11-27 20:31:13 +00:00
ad c9afc9987a Pull in sys/atomic.h. 2019-11-23 22:35:08 +00:00
ad 11ba4e1830 Minor scheduler cleanup:
- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
  sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.
2019-11-23 19:42:52 +00:00
chs fd34ea77eb remove checks for failure after memory allocation calls that cannot fail:
kmem_alloc() with KM_SLEEP
  kmem_zalloc() with KM_SLEEP
  percpu_alloc()
  pserialize_create()
  psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
2017-06-01 02:45:05 +00:00
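The pattern being cleaned up looks roughly like this (hypothetical struct;
kmem_zalloc()/KM_SLEEP are the real kmem(9) interface):

    #include <sys/kmem.h>

    struct foo {
            int f_refs;
    };

    struct foo *
    foo_create(void)
    {
            struct foo *f;

            /* KM_SLEEP waits for memory and cannot return NULL. */
            f = kmem_zalloc(sizeof(*f), KM_SLEEP);
            /* No "if (f == NULL)" check needed here. */
            return f;
    }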
mlelstv 46f58a90c6 When balancing threads over multiple CPUs, use fixpoint arithmetic
for averages. Otherwise the decisions can be heavily biased by rounding
errors.

Add sysctl kern.sched_average_weight to change the weight of
historical data; the default is 50%.
2016-12-22 14:11:58 +00:00
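A standalone sketch of such a fixed-point weighted average (the 8-bit
fractional scale and the names are illustrative; the in-tree code and the
kern.sched_average_weight sysctl may scale things differently):

    #include <stdio.h>

    #define FRACBITS  8
    #define FRACSCALE (1 << FRACBITS)

    /* weight = share of history kept, in percent (default 50%). */
    static unsigned int
    avg_update(unsigned int avg_fp, unsigned int sample, unsigned int weight)
    {
            unsigned int sample_fp = sample << FRACBITS;

            return (avg_fp * weight + sample_fp * (100 - weight)) / 100;
    }

    int
    main(void)
    {
            unsigned int avg = 0;
            unsigned int samples[] = { 3, 5, 4, 0, 7 };

            for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
                    avg = avg_update(avg, samples[i], 50);
                    printf("avg = %u.%02u\n", avg >> FRACBITS,
                        (avg & (FRACSCALE - 1)) * 100 / FRACSCALE);
            }
            return 0;
    }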
msaitoh 8bc54e5be6 KNF. Remove extra spaces. No functional change. 2016-07-07 06:55:38 +00:00
christos 2210ed24e7 provide curthread for dtrace 2015-10-07 00:32:34 +00:00
wiz 87561671c1 defintion -> definition 2014-08-03 19:14:24 +00:00
pooka 4f6fb3bf35 Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
2014-02-25 18:30:08 +00:00
rmind df64447ca6 Remove cpu_queue (and thus eliminate another use of CIRCLEQ) by replacing
its uses with the cpu_infos array.  Extra testing by christos@.
2013-11-24 21:58:38 +00:00
christos e5d5564a4b remove __unused now that it is used. 2013-10-19 19:22:16 +00:00
martin 8afc72d050 cpu_need_resched(ci, type) might not make use of the type argument - mark
the variable declaration accordingly.
2013-10-19 18:42:05 +00:00
yamt 37fc08318c revert rev.1.37 for now.
PR/47634 from Ryo ONODERA.
While I have no idea how this change can break bge,
I don't have the hardware and/or time to investigate right now.
2013-03-12 23:16:31 +00:00
yamt 69f842b1d9 - use scaled calculations for avgcount
- sched_balance: account lwp which is currently running
- sched_balance: skip cpus w/o migratable lwps
2013-03-06 11:25:01 +00:00
christos a67c3c8971 printflike maintenance. 2013-02-09 00:31:21 +00:00
matt b0d1f89948 Change KASSERT to KASSERTMSG 2012-08-30 02:25:35 +00:00
para 05f35f5342 Change the sched_upreempt_pri default to 0, as discussed on tech-kern@.
Should improve interactive performance on SMP machines,
as user preemption now happens immediately in the cross-CPU wakeup case.
2012-02-23 12:24:05 +00:00
yamt 97e20eb0a1 comments 2011-12-02 12:31:03 +00:00
rmind 501dd321fb Remove the LW_AFFINITY flag and fix some bugs in affinity mask handling. 2011-08-07 21:13:05 +00:00
rmind 52b220e91d Add kcpuset(9) - a reworked dynamic CPU set implementation for the kernel.
Suitable for use during early boot.  MD and other implementations
should be replaced with this interface.

Discussed on: tech-kern@
2011-08-07 13:33:01 +00:00
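A minimal usage sketch of the kcpuset(9) interface introduced here (based on
a reading of the manual page; treat the exact prototypes as approximate):

    #include <sys/types.h>
    #include <sys/kcpuset.h>

    static bool
    cpu_in_set_example(cpuid_t target)
    {
            kcpuset_t *kcp;
            bool member;

            kcpuset_create(&kcp, true);     /* true: start with an empty set */
            kcpuset_set(kcp, target);       /* mark one CPU as a member */
            member = kcpuset_isset(kcp, target);
            kcpuset_destroy(kcp);
            return member;
    }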
yamt b1521a3612 remove redundant checks of PK_MARKER. 2010-03-03 00:47:30 +00:00
mrg efc854cf68 introduce a new function that returns a unique string for each cpu:
char *cpu_name(struct cpu_info *);

and use it when setting up the runq event counters, avoiding an 8 byte
kmem(4) allocation for each cpu.  there are more places the cpuname is
used that can be converted to using this new interface, but that can
and will be done as future work.

as discussed with rmind.
2010-01-13 01:57:17 +00:00
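The usage pattern this enables looks roughly like the sketch below;
evcnt_attach_dynamic() and EVCNT_TYPE_MISC are the standard evcnt(9)
interface, and "pulls" is just an example counter name:

    #include <sys/evcnt.h>
    #include <sys/cpu.h>

    static void
    attach_pull_counter(struct cpu_info *ci, struct evcnt *ev)
    {
            /*
             * cpu_name() returns a string that lives as long as the CPU,
             * so it can be used directly as the event-counter group name
             * with no per-CPU kmem(4) copy.
             */
            evcnt_attach_dynamic(ev, EVCNT_TYPE_MISC, NULL,
                cpu_name(ci), "pulls");
    }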
rmind 65265dedb7 sched_catchlwp: fix the case where another CPU might see curlwp->l_cpu != curcpu()
while the LWP is finishing a context switch.  Should fix PR/42539, tested by martin@.
2009-12-30 23:49:59 +00:00
rmind 40cf6f3659 Remove uarea swap-out functionality:
- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code.  Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
2009-10-21 21:11:57 +00:00
ad 822f68cc07 If DEBUG is enabled, drop kpreempt_pri to zero. It means that every
wakeup will cause a kernel preemption, simulating massive concurrency.

Proposed on tech-kern@.
2009-03-02 21:17:29 +00:00
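The effect can be sketched as below (kpreempt_pri is the variable named in
the commit; the non-DEBUG value and the helper are purely illustrative):

    #include <stdbool.h>

    #ifdef DEBUG
    static const int kpreempt_pri = 0;      /* preempt on every wakeup */
    #else
    static const int kpreempt_pri = 96;     /* example threshold only */
    #endif

    static inline bool
    wakeup_triggers_preemption(int wakeup_pri)
    {
            return wakeup_pri >= kpreempt_pri;
    }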
rmind 3de401ae19 Make sched_getrq() inline (gcc does not optimize it), avoiding a call. 2009-02-17 22:00:14 +00:00
rmind d1efa8f729 - Avoid calling sched_catchlwp() if CPUs have different processor-sets.
- sched_takecpu: check for psid earlier (be more strict).

PR/40419.
2009-01-18 05:07:51 +00:00
ad 7ad98abc71 - Wrap sys/cpu.h contents in _LOCORE.
- Add a RESCHED_LAZY flag and use it instead of zero.
2008-12-02 17:57:32 +00:00
rmind 337b081fed - Replace lwp_t::l_sched_info with union: pointer and timeslice.
- Change the minimum time-quantum to ~20 ms.
- Thus remove unneeded pool in M2, and unused sched_lwp_exit().
- Do not increase l_slptime twice for SCHED_4BSD (regression fix).
2008-10-07 09:48:27 +00:00
rmind ae626d791a - Schedule bound threads even if the CPU is offline. Might be revisited later,
once a decision is made about what to do with already-bound threads.
- Do not allow assigning an offline CPU to a processor set.

Quick fix for PR/39349.
2008-09-30 16:28:45 +00:00
rmind d489642431 sched_migratable: add a KASSERT, since this function can no longer be called
without the lock held.  A few cosmetic changes while here.
2008-07-14 01:18:10 +00:00
christos 1d875fc75f Adjust to separate kcpuset_t and cpuset_t. 2008-06-22 00:06:36 +00:00
rmind 481ae1556f - Add general cpuset macros.
- Use kcpuset name for kernel-only functions.
- Use cpuid_t to specify CPU ID.
- Unify all cpuset users.

API is expected to be stable now.
2008-06-16 01:41:20 +00:00
christos f30b5785d5 Don't expose struct cpuset, share the l_affinity flag and only allocate it
if we need to. This is not a compatible change, but the syscalls are new
enough and they don't need to be versioned. Approved by rmind.
2008-06-15 20:32:57 +00:00
ad 13cf4bcc55 PR kern/38663 Kernel preemption can't be enabled on x86 because of amd64
FPU handling

Remove the ifdef(i386); kernel preemption works on amd64 now.
2008-05-30 12:18:14 +00:00
rmind a68758f8bd sched_idle: initialise 'tci' to NULL, avoids compiler warning. 2008-05-30 08:31:42 +00:00
rmind 29170d3854 Simplification of running-LWP migration. Removes double-locking in
mi_switch(); migration for LSONPROC is now performed via the idle loop.
Handles/fixes the on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.
2008-05-29 22:33:27 +00:00