Commit Graph

231 Commits

Author SHA1 Message Date
rmind
e75fa0930a A few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits().  Avoids an incorrect
  ncpu problem in early boot.  Also micro-optimises the xen_mcast_invlpg() and
  xen_mcast_tlbflush() routines.

Tested by chs@.
2012-06-06 22:22:41 +00:00
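
A minimal sketch of the kcpuset(9)-to-bitmask conversion this commit describes; the helper name and the assumption that kcpuset_copybits(9) takes a destination buffer and its size in bytes are illustrative, not the actual sys/arch/x86 code.

    #include <sys/types.h>
    #include <sys/kcpuset.h>

    /* Hypothetical helper: snapshot a kcpuset into the 32-bit VCPU mask
     * handed to the Xen multicast MMU operations. */
    static uint32_t
    example_kcpuset_to_vcpumask(kcpuset_t *kc)
    {
        uint32_t xcpumask = 0;

        /* Assumed signature: kcpuset_copybits(set, buffer, buffer size). */
        kcpuset_copybits(kc, &xcpumask, sizeof(xcpumask));
        return xcpumask;
    }
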
rmind
0c79472223 - Convert x86 MD code, mainly pmap(9), e.g. the TLB shootdown code, to use
  kcpuset(9) and thus replace hardcoded CPU bitmasks.  This removes the
  limitation on the maximum number of CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of the Xen part, and testing on a 64-core
AMD Opteron(tm) Processor 6282 SE (also as a Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
2012-04-20 22:23:24 +00:00
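
A hedged sketch of the general conversion described above: a fixed-width CPU bitmask becomes a dynamically sized kcpuset(9). The pm_kcpus argument and the function are illustrative stand-ins, not the actual pmap(9) members.

    #include <sys/kcpuset.h>
    #include <sys/cpu.h>

    /* Hypothetical example: mark the current CPU as using a pmap. */
    static void
    example_mark_pmap_active(kcpuset_t *pm_kcpus, struct cpu_info *ci)
    {
        /* Was roughly: atomic_or_32(&pm->pm_cpus, 1U << ci->ci_index);
         * which caps the system at 32 CPUs. */
        kcpuset_atomic_set(pm_kcpus, cpu_index(ci));
    }
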
jym
5b037abc92 Split the map/unmap code from the sync/flush code: move xpq_flush_queue()
calls after pmap_{,un}map_recursive_entries() so that pmap's handlers
handle the flush themselves.

Now pmap_{,un}map_recursive_entries() do what their names imply, nothing more.

Fix pmap_xen_suspend()'s comment: APDPs are now gone.

The pmap handlers are called deep inside kernel save/restore. We are already
at IPL_VM with kernel preemption disabled, so there is no need to wrap the
xpq_flush_queue() call with splvm()/splx().
2012-03-11 17:14:30 +00:00
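
A hedged sketch of the resulting call pattern (argument lists omitted and simplified): the recursive-entry helpers only queue Xen MMU updates, and the pmap suspend/resume handler flushes the queue itself.

    /* Illustrative handler; the real save/restore path already runs at
     * IPL_VM with kernel preemption disabled, so no splvm()/splx(). */
    static void
    example_pmap_resume_handler(void)
    {
        pmap_map_recursive_entries();   /* only queues the MMU updates */
        xpq_flush_queue();              /* the handler flushes explicitly */
    }
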
jym
9506b7bd53 Typo fix. 2012-03-11 16:16:44 +00:00
bouyer
dfde48d707 Add some more KASSERT()s 2012-03-02 16:38:14 +00:00
bouyer
6132369235 MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as the cpu mask,
not a uint32_t, so pass a pointer of the right type.
While there, clean up includes and delete the local, redundant define of PG_k.
2012-03-02 16:37:38 +00:00
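
A hedged sketch of the point of the fix: the vcpu mask for the MMUEXT_*_MULTI operations is laid out in unsigned longs, so the object whose address is passed must be a u_long, not a uint32_t. Include paths and the guest-handle plumbing are approximations, not the actual NetBSD code.

    #include <sys/systm.h>
    #include <xen/hypervisor.h>    /* approximate include path */

    static void
    example_tlbflush_multi(u_long xcpumask)
    {
        struct mmuext_op op = { .cmd = MMUEXT_TLB_FLUSH_MULTI };

        /* Pointer to a u_long, not a uint32_t; the exact guest-handle
         * setup depends on the interface version in use. */
        op.arg2.vcpumask = &xcpumask;
        if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF) < 0)
            panic("example_tlbflush_multi");
    }
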
bouyer
d50505c826 The code assumes that ci_index is also Xen's cpunum, and that the
cpunum is less than XEN_LEGACY_MAX_VCPUS. KASSERT both.
2012-02-25 18:57:50 +00:00
bouyer
d8304c6be9 Don't maintain ci_cpumask for physical CPUs, it's not used. 2012-02-24 11:43:06 +00:00
bouyer
b0388defd1 Get rid of the phycpus_attached bitmask; it's maintained but not used and
would limit the number of physical CPUs to 32 without good reason.
2012-02-24 11:31:23 +00:00
cherry
a9b15e0f47 (xen) - remove the (*xpq_cpu)() shim. We hasten the %fs/%gs setup process during boot. Although this is hacky, it lets us use the non-xen-specific pmap_pte_xxx() functions in pmap code (and others). 2012-02-24 08:06:07 +00:00
bouyer
fa9d8fcc5b On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this includes the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB, and thus a large
xpmap_phys_to_machine_mapping table), this can point into some of Xen's data
set up at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause one of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediately or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and allocate its VA from there.

May fix PR port-xen/38699
2012-02-23 18:59:21 +00:00
cherry
ad6a7fcd5d Cleanup.
- Remove cruft from the native x86 origin.
- Remove access to privileged MSRs.
- Clean up stale comments.
2012-02-23 07:30:30 +00:00
cherry
b82beb15c7 cpu_load_pmap() should not be used to load pmap_kernel(), since in the
x86 model, its mappings are shared across pmaps. KASSERT() for this
and remove unused codepaths.
2012-02-23 04:10:51 +00:00
bouyer
94e365bdd2 Use pmap_protect() instead of pmap_kenter_pa() to remap an existing
page R/O. This gets rid of the last "mapping already present" warnings.
2012-02-22 18:29:31 +00:00
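
A minimal sketch of the idiom this commit switches to: downgrade an already-entered kernel mapping to read-only with pmap_protect(9) rather than re-entering it with pmap_kenter_pa(9), which triggers the "mapping already present" warning.

    #include <sys/param.h>
    #include <uvm/uvm_extern.h>

    static void
    example_remap_readonly(vaddr_t va)
    {
        pmap_protect(pmap_kernel(), va, va + PAGE_SIZE, VM_PROT_READ);
        pmap_update(pmap_kernel());    /* flush deferred pmap work */
    }
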
bouyer
5dfe2dddcc Avoid early use of xen_kpm_sync(); locks are not available at this time.
Don't call cpu_init() twice.

This makes LOCKDEBUG kernels boot again.
2012-02-21 19:10:13 +00:00
bouyer
d3ccea851c Apply the patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported there but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1) The check (xpq_cpu != &x86_curcpu) is always false because we
  have different x86_curcpu symbols with different addresses in the kernel.
  Fortunately, all addresses disassemble to the same code.
  Because of this we always use the code intended for bootstrap, which doesn't
  use cross-calls or locks.

2) Once 1) above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
  which causes it to sleep, and pmap.c doesn't like that. It triggers this
  KASSERT() in pmap_unmap_ptes():
  KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3) pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
  needs to know on which CPU a pmap is loaded *now*:
  pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
  to a new pmap, leaving a window where a pmap is still in a CPU's
  ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
  by the hypervisor at any time, this window can be large enough to let
  another CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPUs' ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page; a TLB-flush IPI will
happen later. As a side effect, we don't need different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small header reorganisation.

To fix 3), introduce pm_xen_ptp_cpus, which is updated from
cpu_load_pmap() with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there, I removed the unused pmap_is_active() function
and added some more details to DIAGNOSTIC panics.
2012-02-17 18:40:18 +00:00
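
A hedged sketch of the fix for 2): every CPU's ci_kpm_pdir is updated from the local CPU under a per-CPU mutex, with no cross-call. Field and function names follow the commit message; the actual code differs in detail.

    #include <sys/cpu.h>
    #include <sys/mutex.h>

    static void
    example_xen_kpm_sync(pd_entry_t npde, int index)
    {
        CPU_INFO_ITERATOR cii;
        struct cpu_info *ci;

        for (CPU_INFO_FOREACH(cii, ci)) {
            mutex_enter(&ci->ci_kpm_mtx);
            /* write npde into ci->ci_kpm_pdir[index] here; the
             * TLB-flush IPI happens later */
            mutex_exit(&ci->ci_kpm_mtx);
        }
    }
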
jym
b3430e5626 PAT flags are not under the control of Xen domains currently, so there is no
point in enabling them.

This avoids:
- a warning logged by the hypervisor when a domain attempts to modify the PAT
MSR.
- an error during domain resume, where a PAT flag has been set on a page
while the hypervisor does not allow it.

ok releng@.
2012-02-13 23:54:58 +00:00
cherry
cbdea16d58 Update comments to remove references to alternate pte space. 2012-01-28 12:15:19 +00:00
cherry
6bed7d4e8c stop using alternate pde mapping in xen pmap 2012-01-28 07:19:17 +00:00
cherry
c9bb90d5c5 Do not clobber pmap_kernel()'s pdir unnecessarily while syncing per-cpu pdirs 2012-01-22 18:16:34 +00:00
bouyer
a3e0c29742 add a missing splvm()/splx() to protect the xpq queue. 2012-01-19 22:04:05 +00:00
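
A hedged sketch of the splvm(9) bracket around xpq queue use; the MMU-update arguments and include paths are illustrative.

    #include <sys/param.h>
    #include <sys/intr.h>    /* approximate include path for spl*() */

    static void
    example_update_pte(paddr_t ptep_ma, pt_entry_t npte)
    {
        int s = splvm();    /* keep interrupt code off the xpq queue */

        xpq_queue_pte_update(ptep_ma, npte);
        xpq_flush_queue();
        splx(s);
    }
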
cherry
2504a10c74 relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath 2012-01-12 19:49:37 +00:00
cherry
66e35f7978 Make cross-cpu pte access MP safe.
XXX: review the cases that use pmap_set_pte() vs. direct xpq_queue_pte_update()
2012-01-09 13:04:13 +00:00
cherry
0e1fd236aa Harden cross-cpu L3 sync - avoid optimisations that may race.
Update ci->ci_kpm_pdir from the user pmap, not the global pmap_kernel() entry, which may get clobbered by other CPUs.
XXX: Look into why we use pmap_kernel() userspace entries at all.
2012-01-09 12:58:49 +00:00
cherry
d515295709 revert previous commit. DIAGNOSTIC should only do strict checks, not muffle current ones 2012-01-09 04:39:14 +00:00
cherry
44fb314fb7 Address those pesky DIAGNOSTIC messages.
Take a performance hit at fork() for not DTRT.
Note: Only applicable for kernels built with "options DIAGNOSTIC"
cherry
84d4985e86 Use macro PDP_SIZE instead of numeric constant, for unshared PAE L3 entries.
Thanks jym@
2012-01-04 10:30:23 +00:00
cherry
b83ccb0e45 Never cut-and-paste code from email!
Use the right count (0 -> 2) of unshared userland L3 entries for per-cpu initialisation.
2011-12-30 19:18:35 +00:00
cherry
d827fd25ea Force PAE L3 page allocation for new vcpus to be < 4GB, so the addresses fit in 32 bits 2011-12-30 18:01:20 +00:00
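
A hedged sketch of constraining the per-vcpu PAE L3 page below 4GB with uvm_pglistalloc(9); the real cpu attach code differs in detail, and the helper name is illustrative.

    #include <sys/param.h>
    #include <uvm/uvm.h>

    /* Hypothetical helper: allocate one page whose physical address
     * fits in 32 bits, as required for the L3 page handed to Xen. */
    static struct vm_page *
    example_alloc_l3_below_4g(void)
    {
        struct pglist plist;

        if (uvm_pglistalloc(PAGE_SIZE, 0, 0x100000000ULL - 1,
            PAGE_SIZE, 0, &plist, 1, 1) != 0)
            return NULL;
        return TAILQ_FIRST(&plist);
    }
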
cherry
d12f2f3b2f per-cpu shadow directory pages should be updated locally via cross-calls. Do this. 2011-12-30 16:55:21 +00:00
cherry
7603d0cfb3 Remove spurious (debug) printf() 2011-12-30 12:16:19 +00:00
cherry
6cc7a9d8d3 Remove temporary variable definition that is unused in non DIAGNOSTIC builds. 2011-12-28 18:59:21 +00:00
cherry
c4baef9634 Optimise the branch-predict hint for the intended use case (cross-CPU event notification) 2011-12-27 07:47:00 +00:00
cherry
533ee572b1 Do not touch pending flags across vcpus 2011-12-27 07:45:41 +00:00
cherry
4dcdeeab68 Do not fiddle with the event masks of non-local vcpus when unmasking events across vcpus 2011-12-26 18:27:11 +00:00
cegger
908dafc263 switch from xen3-public to xen-public. 2011-12-07 15:47:41 +00:00
bouyer
ad7affb170 hypervisor_unmask_event(): don't check/update evtchn_pending_sel for the
current CPU, but for any CPU which may accept this event.
xen/xenevt.c: more use of atomic ops and locks where appropriate, and some
  other SMP fixes. Handle all events on the primary CPU (may be revisited
  later). Set/clear ci_evtmask[] for watched events.

This should fix the problems on dom0 kernels reported by jym@
2011-12-03 22:41:40 +00:00
jym
1eaed4e6e6 Move Xen-specific functions to the Xen pmap. Requested by cherry@.
Un-#ifdef XEN in xen_pmap.c; it is always defined there.
2011-11-23 00:56:56 +00:00
jym
6bfeabc65a Expose pmap_pdp_cache publicly to the x86/xen pmap. Provide suspend/resume
callbacks for the Xen pmap.

Make the internal callbacks of pmap_pdp_cache static.

XXX the implementation of pool_cache_invalidate(9) is still wrong, and
IMHO this needs fixing before -6. See
http://mail-index.netbsd.org/tech-kern/2011/11/18/msg011924.html
2011-11-20 19:41:27 +00:00
cherry
de4e5fae37 [merging from cherry-xenmp] bring in bouyer@'s changes via:
http://mail-index.netbsd.org/source-changes/2011/10/22/msg028271.html
From the log message:
Various interrupt fixes, mainly:
keep a per-cpu mask of enabled events, and use it to get pending events.
A cpu-specific event (all of them at this time) should never be masked
by another CPU, because that may prevent the target CPU from seeing it
(the clock events all fire at once, for example).
2011-11-19 17:13:39 +00:00
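
A hedged sketch of the per-CPU enabled-event mask idea from the merged change: only events routed to this CPU are considered pending here. The shared_info layout is the standard Xen one; locking and memory-barrier details of the real evtchn code are omitted.

    /* Hypothetical helper: pending events for one word 'i' of the
     * bitmaps, restricted to the events this CPU has enabled. */
    static u_long
    example_pending_events(volatile struct shared_info *s,
        struct cpu_info *ci, u_int i)
    {
        return s->evtchn_pending[i] & ~s->evtchn_mask[i] & ci->ci_evtmask[i];
    }
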
cherry
3520926365 Expose the PG_k #define pt/pd bit to both xen and "baremetal" x86. This is required since kernel pages are mapped with user permissions on XEN/amd64, as the VM kernel runs in ring 3. Since XEN/i386 (including PAE) runs in ring 1, supervisor mode is appropriate for those ports. We need to share this since the pmap implementation is still shared. Once the xen implementation is sufficiently independent of the x86 one, this can be made private to xen/include/xenpmap.h. 2011-11-08 17:16:52 +00:00
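
A hedged sketch of what the shared PG_k definition boils down to; the exact header and spelling may differ, but the intent follows the message above.

    #if defined(XEN) && defined(__x86_64__)
    #define PG_k    PG_u    /* kernel runs in ring 3: "kernel" PTEs need the user bit */
    #else
    #define PG_k    0       /* native ring 0, or Xen/i386 ring 1: supervisor mappings */
    #endif
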
cherry
926a93384f Add an IPI callback to force the hypervisor callback. This is useful to "re-route" interrupts to a given vcpu. 2011-11-07 15:51:31 +00:00
cherry
c9745c1f66 [merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware. 2011-11-06 15:18:18 +00:00
cherry
396b8b4abf [merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues. 2011-11-06 11:40:46 +00:00
jruoho
e23dd3f620 Remove code that is commented out and out-of-sync with x86. If Xen needs to
use cpu_resume(), cpu_suspend(), or cpu_shutdown() in the future, it is
better to expose these from x86 rather than duplicate code.
2011-10-20 13:21:11 +00:00
jym
2c4b0fd95e Move Xen-specific functions out of the native x86 pmap to xen_pmap.c.
Provide a wrapper to trigger pmap pool_cache(9) invalidations without
exposing the caches to the outside world.
2011-10-18 23:43:06 +00:00
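
A hedged sketch of the wrapper idea, as it would appear inside the x86 pmap, which owns the cache: callers trigger the invalidation without ever seeing the pool_cache(9) itself. The wrapper name is illustrative, not the actual symbol.

    #include <sys/pool.h>

    extern struct pool_cache pmap_pdp_cache;    /* private to the x86 pmap */

    /* Hypothetical wrapper exported by the x86 pmap. */
    void
    example_pmap_invalidate_pdp_cache(void)
    {
        pool_cache_invalidate(&pmap_pdp_cache);
    }
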
mrg
8f93e1bd21 Remove a check against uvmexp.ncolors that is already done inside
uvm_page_recolor() anyway.
2011-10-06 06:56:29 +00:00
jruoho
7feffa2641 Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume. 2011-09-28 15:38:21 +00:00
jym
325494fe33 Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do what KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html
2011-09-27 01:02:33 +00:00
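
A minimal sketch of the new variadic usage: the format string and its arguments are appended to the panic message when the assertion fires. The surrounding function is illustrative.

    #include <sys/systm.h>

    static void
    example_check(int count, int limit)
    {
        KASSERTMSG(count <= limit, "count %d exceeds limit %d", count, limit);
    }
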
jym
eba16022d3 Merge the jym-xensuspend branch into -current. ok bouyer@.
Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that the Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent the
kernel from blindly validating a user's command

The code is still in an experimental state, use at your own risk: restore
can corrupt backend communication rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work on amd64 currently, due to (yet again!)
page validation issues with the hypervisor. Will fix.

XXX secondary CPUs are not suspended; I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected; GENERICs and XEN* kernels should be fine.
./build.sh distribution is still running. In any case: sorry if it does
break for you; contact me directly with reports.
2011-09-20 00:12:23 +00:00
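
A hedged sketch of the driver-side split described above: each Xen frontend registers suspend and resume handlers with pmf(9). Handler names and bodies are illustrative, not the actual xennet(4) code.

    #include <sys/device.h>

    static bool
    example_xennet_suspend(device_t dev, const pmf_qual_t *qual)
    {
        /* quiesce the ring and detach from the backend */
        return true;
    }

    static bool
    example_xennet_resume(device_t dev, const pmf_qual_t *qual)
    {
        /* reconnect to the backend and repost pending requests */
        return true;
    }

    static void
    example_attach_hook(device_t self)
    {
        if (!pmf_device_register(self, example_xennet_suspend,
            example_xennet_resume))
            aprint_error_dev(self, "couldn't establish power handler\n");
    }
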