- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.
- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.
- Simplify locking in some pmap(9) modules by removing P->V locking.
- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).
- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.
- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.
Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
not in HEAD:
- use uvm_km_alloc() instead of kmem_alloc() to enforce alignement when
allocating p2m_frame pages (xentools can only deal with page-aligned
addresses)
- do not use paddr_t for p2m_frame_list_list with PAE, xentools expect
32 bits PFNs even with 64 bits PTE.
Required to make ``xm dump-core'' work as expected.
call it for different levels (L1 => L4).
Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.
Rationale:
- only one function to modify for logging
- pushes responsibility to caller for chosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.
Boot tested for dom0 i386/amd64, PAE included. No functional change intended.
handled by the hypervisor and all CPUs are running when the dom0 is started.
In addition, we don't have a reliable way to determine the boot CPU as
- we may not be running on the boot CPU
- we don't have access to the lapic id
So simplify by ignoring the information and assign phycpu_info_primary to the
first attached CPU.
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel system.
Use a more drastic hack to deal with this: when acpicpu(4) attachs, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.
This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.
PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).
In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.
When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.
Notes:
- reworked locore.S
- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.
- some changes in bioscall and kvm86_call, to reflect the above.
- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).
- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).
- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).
- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).
- 64 bits atomic functions for pmap
- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).
- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.
See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html
No objection raised on port-i386@ and port-xen@R for about a week.
XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).
XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.
hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.
x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.
Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html
No comments on this last version.
kernels, and use them in the bus_space(9) implementation instead of ugly
Xen #ifdef-age. In a non-Xen kernel, the _ma() functions either call or
alias the equivalent _pa() functions.
Reviewed on port-xen@netbsd.org and port-i386@netbsd.org. Passes
rmind@'s and bouyer@'s inspection. Tested on i386 and on Xen DOMU /
DOM0.
bus_space_tag. For now, bus_space_tag's only member is
bst_type, the type of space, which is either X86_BUS_SPACE_IO
or X86_BUS_SPACE_MEM. In the future, new bus_space_tag members
will refer to override-functions installed by a new function,
bus_space_tag_create(9).
Add pointers to constant struct bus_space_tag, x86_bus_space_io and
x86_bus_space_mem. Use them to replace most uses of X86_BUS_SPACE_IO
and X86_BUS_SPACE_MEM.
Add an x86-specific bus_space_is_equal(9) implementation that compares
the two tags' bst_type.
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).
- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).
- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.
- replace checks against CPUID_TSC with the cpu_hascounter() function.
- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().
- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.
- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().
- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).
This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.
XXX Should kernel rev be bumped?
XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.
it's perfectly valid for bus_dmamap_create() to do so (a contigous
transfers will then split in multiple segment).
Fix _xen_bus_dmamem_alloc_range() and _bus_dmamem_alloc_range() to
allow a boundary limit smaller than size:
- compute appropriate boundary for uvm_pglistalloc(), wich doesn't
accept boundary < size
- also take care of boundary when deciding to start a new segment.
While there, remove useless boundary argument to _xen_alloc_contig().
Fix the boundary-related issue of PR port-amd64/42980
by XENMEM_decrease_reservation, it is checked by the hypervisor. In certain
circumstances (stack leak), the field could have an improper value, leading
to a fail of the hypercall.
Set it to 0 ("no addressing restriction") to avoid that.
Patch tested by Sam Fourman and haad@.
This should fix the rare "failed allocating DMA memory" encountered
under NetBSD dom0. Will ask for a pull-up.
- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.
- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)
- remove casts that are no more needed now that Xen2 support has been dropped
Some fixes are from jmorse@ patches for PAE.
Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.
Reviewed by bouyer@.
See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().
Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.
Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.
- Replace most remaining uses of l_addr with uvm_lwp_getuarea() or lwp_getpcb().
- Amend assembly in ports where it accesses PCB via struct user.
- Rename L_ADDR to L_PCB in few places. Reduce sys/user.h inclusions.