Commit Graph

1262 Commits

Author SHA1 Message Date
maxv
ac670b3099 Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated, even though it is unlikely that common i386 machines have so many
cpus; and the base VA of these entries is not cache-line-aligned, which
practically guarantees cache-line thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportional to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.
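
For illustration, a minimal sketch of the new layout, with hypothetical member
and constant names (the real ones in the tree may differ):

#define TMP_NVAS	4	/* e.g. zero/copy windows */

struct cpu_info {
	/* ... existing members ... */
	/* tmp VAs now live here, one set per attached CPU, and the base
	 * can be cache-line-aligned together with the rest of the struct */
	vaddr_t		ci_tmp_vas[TMP_NVAS] __aligned(64);
};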

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.
2017-02-11 14:11:24 +00:00
maxv
6c9d31ed8a Rename ldt->ldtstore and gdt->gdtstore on i386. It reduces the diff with
amd64, and makes it easier to track down these variables on nxr - 'ldt'
and 'gdt' being common keywords.
2017-02-05 10:42:21 +00:00
maxv
2b26583164 Increase KERNTEXTOFF from 1MB to 2MB on amd64. [1MB; 2MB[ is now handled
by UVM, so there is no physical loss.

On amd64 we always remap the kernel text with 2MB pages, and because of the
1MB start address we were forced to map [0MB; 2MB[ inside the first large
page. The problem is that the lower half is used by UVM to allocate physical
pages, and it is possible that some of these could be used by userland. We
could end up with userland-controllable data mapped into the kernel text on
a privileged page, which is far from a good idea from a security point of view.

I am not fixing i386 yet, because the large page size depends on PAE, and
we probably don't want the kernel text located at 4MB on low-memory systems.

(note: I didn't introduce this issue, it was already there when I came in)
2017-02-02 19:09:08 +00:00
maxv
a4a4753729 Use __read_mostly on these variables, to reduce the probability of false
sharing.
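
For illustration, __read_mostly places a declaration in a section of rarely
written data, so that it does not share cache lines with frequently written
variables; a hypothetical example:

static int	example_limit __read_mostly;	/* hypothetical variable */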
2017-02-02 08:57:04 +00:00
maxv
4d2995d98f Import xpmap_pg_nx, and put it in the per-cpu recursive slot on amd64. 2017-01-22 19:42:48 +00:00
maxv
14f8c1a9b1 Export xpmap_pg_nx, and put it in the page table pages. It does not change
anything, since Xen removes the X bit on these; but it is better for
consistency.
2017-01-22 19:24:51 +00:00
maxv
2b321a2d23 Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reverse order.
2017-01-06 08:32:26 +00:00
cherry
85a999caa3 In the MP case,
do not attempt a pmap_tlb_shootdown() after a pmap_kenter_ma() during
boot: pmap_tlb_shootdown() assumes it runs post-boot. Instead, invalidate the
entry on the local CPU only.
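
A rough sketch of this logic, assuming a check on the global 'cold' flag and
the x86 pmap shootdown signature (the actual diff may differ):

	if (cold) {
		/* Too early for IPIs: invalidate on the local CPU only. */
		pmap_update_pg(va);
	} else {
		pmap_tlb_shootdown(pmap_kernel(), va, opte, TLBSHOOT_KENTER);
	}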

XXX: to DTRT, this assumption probably needs re-examination.
XXX: The tradeoff is a (predicted) single word-size comparison
     penalty, so perhaps the decision needs performance stats.
2016-12-26 08:53:11 +00:00
skrll
fb73e3c1d6 Hold the interlock before cv_broadcast as per condvar(9) 2016-12-26 08:16:28 +00:00
cherry
b94dbe7b54 balloon(4) can now use uvm_hotplug(9)
Do this.
2016-12-23 17:01:10 +00:00
maxv
8a3e058bbd The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of using magic
offsets. The asm locore now increments the first pa to hide the dummy page
from machdep and pmap.
2016-12-16 19:52:22 +00:00
kamil
241cf91ddc Add support for a hardware-assisted watchpoints/breakpoints API in ptrace(2)
Add new ptrace(2) calls:
 - PT_COUNT_WATCHPOINTS - count the number of available hardware watchpoints
 - PT_READ_WATCHPOINT   - read struct ptrace_watchpoint from the kernel state
 - PT_WRITE_WATCHPOINT  - write new struct ptrace_watchpoint state; this
                          includes enabling and disabling watchpoints

The ptrace_watchpoint structure contains MI and MD parts:

typedef struct ptrace_watchpoint {
	int		pw_index;	/* HW Watchpoint ID (count from 0) */
	lwpid_t		pw_lwpid;	/* LWP described */
	struct mdpw	pw_md;		/* MD fields */
} ptrace_watchpoint_t;

For example, amd64 defines the MD part as follows:
struct mdpw {
	void	*md_address;
	int	 md_condition;
	int	 md_length;
};
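
A hypothetical userland usage sketch; the addr/data calling convention assumed
below (a pointer to the structure plus its size) is an assumption, not spelled
out here:

#include <sys/types.h>
#include <sys/ptrace.h>
#include <string.h>

static int
set_watchpoint(pid_t child, lwpid_t lwp, void *watched_addr)
{
	struct ptrace_watchpoint pw;

	memset(&pw, 0, sizeof(pw));
	pw.pw_index = 0;			/* first hardware watchpoint */
	pw.pw_lwpid = lwp;			/* LWP being described */
	pw.pw_md.md_address = watched_addr;	/* amd64 MD field from above */
	/* md_condition/md_length: MD-specific encodings, omitted here */

	return ptrace(PT_WRITE_WATCHPOINT, child, &pw, sizeof(pw));
}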

These calls are protected with the __HAVE_PTRACE_WATCHPOINTS guard.

Tested on amd64, initial support added for i386 and XEN.

Sponsored by <The NetBSD Foundation>
2016-12-15 12:04:17 +00:00
ozaki-r
dd8638eea5 Move bpf_mtap and if_ipackets++ on Rx of each driver to percpuq if_input
The benefits of the change are:
- We can reduce code
- We can provide the same behavior across drivers
  - Where/when if_ipackets is counted up
  - Note that some drivers still update packet statistics in their own
    way (periodic updates)
- bpf_mtap now runs in softint
  - This makes it easy to MP-ify bpf

Proposed on tech-kern and tech-net
2016-12-15 09:28:02 +00:00
kamil
192986ec23 Tear down KSTACK_CHECK_DR0, an i386-only feature to detect stack overflow
This feature was intended to detect stack overflow with the CPU Debug Registers
(x86). It was never ported to other ports, not even amd64, and would need to be
adapted for SMP...

Currently there might be better ways to detect stack overflows, like page
mapping protection. Since the number of Debug Registers is restricted
(4 on x86), tear it down completely.

This interface introduced helper functions for Debug Registers; they will
be replaced with the new <x86/dbregs.h> interface.

KSTACK_CHECK_DR0 was disabled by default and won't affect ordinary users.

Sponsored by <The NetBSD Foundation>
2016-12-13 10:54:27 +00:00
kamil
9b94e002e8 Switch x86 CPU Debug Register types from vaddr_t to register_t
This is a more opaque and appropriate type, as vaddr_t is meant to be used
for virtual address values. Not all DRs on x86 represent virtual
addresses (DR6 and DR7 definitely do not).

No functional change intended.

Change suggested by <christos>

Sponsored by <The NetBSD Foundation>
2016-12-13 10:21:33 +00:00
kamil
266caf90fb Add accessors for available x86 Debug Registers
There are 8 Debug Registers on i386 (available at least since the 80386) and 16
on AMD64. Currently DR4 and DR5 are reserved on both CPU families, and
DR8-DR15 are still reserved on AMD64. Therefore add accessors for DR0-DR3 and
DR6-DR7 for all ports.

Debug Registers x86:
 * DR0-DR3  Debug Address Registers
 * DR4-DR5  Reserved
 * DR6      Debug Status Register
 * DR7      Debug Control Register
 * DR8-DR15 Reserved

Access to the registers is available only from the kernel (ring 0), as
privileged access is required. For this reason the special Xen functions must
be used to get and set the registers in XEN3 kernels.

XEN-specific functions as defined in NetBSD:
 - HYPERVISOR_get_debugreg()
 - HYPERVISOR_set_debugreg()

This code extends the existing rdr6() and ldr6() accessors with additional ones:
 - rdr0() & ldr0()
 - rdr1() & ldr1()
 - rdr2() & ldr2()
 - rdr3() & ldr3()
 - rdr7() & ldr7()
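
As an illustration, a native (non-Xen) amd64 accessor pair in this style could
look like the sketch below; the Xen variants would instead go through
HYPERVISOR_get_debugreg()/HYPERVISOR_set_debugreg():

static inline vaddr_t
rdr0(void)
{
	vaddr_t val;

	__asm volatile("movq %%dr0,%0" : "=r" (val));
	return val;
}

static inline void
ldr0(vaddr_t val)
{
	__asm volatile("movq %0,%%dr0" : : "r" (val));
}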

Traditionally the accessors for DR6 took a vaddr_t argument; while that is an
appropriate type for DR0-DR3, DR6-DR7 should arguably use u_long, but it's
not a big deal. The resulting functionality is equivalent, so stick
to this convention and use the vaddr_t type for all DR accessors.

There was already a function defined for rdr6() in XEN, but it had a nit on
AMD64: it cast the result of HYPERVISOR_get_debugreg() to u_int (32-bit on
AMD64), truncating it. That still works for DR6, but for the sake of
simplicity always return the full 64-bit value.

The new accessors duplicate the functionality of the dr0() function available
on i386 within the KSTACK_CHECK_DR0 option. dr0() is a specialized layer with
logic to set appropriate types of interrupts; the new accessors are designed to
pass verbatim values from userland (with simple sanity checks in the
kernel). At the moment there are no plans to make it possible for
KSTACK_CHECK_DR0 to coexist with debug registers for user applications
(debuggers).

     options KSTACK_CHECK_DR0
     Detect kernel stack overflow using DR0 register.  This option uses DR0
     register exclusively so you can't use DR0 register for other purpose
     (e.g., hardware breakpoint) if you turn this on.

The KSTACK_CHECK_DR0 functionality was designed for i386 and never ported
to amd64.

Code tested on i386 and amd64 with kernels: GENERIC, XEN3_DOMU, XEN3_DOM0.

Sponsored by <The NetBSD Foundation>
2016-11-27 14:49:21 +00:00
maxv
415d831276 KNF a little 2016-11-25 12:20:03 +00:00
ozaki-r
61f9115f54 Sweep unnecessary xcall.h inclusions 2016-11-21 04:10:05 +00:00
maxv
401e389a28 Mmh, apparently I didn't properly test my previous change since it does not
compile anymore
2016-11-15 17:01:12 +00:00
maxv
944fdd3c50 Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. This is for symmetry with the normal amd64 and does not
change anything.
2016-11-15 15:37:20 +00:00
maxv
979460599e Rename xen_pmap_bootstrap to xen_locore; it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.
2016-11-11 11:34:51 +00:00
maxv
748acd38e3 Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.
2016-11-11 11:12:42 +00:00
maxv
6afccfe796 Map the PTE space as non-executable on PAE. The same is already done on
amd64.
2016-11-01 12:16:10 +00:00
maxv
36b3504040 Map the remaining pages as non-executable. Only text should have X. 2016-11-01 12:00:21 +00:00
jdolecek
ee25a7878f provide stub intr xname establish for xen 2016-10-17 18:23:49 +00:00
kre
30a3d786ca This should return the amd64 build to a working state (and hopefully
i386 as well) - but this is a hideous hack, and should be reverted
as soon as a better (which means any) alternative is available.
2016-10-16 06:40:43 +00:00
joerg
d87b908a67 Give Xen/i386 the same process and file limits as native i386. 2016-09-23 22:07:12 +00:00
bouyer
7929952b1a Revert to 1.59 (adding back the W^X kernel mappings), and move the data+bss
mapping later so that mappings that should be RO (such as page tables) won't
be made RW by accident.
2016-08-25 17:03:57 +00:00
bouyer
db4f79c55f Stopgap measure: revert to rev 1.56. Starting with 1.57, an i386 PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2
2016-08-23 11:03:52 +00:00
maxv
a38b5c9a58 Make the I/O area non-executable on Xen. 2016-08-11 15:35:10 +00:00
maxv
d7d5f3349a Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.
2016-08-03 11:51:18 +00:00
maxv
2f746d1585 Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.
2016-08-02 14:21:53 +00:00
maxv
d2a4f6b0ae Use PG_RO instead of a magic zero. 2016-08-02 13:29:35 +00:00
maxv
039c7ddcb0 KNF, and use PAGE_SIZE instead of NBPG. 2016-08-02 13:25:56 +00:00
msaitoh
71fbb921c3 KNF. No functional change. 2016-07-11 11:31:49 +00:00
msaitoh
8bc54e5be6 KNF. Remove extra spaces. No functional change. 2016-07-07 06:55:38 +00:00
nonaka
6af4cd3352 Pass bus_dma(9) tag to allow for porting sdhc(4) at acpi. 2016-06-21 11:33:32 +00:00
jnemeth
4ec72366f1 - add machdep.xen.version sysctl to easily get hypervisor version
- move machdep.xen_timepush_ticks to machdep.xen.timepush_ticks to
  consolidate all Xen related sysctls under machdep.xen
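
A hypothetical userland query of the new node (this assumes the node is
exported as a string, which is not stated here):

#include <sys/param.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
	char ver[32];
	size_t len = sizeof(ver);

	if (sysctlbyname("machdep.xen.version", ver, &len, NULL, 0) == 0)
		printf("Xen %s\n", ver);
	return 0;
}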
2016-06-12 09:08:09 +00:00
ozaki-r
d938d837b3 Introduce m_set_rcvif and m_reset_rcvif
The API is used to set (or reset) the receive interface of an mbuf.
They are counterparts of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif handling and reduce the diff of
the upcoming change.
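
For illustration, in a driver Rx path the accessors replace direct field
accesses, roughly:

	m_set_rcvif(m, ifp);	/* was: m->m_pkthdr.rcvif = ifp; */
	/* ... hand the mbuf to the stack ... */
	m_reset_rcvif(m);	/* was: m->m_pkthdr.rcvif = NULL; */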

No functional change.
2016-06-10 13:27:10 +00:00
jnemeth
d9c5d3073b Feeding uninitialised garbage to the hypervisor is likely to be a bad idea. 2016-06-08 01:59:06 +00:00
bouyer
55f57916be Switch to ELF notes for amd64, instead of the old key=value list, to describe
the guest requirements and support.
Add infrastructure to query the hypervisor about feature support.
For verbose boot, print the features supported by the hypervisor for this
  guest.
2016-05-29 17:06:17 +00:00
jnemeth
605ea3fe8e make CPU microcode loading dependent on both DOM0OPS AND CPU_UCODE 2016-05-20 03:41:20 +00:00
christos
c7f0ba033b Account for the CRC len (Jean-Jacques.Puig) 2016-05-09 15:11:35 +00:00
mlelstv
6846a009d9 no condition for cpu_rng here 2016-02-27 15:42:20 +00:00
mlelstv
850f8baca4 add missing cpu_rng.c to kernel 2016-02-27 14:28:50 +00:00
bouyer
f35e6799e0 In xennet_xenbus_detach(), remove the event handler early (just after
xennet_stop()) so that we don't get events while sleeping (e.g.
in softint_disestablish()) when some structures have already been
freed.
Problem reported and patch tested by Rohan Desai.
2016-02-16 08:41:32 +00:00
ozaki-r
9c4cd06355 Introduce softint-based if_input
This change intends to run the whole network stack in softint context
(or a normal LWP), not hardware interrupt context. Note that the work is
still incomplete with this change; to that end, we also have to softint-ify
if_link_state_change (and bpf), which can still run in hardware interrupt
context.

This change softint-ifies ifp->if_input, which is called from
each device driver (and ieee80211_input), to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, the rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.
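
A hypothetical driver Rx handler under this framework (the driver name and
helpers below are made up; if_percpuq_enqueue() is assumed to be the enqueue
entry point):

static int
xxx_rxintr(void *arg)
{
	struct xxx_softc *sc = arg;
	struct ifnet *ifp = &sc->sc_ethercom.ec_if;
	struct mbuf *m;

	/* Hard interrupt context: only pull frames off the hardware. */
	while ((m = xxx_rx_dequeue(sc)) != NULL) {
		m_set_rcvif(m, ifp);
		/* Queue the packet and schedule the softint that will run
		 * ether_input() and the rest of the stack later. */
		if_percpuq_enqueue(ifp->if_percpuq, m);
	}
	return 1;	/* claimed the interrupt */
}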

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and is initialized by default (in if_attach).
We probably have to move percpuq to the softc of each driver, but that is
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
2016-02-09 08:32:07 +00:00
bouyer
13ee92e7ec Apply patch from XSA155: make sure that the backend won't read parts of the
request again (possibly because of compiler optimisations), by using
copies and barriers.
From XSA155:
The compiler can emit optimizations in the PV backend drivers which
can lead to double fetch vulnerabilities. Specifically the shared
memory between the frontend and backend can be fetched twice (during
which time the frontend can alter the contents) possibly leading to
arbitrary code execution in backend.
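
Illustrative sketch of the pattern (the names below are generic, not the
literal patch): copy the shared request into backend-private memory, issue a
read barrier, and from then on only ever look at the copy:

	blkif_request_t req;

	/* Fetch the frontend's request exactly once. */
	memcpy(&req, RING_GET_REQUEST(&ring, cons), sizeof(req));
	xen_rmb();	/* keep the copy ordered before any use of it */
	/* ... validate and process 'req'; never re-read the shared ring ... */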
2016-01-06 15:28:40 +00:00
christos
19d921e536 need definition 2015-12-13 16:11:14 +00:00
christos
07d3d8c47c fix the build. 2015-12-13 15:22:31 +00:00