Commit Graph

2003 Commits

Author SHA1 Message Date
ozaki-r
bf5ce79b5b Protect ec_multi* with mutex
The data can be accessed from sysctl, ioctl, interface watchdog
(if_slowtimo) and interrupt handlers. We need to protect the data against
parallel accesses from them.

Currently the mutex is applied to only some drivers; we need to apply it to
all drivers in the future.

Note that the mutex is an adaptive one for ease of implementation, but some
drivers access the data in interrupt context, so we cannot apply the mutex
to every driver as is. We have two options: one is to replace the mutex
with a spin one, which requires some additional work (see
ether_multicast_sysctl), and the other is to modify the drivers so that
they do not access the data in interrupt context.
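
A minimal userland sketch of the locking pattern described above, using a POSIX mutex as a stand-in for the kernel's mutex(9). The names mc_list, mc_init, mc_add and mc_count are hypothetical and do not come from the NetBSD sources; a sleeping mutex like this one is also exactly what could not be taken from interrupt context.

```c
#include <pthread.h>
#include <string.h>

#define MC_MAX		8	/* hypothetical capacity */
#define MC_ADDR_LEN	6	/* length of an Ethernet address */

/* Hypothetical stand-in for the per-interface multicast state. */
struct mc_list {
	pthread_mutex_t	lock;	/* protects count and addrs */
	int		count;
	unsigned char	addrs[MC_MAX][MC_ADDR_LEN];
};

static void
mc_init(struct mc_list *ml)
{
	pthread_mutex_init(&ml->lock, NULL);
	ml->count = 0;
}

/* Add an address; returns 0 on success, -1 if the list is full. */
static int
mc_add(struct mc_list *ml, const unsigned char *addr)
{
	int error = -1;

	pthread_mutex_lock(&ml->lock);
	if (ml->count < MC_MAX) {
		memcpy(ml->addrs[ml->count++], addr, MC_ADDR_LEN);
		error = 0;
	}
	pthread_mutex_unlock(&ml->lock);
	return error;
}

/* Read the counter under the same lock that writers hold. */
static int
mc_count(struct mc_list *ml)
{
	int n;

	pthread_mutex_lock(&ml->lock);
	n = ml->count;
	pthread_mutex_unlock(&ml->lock);
	return n;
}
```

Every reader and writer goes through the same lock, which is the property the commit wants for the sysctl, ioctl and watchdog paths.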
2016-12-28 07:32:16 +00:00
hikaru
e88860b723 Use the correct number of multicast addrs 2016-12-27 03:09:55 +00:00
cherry
9db6d2188c the i386 and amd64 boot time msgbuf init code is nearly identical.
Unify them into x86/x86_machdep.c:init_x86_msgbuf()

Boot tested on GENERIC (i386, amd64), XEN3_DOM0 (amd64)
2016-12-26 17:54:06 +00:00
bouyer
ad7eb753d3 Xen doesn't need lapic so don't allocate a lapic VA/PA for Xen.
As a side effect this makes XEN3PAE boot again but I don't know why ...
2016-12-22 16:29:05 +00:00
maxv
2827e84396 When the i386 port was designed, the bootstrap code needed little physical
memory, and taking it below the kernel image was fine: we had 160 free
pages, and never allocated more than 20. With amd64 however, we create a
direct map, and for this map we need a number of page table pages that is
mostly proportionate to the number of physical addresses available, which
implies that these 160 free pages may not be enough.

In particular, if the CPU does not support 1GB superpages, each 1GB chunk
of physical memory needs a 4k page in the direct map, which means that if
a machine has 160GB of ram, the bootstrap code allocates more than 160
pages, thereby overwriting the I/O mem area. If we push a little further,
if a machine has 512GB of ram, we allocate ~525 pages, and start
overwriting the kernel text, causing the system to go crazy at boot time.

Fix this by moving the physical allocation area from below the kernel to above
it. avail_start is now beyond the kernel, and lowmem_rsvd indicates the
reserved low-memory pages. The area [lowmem_rsvd; IOM_BEGIN[ is
internalized into UVM, so there is no pa loss.

The only limit now is the pa of LAPIC, which is located at ~4GB of memory,
so it is perfectly fine.

This change theoretically adds va support for 512GB of ram; and it is a
prerequisite if we want to support more memory anyway.
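
The arithmetic above can be sketched as follows. This is a rough estimate under stated assumptions, not the bootstrap code: it counts only the direct-map table pages when 1GB superpages are unavailable (one 4k L2 page per 1GB, one L3 page per 512GB), and the name dmap_table_pages is hypothetical.

```c
#include <stdint.h>

#define GB		(1ULL << 30)
#define NPTE_PER_PAGE	512ULL	/* 8-byte entries in one 4k table page */

/*
 * Estimate the page-table pages needed to direct-map `memsize` bytes
 * using 2MB large pages only. One L2 page maps 512 * 2MB = 1GB, and
 * one L3 page points at up to 512 L2 pages (512GB).
 */
static uint64_t
dmap_table_pages(uint64_t memsize)
{
	uint64_t l2 = (memsize + GB - 1) / GB;
	uint64_t l3 = (l2 + NPTE_PER_PAGE - 1) / NPTE_PER_PAGE;

	return l2 + l3;
}
```

With 160GB this gives 161 pages, already past the 160 free pages below the kernel. With 512GB it gives 513; the ~525 quoted above is presumably higher because the bootstrap code allocates other pages as well.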
2016-12-20 14:03:15 +00:00
maxv
ba62531c60 Depend on KERNTEXTOFF - KERNBASE, not IOM_END, both are equal but the text
address may change in the future.
2016-12-20 12:48:30 +00:00
maxv
3a8b7cad7a Remove a wrong comment - the FPU save size should never be percpu -, and
be more explicit about Xen.
2016-12-17 15:23:08 +00:00
maxv
663ef5ea2b Use pmap_bootstrap_valloc and simplify. By the way, I think the cache
stuff is wrong, since the pte is not necessarily aligned to 64 bytes, so
nothing guarantees there is no false sharing.
2016-12-17 13:43:33 +00:00
maxv
8a3e058bbd The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
from machdep and pmap.
2016-12-16 19:52:22 +00:00
kamil
241cf91ddc Add support for hardware assisted watchpoints/breakpoints API in ptrace(2)
Add new ptrace(2) calls:
 - PT_COUNT_WATCHPOINTS - count the number of available hardware watchpoints
 - PT_READ_WATCHPOINT   - read struct ptrace_watchpoint from the kernel state
 - PT_WRITE_WATCHPOINT  - write new struct ptrace_watchpoint state, this
                          includes enabling and disabling watchpoints

The ptrace_watchpoint structure contains MI and MD parts:

typedef struct ptrace_watchpoint {
	int		pw_index;	/* HW Watchpoint ID (count from 0) */
	lwpid_t		pw_lwpid;	/* LWP described */
	struct mdpw	pw_md;		/* MD fields */
} ptrace_watchpoint_t;

For example amd64 defines MD as follows:
struct mdpw {
	void	*md_address;
	int	 md_condition;
	int	 md_length;
};

These calls are protected with the __HAVE_PTRACE_WATCHPOINTS guard.

Tested on amd64, initial support added for i386 and XEN.
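
As an illustration only, a debugger might fill in the structure like this before handing it to PT_WRITE_WATCHPOINT. The sketch uses local copies of the two structures quoted above and does not call ptrace(2); the lwpid_t typedef, the WP_COND_WRITE value and the make_watchpoint helper are assumptions made for the example.

```c
typedef int lwpid_t;		/* assumption: local stand-in for the system type */

/* Local copies of the structures quoted in the commit message. */
struct mdpw {
	void	*md_address;
	int	 md_condition;
	int	 md_length;
};

typedef struct ptrace_watchpoint {
	int		pw_index;	/* HW Watchpoint ID (count from 0) */
	lwpid_t		pw_lwpid;	/* LWP described */
	struct mdpw	pw_md;		/* MD fields */
} ptrace_watchpoint_t;

#define WP_COND_WRITE	1	/* hypothetical condition code */

/* Build a watchpoint request for `len` bytes at `addr` (hypothetical helper). */
static ptrace_watchpoint_t
make_watchpoint(int index, lwpid_t lwp, void *addr, int len)
{
	ptrace_watchpoint_t pw;

	pw.pw_index = index;			/* MI part */
	pw.pw_lwpid = lwp;
	pw.pw_md.md_address = addr;		/* MD part (amd64 layout) */
	pw.pw_md.md_condition = WP_COND_WRITE;
	pw.pw_md.md_length = len;
	return pw;
}
```

Real code would take the valid index range from PT_COUNT_WATCHPOINTS and the condition encoding from the MD headers.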

Sponsored by <The NetBSD Foundation>
2016-12-15 12:04:17 +00:00
ozaki-r
dd8638eea5 Move bpf_mtap and if_ipackets++ on Rx of each driver to percpuq if_input
The benefits of the change are:
- We can reduce code
- We can provide the same behavior across drivers
  - Where/When if_ipackets is counted up
  - Note that some drivers still update packet statistics in their own
    way (periodical update)
- bpf_mtap now runs in softint
  - This makes it easy to MP-ify bpf

Proposed on tech-kern and tech-net
2016-12-15 09:28:02 +00:00
kamil
192986ec23 Tear down KSTACK_CHECK_DR0, an i386-only feature to detect stack overflow
This feature was intended to detect stack overflow with CPU Debug Registers
(x86). It was never ported to other ports, not even amd64, and would need
to be adapted for SMP...

Currently there might be better ways to detect stack overflows, like page
mapping protection. Since the number of Debug Registers is restricted
(4 on x86), tear it down completely.

This interface introduced helper functions for Debug Registers; they will
be replaced with the new <x86/dbregs.h> interface.

KSTACK_CHECK_DR0 was disabled by default and won't affect ordinary users.

Sponsored by <The NetBSD Foundation>
2016-12-13 10:54:27 +00:00
kamil
9b94e002e8 Switch x86 CPU Debug Register types from vaddr_t to register_t
This is a more opaque and appropriate type, as vaddr_t is meant to be used
for virtual address values. Not all DRs on x86 are used to represent virtual
addresses (DR6 and DR7 are definitely not).

No functional change intended.

Change suggested by <christos>

Sponsored by <The NetBSD Foundation>
2016-12-13 10:21:33 +00:00
maxv
b6a32da467 Kenter local_apic_va to a fake physical page, because our x86
implementation expects this va to be valid even if no lapic is present;
which probably is a bug in itself, but let's just reproduce the old
behavior and rehide that bug.
2016-12-11 08:31:53 +00:00
nat
03783bb56a Add a synthesized pc beeper and keyboard bell for platforms with an audio
device.
2016-12-08 11:31:08 +00:00
msaitoh
29ceb203f4 - Remove "pcommit".
- Add "rdt_a".
2016-12-08 06:28:21 +00:00
msaitoh
9f7ea7417b Add CLWB bit. 2016-12-08 06:11:03 +00:00
ozaki-r
c0e7885f20 Apply deferred if_start framework
if_schedule_deferred_start checks if the if_snd queue contains packets,
so drivers don't need to check it by themselves.
2016-12-08 01:12:00 +00:00
maxv
533b7e32f0 Memory leak, found by Mootja 2016-12-06 15:09:04 +00:00
msaitoh
ec51d3892a Fix CPUID_SEF_FLAGS. Octal value has no 8. 2016-12-05 03:59:47 +00:00
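
The log does not show the changed line, so the following only illustrates the general pitfall, assuming the flags string uses octal character escapes for bit numbers. The array names are hypothetical.

```c
/* "\20" is a single character with value 020 octal, i.e. 16 decimal. */
static const char ok_escape[] = "\20BIT16";

/*
 * '8' is not an octal digit, so the escape in "\18" ends after the '1'.
 * The literal holds two characters: '\1' followed by a plain '8'.
 */
static const char bad_escape[] = "\18";
```

The compiler accepts both literals silently, which is why a stray 8 or 9 in such a string is easy to miss.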
dholland
3f943f301b PR 51672 David Binderman: M_CSUM_TCPv6, not 2x M_CSUM_TCPv4.
(from context it's quite clear that's what's supposed to be here)
2016-11-29 22:27:09 +00:00
martin
572cb7ec52 Mark a variable __diagused as it is only ever used in a KASSERT 2016-11-28 20:12:41 +00:00
knakahara
d018cf3a86 fix build of amd64/i386 with NO_PCI_MSI_MSIX option. 2016-11-28 05:00:41 +00:00
kamil
266caf90fb Add accessors for available x86 Debug Registers
There are 8 Debug Registers on i386 (available at least since 80386) and 16
on AMD64. Currently DR4 and DR5 are reserved on both cpu-families and
DR8-DR15 are still reserved on AMD64. Therefore add accessors for DR0-DR3,
DR6-DR7 for all ports.

Debug Registers x86:
 * DR0-DR3  Debug Address Registers
 * DR4-DR5  Reserved
 * DR6      Debug Status Register
 * DR7      Debug Control Register
 * DR8-DR15 Reserved

Access to the registers is available only from the kernel (ring 0), as it
requires the most privileged level. For this reason, special XEN functions
are needed to get and set the registers in the XEN3 kernels.

XEN specific functions as defined in NetBSD:
 - HYPERVISOR_get_debugreg()
 - HYPERVISOR_set_debugreg()

This code extends the existing rdr6() and ldr6() accessors with additional ones:
 - rdr0() & ldr0()
 - rdr1() & ldr1()
 - rdr2() & ldr2()
 - rdr3() & ldr3()
 - rdr7() & ldr7()

Traditionally, accessors for DR6 passed a vaddr_t argument. While that is an
appropriate type for DR0-DR3, DR6-DR7 should be using u_long; however, it's
not a big deal. The resulting functionality should be equivalent, so stick
to this convention and use the vaddr_t type for all DR accessors.

There was already a function defined for rdr6() in XEN, but it had a nit on
AMD64 as it was casting HYPERVISOR_get_debugreg() to u_int (32-bit on
AMD64), truncating the result. It still works for DR6, but for the sake of
simplicity always return full 64-bit value.
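
The truncation can be reproduced in isolation. In this sketch get_debugreg_sketch is a made-up stand-in for the hypervisor call and returns an arbitrary value with bit 32 set; it assumes a 32-bit unsigned int, as on amd64.

```c
#include <stdint.h>

/* Stand-in for a call that returns a full-width register value. */
static uint64_t
get_debugreg_sketch(void)
{
	return 0x00000001ffff0ff0ULL;	/* made-up value with bit 32 set */
}

/* Old behavior: the cast to a 32-bit type drops bits 32..63. */
static uint64_t
rdr_truncated(void)
{
	return (unsigned int)get_debugreg_sketch();
}

/* New behavior: the value is returned unchanged. */
static uint64_t
rdr_full(void)
{
	return get_debugreg_sketch();
}
```

For a register whose meaningful bits all sit in the low 32 bits the two variants agree, which matches the remark that the old code still worked for DR6.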

New accessors duplicate functionality of the dr0() function available on
i386 within the KSTACK_CHECK_DR0 option. dr0() is a specialized layer with
logic to set appropriate types of interrupts, whereas the new accessors are
designed to pass verbatim values from user-land (with simple sanity checks
in the kernel). At the moment there are no plans to make it possible for
KSTACK_CHECK_DR0 to coexist with debug registers for user applications
(debuggers).

     options KSTACK_CHECK_DR0
     Detect kernel stack overflow using DR0 register.  This option uses DR0
     register exclusively so you can't use DR0 register for other purpose
     (e.g., hardware breakpoint) if you turn this on.

The KSTACK_CHECK_DR0 functionality was designed for i386 and never ported
to amd64.

Code tested on i386 and amd64 with kernels: GENERIC, XEN3_DOMU, XEN3_DOM0.

Sponsored by <The NetBSD Foundation>
2016-11-27 14:49:21 +00:00
maxv
54a825ed9a Remove this comment and allow the beginning of .data to be mapped with
large pages. The issue is fixed, the lapic va is dynamically allocated
now.
2016-11-25 14:26:53 +00:00
maxv
b487884215 Move the virtual address of the LAPIC page out of the data segment on amd64
and i386. The old design was error-prone, and it didn't allow us to map the
data segment with large pages.

Now, the VA is allocated dynamically in the pmap bootstrap code, and entered
manually later. We go from using &local_apic to using *local_apic_va, and we
therefore need one more level of indirection in the asm code.

Discussed on tech-kern.
2016-11-25 14:12:55 +00:00
maxv
4798650788 Initialize the module map limits in amd64, not x86.
For the record: normally we could enable this code on Xen, since the
bootstrap layout is globally the same. But there appears to be an issue
in xen_locore, since any kenter in the area after kern_end triggers a
KASSERT because the va is already busy.
2016-11-25 11:57:36 +00:00
hikaru
f9f592bf04 Add missing bpf_mtap. 2016-11-25 06:48:37 +00:00
hikaru
63460b9d44 Sync code with FreeBSD to support RSS
- Use MSI/MSI-X if it is available.
- Support TSO.

co-authored by k-nakahara
2016-11-25 05:29:54 +00:00
ozaki-r
61f9115f54 Sweep unnecessary xcall.h inclusions 2016-11-21 04:10:05 +00:00
maxv
c998f797d2 Unmap tmpva once we are done using it, not to pollute the page tree. 2016-11-17 16:32:06 +00:00
maxv
c3b2da209a Remap the pages with G until kern_end, and not just the preloaded modules.
This way the bootstrap tables, proc0's stack and the I/O mem area don't get
flushed each time userland needs a TLB shootdown.
2016-11-17 16:26:07 +00:00
knakahara
321d834b12 avoid a failure of interrupt affinity when the interrupt is pending.
pointed out and reviewed by ozaki-r@n.o, thanks.
2016-11-16 07:13:01 +00:00
maxv
2433a8616e Initialize kern_end in amd64 instead of x86. 2016-11-15 15:00:55 +00:00
maxv
bc776fcc5b Explain why this is the right value, otherwise someone (like me) could be
tempted to increase it. The invlpg part is from rmind, the statistical from
me.
2016-11-13 12:58:40 +00:00
maxv
979460599e Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.
2016-11-11 11:34:51 +00:00
maxv
ed2962cbb0 Update the pmap only once 2016-11-11 09:47:18 +00:00
ozaki-r
e956baf9d4 Fix a breakout of loops
As the comment "find first available ipv4 address" indicates,
if an IP address is found, we need to leave the two nested loops,
a loop for an interface list and a loop for IP addresses of
an interface. However, the original code broke away only from
the inner loop.

The original (wrong) behavior was non-critical, which just
returned a non-first IP address. Unfortunately, after applying
psref, the behavior may call psref_acquire twice to a target
with the same psref object, resulting in a kernel panic eventually.
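
The difference between the two behaviors can be shown with a reduced example. The data layout and the names iface_sketch, find_first_addr and find_addr_buggy are hypothetical; addresses are plain integers and 0 stands for an unusable one.

```c
#include <stddef.h>

struct iface_sketch {
	const int	*addrs;		/* 0 marks an unusable address */
	size_t		 naddrs;
};

/* Return the first usable address across all interfaces, or 0. */
static int
find_first_addr(const struct iface_sketch *ifs, size_t nifs)
{
	int found = 0;
	size_t i, j;

	for (i = 0; i < nifs; i++) {
		for (j = 0; j < ifs[i].naddrs; j++) {
			if (ifs[i].addrs[j] != 0) {
				found = ifs[i].addrs[j];
				goto out;	/* leave both loops */
			}
		}
	}
out:
	return found;
}

/* Buggy variant: `break` leaves only the inner loop. */
static int
find_addr_buggy(const struct iface_sketch *ifs, size_t nifs)
{
	int found = 0;
	size_t i, j;

	for (i = 0; i < nifs; i++) {
		for (j = 0; j < ifs[i].naddrs; j++) {
			if (ifs[i].addrs[j] != 0) {
				found = ifs[i].addrs[j];
				break;		/* outer loop keeps going */
			}
		}
	}
	return found;
}
```

The buggy variant keeps scanning later interfaces and ends up with the last match instead of the first. With a reference acquired at each match, that is the double acquisition the commit describes.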
2016-11-10 03:32:04 +00:00
christos
e4377ba18a PR/49691: KAMADA Ken'ichi: free deferred ptp mappings if present.
XXX: pullup-7
2016-11-08 03:05:36 +00:00
maxv
21053717d0 There is a huge fpu synchronization issue here.
When the remote CPUs receive the ACPI sleep IPI, they do not save the fpu
state of the lwp they are executing. The problem is, when waking up they
reinitialize the registers of their local fpu and go back to their lwp
directly. Therefore, if an lwp is interrupted while storing data in an fpu
register, that data gets overwritten, which basically means the lwp is
likely to go crazy when resuming execution.

Fix this by simply saving the fpu state correctly. This way when going to
sleep the state is stored in the lwp's pcb and CR0_TS is set, so the next
time the lwp wants to use the fpu we'll get a dna, and the state will be
restored as expected.

While here, don't forget to reenable interrupts (and the spl) if an error
occurs.
2016-10-20 16:05:04 +00:00
maxv
72c89a7fde Reload the MSRs on the original cpu on i386 - looks like I forgot this part
in my rev1.41. Technically it does not change anything, since the only MSR
is NOX and it is already reloaded in the trampoline.
2016-10-20 14:06:18 +00:00
maxv
cde1c29832 Remove lapic_tpr on amd64 and i386, unused. Now, we have only one pointer
to the LAPIC page, and each register access is done with relative offsets.
2016-10-16 10:51:31 +00:00
maxv
7ab054030e Use the generic i82489_writereg instead of lapic_tpr, for consistency. 2016-10-16 10:24:58 +00:00
jdolecek
89f2f7ea65 provide intr xname 2016-10-15 16:46:14 +00:00
maxv
9b54bbf047 Instead of setting the TPR to the value that was in the data segment, set
zero directly. On amd64, the data version of lapic_tpr is not explicitly
initialized.
2016-10-15 09:50:27 +00:00
skrll
dfeefb1044 Don't include sys/cdefs.h and __KERNEL_RCSID twice... once is enough. 2016-10-07 10:58:03 +00:00
mrg
dfe5471eae use 4-byte style accesses, should hopefully fix PR#37787. 2016-10-01 21:51:52 +00:00
maxv
60a76fef67 Remove outdated comments, typos, rename and reorder a few things. 2016-09-29 17:01:43 +00:00
dholland
bb0514ffb3 LDT handling fixes:
  - add missing membar_store_store ("membar_producer") when setting a
    new ldt;
  - use UVM_KMF_WAITVA when allocating space for a new ldt instead of
    crashing if uvm_km_alloc fails;
  - if uvm_km_alloc fails in pmap_fork, bail instead of crashing;
  - clarify what else is going on in pmap_fork;
  - don't uvm_km_free while holding a mutex.
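
The first item is the usual publish pattern: fill in the object, then make the pointer visible. A minimal sketch using C11 atomics follows; a release store is a stand-in here (and stronger than the store-store barrier named above), and ldt_sketch, ldt_publish and ldt_read_nentries are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

struct ldt_sketch {
	int	nentries;
};

/* Pointer that readers follow; NULL until something is published. */
static _Atomic(struct ldt_sketch *) current_ldt;

/* Initialize the new table first, then publish the pointer. */
static void
ldt_publish(struct ldt_sketch *ldt, int n)
{
	ldt->nentries = n;	/* plain store to the not-yet-visible object */
	/* Release ordering keeps the store above ahead of the pointer store. */
	atomic_store_explicit(&current_ldt, ldt, memory_order_release);
}

/* Readers pair the release with an acquire load; -1 means "none yet". */
static int
ldt_read_nentries(void)
{
	struct ldt_sketch *ldt;

	ldt = atomic_load_explicit(&current_ldt, memory_order_acquire);
	return ldt == NULL ? -1 : ldt->nentries;
}
```

Without the ordering, another CPU could observe the new pointer before the contents it points to.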
2016-09-24 21:13:44 +00:00
jmcneill
8b9bb61b71 Set hw.acpi.sleep.vbios when a non-HW accelerated VGA driver attaches.
If the VGA_POST option is present in the kernel the default value is 2,
otherwise 1. PR kern/50781

Reviewed by:    agc, mrg
2016-09-21 00:00:06 +00:00