The data can be accessed from sysctl, ioctl, the interface watchdog
(if_slowtimo) and interrupt handlers. We need to protect the data against
parallel accesses from these contexts.
Currently the mutex is applied only to some drivers; we need to apply it to
all drivers in the future.
Note that the mutex is an adaptive one for ease of implementation, but some
drivers access the data in interrupt context, so we cannot apply the mutex
to every driver as is. We have two options: one is to replace the mutex
with a spin one, which requires some additional work (see
ether_multicast_sysctl), and the other is to modify the drivers so that they
somehow access the data outside interrupt context.
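A minimal sketch of the two flavors, with a hypothetical mutex name (not
taken from the commit):

#include <sys/types.h>
#include <sys/mutex.h>

static kmutex_t ec_mcast_mtx;   /* hypothetical name for the protecting mutex */

static void
ec_mcast_mtx_init(bool used_from_intr)
{
        if (!used_from_intr) {
                /* today: adaptive mutex (IPL_NONE) -- may block, so it
                 * must not be taken from interrupt context */
                mutex_init(&ec_mcast_mtx, MUTEX_DEFAULT, IPL_NONE);
        } else {
                /* alternative: spin mutex at IPL_NET -- safe to take from
                 * network interrupt handlers, but holders must not sleep */
                mutex_init(&ec_mcast_mtx, MUTEX_DEFAULT, IPL_NET);
        }
}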
memory, and taking it below the kernel image was fine: we had 160 free
pages, and never allocated more than 20. With amd64, however, we create a
direct map, and for this map we need a number of page table pages that is
roughly proportional to the amount of physical memory available, which
implies that these 160 free pages may not be enough.
In particular, if the CPU does not support 1GB superpages, each 1GB chunk
of physical memory needs a 4k page table page in the direct map, which means
that if a machine has 160GB of RAM, the bootstrap code allocates more than
160 pages, thereby overwriting the I/O mem area. Pushing a little further:
if a machine has 512GB of RAM, we allocate ~525 pages and start overwriting
the kernel text, causing the system to go crazy at boot time.
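As a rough check of the arithmetic (assuming 2MB mappings are used for the
direct map when 1GB superpages are unavailable): one 4k page table page holds
512 entries of 2MB each, i.e. it maps 1GB of physical memory, so

    512GB of RAM -> 512GB / (512 * 2MB) = 512 page table pages

plus the directory pages above them and the other early allocations, which is
how the total climbs to the ~525 pages mentioned above.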
Fix this by moving the physical allocation area from below the kernel to
above it. avail_start now sits beyond the kernel, and lowmem_rsvd indicates
the reserved low-memory pages. The area [lowmem_rsvd; IOM_BEGIN[ is handed
over to UVM, so there is no pa loss.
The only limit now is the pa of LAPIC, which is located at ~4GB of memory,
so it is perfectly fine.
This change theoretically adds va support for 512GB of RAM, and it is a
prerequisite if we want to support more memory anyway.
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.
Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.
Fix this by keeping the dummy va directly in a variable instead of using
magic offsets. The asm locore now increments the first pa so that the dummy
page is hidden from machdep and pmap.
Add new ptrace(2) calls:
- PT_COUNT_WATCHPOINTS - count the number of available hardware watchpoints
- PT_READ_WATCHPOINT - read struct ptrace_watchpoint from the kernel state
- PT_WRITE_WATCHPOINT - write new struct ptrace_watchpoint state, this
includes enabling and disabling watchpoints
The ptrace_watchpoint structure contains MI and MD parts:
typedef struct ptrace_watchpoint {
        int           pw_index;    /* HW Watchpoint ID (count from 0) */
        lwpid_t       pw_lwpid;    /* LWP described */
        struct mdpw   pw_md;       /* MD fields */
} ptrace_watchpoint_t;
For example amd64 defines MD as follows:
struct mdpw {
        void    *md_address;
        int     md_condition;
        int     md_length;
};
These calls are protected with the __HAVE_PTRACE_WATCHPOINTS guard.
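A minimal usage sketch from the debugger side (hypothetical: it assumes the
usual ptrace(2) convention of passing the structure through addr and its size
through data, and the md_condition value below is only a placeholder):

#include <sys/types.h>
#include <sys/ptrace.h>
#include <string.h>

/* Arm hardware watchpoint 0 on a 4-byte write to 'addr' in process 'pid'. */
static int
set_write_watchpoint(pid_t pid, void *addr)
{
        struct ptrace_watchpoint pw;

        if (ptrace(PT_COUNT_WATCHPOINTS, pid, NULL, 0) < 1)
                return -1;                      /* no hardware watchpoints */

        memset(&pw, 0, sizeof(pw));
        pw.pw_index = 0;                        /* first debug register */
        pw.pw_lwpid = 0;                        /* assumption: 0 = first/only LWP */
        pw.pw_md.md_address = addr;             /* amd64 MD layout from above */
        pw.pw_md.md_condition = 1;              /* placeholder "break on write" encoding */
        pw.pw_md.md_length = 4;                 /* watch 4 bytes */

        return ptrace(PT_WRITE_WATCHPOINT, pid, &pw, (int)sizeof(pw));
}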
Tested on amd64, initial support added for i386 and XEN.
Sponsored by <The NetBSD Foundation>
The benefits of the change are:
- We can reduce duplicated code
- We can provide the same behavior between drivers (see the sketch below)
  - Where/when if_ipackets is counted up
  - Note that some drivers still update packet statistics in their own
    way (periodical update)
- Moved bpf_mtap so that it runs in softint
  - This makes it easier to MP-ify bpf
Proposed on tech-kern and tech-net
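As an illustration of the driver-side effect, a sketch follows; the common
enqueue routine and the BEFORE_THIS_CHANGE macro are named here only for
illustration and may not match the tree exactly:

#include <sys/mbuf.h>
#include <net/if.h>
#include <net/bpf.h>

static void
xx_rx_handoff(struct ifnet *ifp, struct mbuf *m)   /* hypothetical helper */
{
#ifdef BEFORE_THIS_CHANGE
        /* every driver repeated this in its own RX path, in hard
         * interrupt context */
        ifp->if_ipackets++;
        bpf_mtap(ifp, m);
        (*ifp->if_input)(ifp, m);
#else
        /* now: just enqueue; the common input path counts the packet and
         * runs bpf_mtap on the driver's behalf, from softint context */
        if_percpuq_enqueue(ifp->if_percpuq, m);
#endif
}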
This feature was intended to detect stack overflow with the CPU Debug
Registers (x86). It was never ported to other ports, not even amd64, and it
would need to be adapted for SMP...
Currently there might be better ways to detect stack overflows, like page
mapping protection. Since the number of Debug Registers is restricted
(4 on x86), tear it down completely.
This interface introduced helper functions for Debug Registers; they will
be replaced with the new <x86/dbregs.h> interface.
KSTACK_CHECK_DR0 was disabled by default and won't affect ordinary users.
Sponsored by <The NetBSD Foundation>
This is a more opaque and appropriate type, as vaddr_t is meant to be used
for virtual address values. Not all DRs on x86 are used to represent virtual
addresses (DR6 and DR7 definitely are not).
No functional change intended.
Change suggested by <christos>
Sponsored by <The NetBSD Foundation>
implementation expects this va to be valid even if no lapic is present,
which probably is a bug in itself, but let's just reproduce the old
behavior and rehide that bug.
There are 8 Debug Registers on i386 (available at least since the 80386) and
16 on AMD64. Currently DR4 and DR5 are reserved on both CPU families and
DR8-DR15 are still reserved on AMD64. Therefore add accessors for DR0-DR3 and
DR6-DR7 for all ports.
Debug Registers x86:
* DR0-DR3 Debug Address Registers
* DR4-DR5 Reserved
* DR6 Debug Status Register
* DR7 Debug Control Register
* DR8-DR15 Reserved
Access to these registers is possible only from the kernel (ring 0), as it
requires privileged access. For this reason, special XEN functions are needed
to get and set the registers in XEN3 kernels.
XEN specific functions as defined in NetBSD:
- HYPERVISOR_get_debugreg()
- HYPERVISOR_set_debugreg()
This code extends the existing rdr6() and ldr6() accessors with additional ones:
- rdr0() & ldr0()
- rdr1() & ldr1()
- rdr2() & ldr2()
- rdr3() & ldr3()
- rdr7() & ldr7()
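A minimal sketch of what such an accessor pair looks like on amd64
(illustrative only; the actual definitions live in the machine headers):

#include <sys/types.h>

static inline vaddr_t
rdr0(void)
{
        vaddr_t val;

        __asm volatile("movq %%dr0,%0" : "=r" (val));
        return val;
}

static inline void
ldr0(vaddr_t val)
{
        __asm volatile("movq %0,%%dr0" : : "r" (val));
}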
Traditionally the DR6 accessors took a vaddr_t argument. While that is an
appropriate type for DR0-DR3, DR6-DR7 should arguably use u_long; however,
it's not a big deal. The resulting functionality is equivalent, so stick
to this convention and use the vaddr_t type for all DR accessors.
There was already a function defined for rdr6() in XEN, but it had a nit on
AMD64: it was casting HYPERVISOR_get_debugreg() to u_int (32-bit on AMD64),
truncating the result. That still works for DR6, but for the sake of
simplicity always return the full 64-bit value.
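On XEN the same pair is expressed in terms of the hypercalls, keeping the
full register width (again a sketch, not the exact code):

static inline vaddr_t
rdr6(void)
{
        /* no u_int cast here: return all 64 bits on AMD64 */
        return (vaddr_t)HYPERVISOR_get_debugreg(6);
}

static inline void
ldr6(vaddr_t val)
{
        HYPERVISOR_set_debugreg(6, val);
}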
The new accessors duplicate the functionality of the dr0() function available
on i386 within the KSTACK_CHECK_DR0 option. dr0() is a specialized layer with
logic to set appropriate types of interrupts; the new accessors are designed
to pass verbatim values from user-land (with simple sanity checks in the
kernel). At the moment there are no plans to make KSTACK_CHECK_DR0 coexist
with debug registers for user applications (debuggers).
options KSTACK_CHECK_DR0
Detect kernel stack overflow using the DR0 register. This option uses the
DR0 register exclusively, so you can't use DR0 for other purposes
(e.g., hardware breakpoints) if you turn this on.
The KSTACK_CHECK_DR0 functionality was designed for i386 and never ported
to amd64.
Code tested on i386 and amd64 with kernels: GENERIC, XEN3_DOMU, XEN3_DOM0.
Sponsored by <The NetBSD Foundation>
and i386. The old design was error-prone, and it didn't allow us to map the
data segment with large pages.
Now, the VA is allocated dynamically in the pmap bootstrap code, and entered
manually later. We go from using &local_apic to using *local_apic_va, and we
therefore need one more level of indirection in the asm code.
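In C terms the extra indirection looks roughly like this (a sketch; the
helper name, declarations and register offset are illustrative):

static uint32_t
lapic_read_tpri(void)   /* hypothetical helper */
{
        /* before: the mapping sat at the fixed address of the local_apic
         * symbol, i.e. *(volatile uint32_t *)(local_apic + LAPIC_TPRI) */

        /* after: read the dynamically allocated va from a variable first,
         * which is one more level of indirection */
        return *(volatile uint32_t *)(local_apic_va + LAPIC_TPRI);
}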
Discussed on tech-kern.
For the record: normally we could enable this code on Xen, since the
bootstrap layout is globally the same. But there appears to be an issue
in xen_locore, since any kenter in the area after kern_end triggers a
KASSERT because the va is already busy.
As the comment "find first available ipv4 address" indicates,
if an IP address is found, we need to leave both nested loops:
one over the interface list and one over the IP addresses of
an interface. However, the original code broke out of only
the inner loop.
The original (wrong) behavior was non-critical: it just
returned a non-first IP address. Unfortunately, after applying
psref, the code may call psref_acquire twice on a target
with the same psref object, eventually resulting in a kernel panic.
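The shape of the fix, as a sketch (the macro and helper names here are
illustrative; the real code walks the lists under psref):

static struct ifaddr *
find_first_inet_addr(void)
{
        struct ifnet *ifp;
        struct ifaddr *ifa;

        /* find first available ipv4 address */
        IFNET_FOREACH(ifp) {
                IFADDR_FOREACH(ifa, ifp) {
                        if (ifa->ifa_addr->sa_family != AF_INET)
                                continue;
                        /* leave BOTH loops; a plain break only left
                         * the inner one */
                        goto out;
                }
        }
        ifa = NULL;
out:
        return ifa;
}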
When the remote CPUs receive the ACPI sleep IPI, they do not save the fpu
state of the lwp they are executing. The problem is, when waking up they
reinitialize the registers of their local fpu and go back to their lwp
directly. Therefore, if an lwp is interrupted while storing data in an fpu
register, that data gets overwritten, which basically means the lwp is
likely to go crazy when resuming execution.
Fix this by simply saving the fpu state correctly. This way, when going to
sleep, the state is stored in the lwp's pcb and CR0_TS is set, so the next
time the lwp wants to use the fpu we'll get a DNA fault, and the state will
be restored as expected.
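The essence of the fix, as a sketch (assuming the fpusave_lwp() helper; the
exact call site in the ACPI sleep path is not shown):

        /* Save the running lwp's FPU state into its PCB before parking the
         * CPU; CR0_TS ends up set, so the first FPU use after wakeup takes
         * a DNA fault and the saved state is reloaded. */
        fpusave_lwp(curlwp, true);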
While here, don't forget to reenable interrupts (and the spl) if an error
occurs.
- add missing membar_store_store ("membar_producer") when setting a
new ldt (see the sketch after this list);
- use UVM_KMF_WAITVA when allocating space for a new ldt instead of
crashing if uvm_km_alloc fails;
- if uvm_km_alloc fails in pmap_fork, bail instead of crashing;
- clarify what else is going on in pmap_fork;
- don't uvm_km_free while holding a mutex.
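For instance, the first two items look roughly like this as a sketch (the
helper name and field names are illustrative):

static void
set_new_ldt(struct pmap *pmap, const void *old_ldt, size_t len)
{
        char *new_ldt;

        /* wait for va instead of failing (a failed allocation used to
         * crash); the allocation is done without holding the pmap lock */
        new_ldt = (char *)uvm_km_alloc(kernel_map, len, 0,
            UVM_KMF_WIRED | UVM_KMF_WAITVA);

        memcpy(new_ldt, old_ldt, len);
        membar_producer();      /* make the LDT contents visible before
                                 * publishing the pointer */
        pmap->pm_ldt = (void *)new_ldt;
}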