- sparc and sparc64 were not using version 0 sigcontext when there were
no arguments in the signal version. This was probably a bug.
- vax is using +1 the version numbers of the other archs.
- Only hppa was defining __LIBC12_SOURCE__ so it was getting a working
sigcontext before. all the other ports that supported sigcontext had
the compat code disabled.
[pointed out by thorpej, thanks!]
If we want to remove sigcontext support from userland at least now there
is less work to do so.
wss has two attributes, "wss" and "audiobus", and this call didn't
specify an iattr for opl to attach to. config_search_internal asserts
that when no iattr is specified, the parent should only have one.
This harmonizes efiboot and the various x86 bootloaders to use shared
code for printing the banner. By friendly coincidence, it also adds
support for specifying 'banner=' in arm efiboot's boot.cfg, as on x86.
This exists for compatibility with a Linux interface which was apparently
deprecated in Linux 2.6. There are various mailing list threads going
back to 2004 where the usefulness of this driver is discussed, but
the conclusion is that scanner software has all moved to using ugen(4)
instead, and enabling this driver will not help you scan things.
In r. 1.13, a check of the return value in fd was removed, and a comment
about this ("...so keep going") added. Then in r. 1.19, the fd return
value check was reinstated (as the true underlying errno could be masked
by the fstat() call), so there is no "keep going" happening anymore.
db_read_bytes already does this better (but didn't at the time this
check was originally added back in 1998). Not sure if this code had
the same mistake as the amd64 code causing it to trip over its own
shoelaces, but there should be no need for it here.
than PAGE_SIZE and fails spuriously.
XXX: Note the duplicate definition hacks. Should really create <x86/gdt.h>,
put the just the constants there and unify them.
This would also avoid the hack in: src/tests/lib/libi386/t_user_ldt.c#46
Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want pass along, just have a single call that
takes a variadic list of tag-value arguments.
Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuiration
situations, making is visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)
Remove unnecessary or redundant interface attributes where they're not
needed.
There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribte)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)
...and a sentinel value CFARG_EOL.
Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.
Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle trough to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
I am going to add ATF tests for these two functions, and having them
in a separate file will make it more convenient to build and run them
in userspace.
Fix the machine-dependent ptrace register-related requests (e.g.
PT_GETXMMREGS, PT_GETXSTATE on x86) to correctly respect the LWP number
passed as the data argument. Before this change, these requests
did not operate on the requested LWP of a multithreaded program.
This change required moving ptrace_update_lwp() out of unit scope,
and changing ptrace_machdep_dorequest() function to take a pointer
to pointer as the second argument, consistently with ptrace_regs().
I am planning to extend the ATF ptrace() register tests in the future
to check for regressions in multithreaded programs, as time permits.
Reviewed by kamil.
right now. new address-of-packed-member and format-overflow
warnings have new GCC_NO_ADDR_OF_PACKED_MEMBER amd
GCC_NO_FORMAT_OVERFLOW variables to remove these warnings.
apply to a bunch of the tree. mostly, these are real bugs that
should be fixed, but in many cases, only by removing the 'packed'
attribute from some structure that doesn't really need it. (i
looked at many different ones, and while perhaps 60-80% were
already properly aligned, it wasn't clear to me that the uses
were always coming from sane data vs network alignment, so it
doesn't seem safe to remove packed without careful research for
each affect struct.) clang already warned (and was not erroring)
for many of these cases, but gcc picked up dozens more.
This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.
And, added a kernel option named PCPU_IDT to enable the feature.
architectures.
Notes:
- On alpha and ia64 the function is kept but gets renamed locally to avoid
symbol collision. This is because on these two arches, I am not sure
whether the ASM callers do not rely on fixed registers, so I prefer to
keep the ASM body for now.
- On Vax, only the symbol is removed, because the body is used from other
functions.
- On RISC-V, this change fixes a bug: copystr() was just a wrapper around
strlcpy(), but strlcpy() makes the operation less safe (strlen on the
source beyond its size).
- The kASan, kCSan and kMSan wrappers are removed, because now that
copystr() is in C, the compiler transformations are applied to it,
without the need for manual wrappers.
Could test on amd64 only, but should be fine.
x86/x86/tsc.c rev. 1.67 reduced cache problem and got big improvement, but it
still has room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:
AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify
So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.
NOTE:
- An AMD's document says DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386(not amd64), it seems the improvement is very little.
- rdtscp instruct can be used as serializing instruction + rdtsc, but
it's not good as [lm]fence. Both Intel and AMD's document say that
the latency of rdtscp is bigger than rdtsc, so I suspect the difference
of the result comes from it.
This violates the security contract of the CBC construction, which
requires that the IV be unpredictable in advance; an adaptive adversary
can exploit this to verify plaintext guesses.
XXX Compile-tested only.
The kernel option is introduced to realize softint-based if_input.
Since the same scheme has been implemented in if_percpuq_enqueue(),
the option is no longer needed.
pointed out by ozaki-r@n.o.
And the associated ezload EZ-USB code, which is only used by uyap.
It could theoretically be used by other drivers, but none of them are
in tree.
I suspect that this device isn't in use, as phone technology has improved
a lot since 2001 when uyap(4) was added to the tree.
Proposed with no objections on netbsd-users on 13 April 2020
use PHYSDEVOP_map_pirq to get the pirq/gsi for MSI/MSI-X, switch also INTx
to use it instead of PHYSDEVOP_alloc_irq_vector
MSI confirmed working with single-vector MSI for wm(4), ahcisata(4), bge(4)
XXX added some provision for MSI-X, but it doesn't actually work (no interrupts
delivered), needs some further investigation; disable MSI-X for XENPV
via flag in x86/pci/pci_machdep.c
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH
Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().
Currently x86_patch_window_open is a big problem, because it is a perfect
function to inject/modify executable code with ROP.
- Remove x86_patch_window_open(), along with its x86_patch_window_close()
counterpart.
- Introduce a read-only link-set of hotpatch descriptor structures,
which reference a maximum of two read-only hotpatch sources.
- Modify x86_hotpatch() to open a window and call the new
x86_hotpatch_apply() function in a hard-coded manner.
- Modify x86_hotpatch() to take a name and a selector, and have
x86_hotpatch_apply() resolve the descriptor from the name and the
source from the selector, before hotpatching.
- Move the error handling in a separate x86_hotpatch_cleanup() function,
that gets called after we closed the window.
The resulting implementation is a bit complex and non-obvious. But it
gains the following properties: the code executed in the hotpatch window
is strictly hard-coded (no callback and no possibility to execute your own
code in the window) and the pointers this code accesses are strictly
read-only (no possibility to forge pointers to hotpatch an area that was
not designated as hotpatchable at compile-time, and no possibility to
choose what bytes to write other than the maximum of two read-only
templates that were designated as valid for the given destination at
compile-time).
With current CPUs this slightly improves a situation that is already
pretty bad by definition on x86. Assuming CET however, this change closes
a big hole and is kinda great.
The only ~problem there is, is that dtrace-fbt tries to hotpatch random
places with random bytes, and there is just no way to make it safe.
However dtrace is only in a module, that is rarely used and never compiled
into the kernel, so it's not a big problem; add a shitty & vulnerable
independent hotpatch window in it, and leave big XXXs. It looks like fbt
is going to collapse soon anyway.
Push it all into MD x86 code to keep it simpler, until we have other
examples on other CPUs. Simplify RDSEED-to-RDRAND fallback.
Eliminate cpu_earlyrng in favour of just using entropy_extract, which
is available early now.
sized mechanism that was too complex.
This fixes a race between USER_LDT and SVS: during context switches, the
way SVS installs the new ldt relies on the ldt pointer AND the ldt size,
but both cannot be accessed atomically at the same time.
arg are separate arguments - this is needed for newer physdev_op commands
remove code for PHYSDEVOP_IRQ_UNMASK_NOTIFY, it is obsolete since
interface version 0x00030202 and is unsupported by newer versions of Xen
confirmed working on amd64 Dom0, i386 compile-tested only
split off __XEN_INTERFACE_VERSION__ to new xen/conf/std.xenversion
and use from both i386/conf/std.xen and amd64/conf/stf.xen, so that
there is single place for the definition
/netbsd/modules respectively instead of /netbsd and
/stand/<arch>/<version>/modules. This is only supported for x86,
and is turned off by default. To try it, add KERNEL_DIR=yes in your
/mk.conf and install a system from that build.
(m68k uses 40msec default as before). And remove the option from GENERIC.
- It's not good idea to set such parameter in individual GENERICs.
- 4msec is (probably no problem for most modern real hardware but)
too aggressive to be default.
- 10msec is too severe for antique machines but it's hard to draw a line.
cpu_switch(): Remove stuff dealing with interrupt levels. From memory it
was something to do with TLB shootdown interrupts but they have long been
outside the SPL framework.
This did not do what I thought it did. opt_diagnostic.h is only for
the unused _DIAGNOSTIC, which seems like an abortive attempt to
incrementally convert DIAGNOSTIC to an opt_*.h option rather than a
command-line option.
- If a GOP mode wasn't explicitly requested, the bootloader was passing
fb info to the kernel even if the console was in text mode! This
results in garbled console output on at least ThinkPad T420 and
likely many others. If a mode isn't specified, default to 800x600.
- The "gop" command was incorrectly parsing video modes in the form
WxHxD as WxWxD.
- Allow a short form WxH for the "gop" command to select any mode with
the target dimensions.
At this point it is highly unlikely this 1999 device still has users,
but it still comes up in the context of maxv's USB-fuzzing (and any device
could pretend to be a urio(4)), so it's best to get rid of it.
Renamed all major entries to obsolete, as was done in previous removals.
This still requires an update to sanitizers, but they're located in
"external", perhaps it should be first committed upstream?
Proposed on tech-kern a month ago.
- If the config had both an le@pci and a pcn, simply remove le@pci
(pcn would match at a higher priority anyway).
- If the config had le@pci enabled, but no pcn, change le@pci to pcn.
- If the config had le@pci commented out, but no pcn, change le@pci
to pcn and leave it commented out.
The pcn driver supports more chips than le@pci and does DMA directly
to/from mbufs rather than memory copies.
Discussed on tech-kern:
https://mail-index.NetBSD.org/tech-kern/2020/01/13/msg025938.html
This was never (intentionally) enabled by default, and the design has
some shortcomings. You can get mostly the same results with ktrace,
as in usr.bin/make/filemon/filemon_ktrace.c which is now used instead
of filemon for make's meta mode.
If applications require higher performance than ktrace, or nesting
that ktrace doesn't support, we might consider adding something back
into the vfs system calls themselves, without hijacking the syscall
table. (Might want a more reliable output format too, e.g. one that
can handle newlines in file names.)
This is a driver for a "nonsense machine" made by the art group Maywa-Denki
in 2008. It was disabled by default.
Unfortunately even so it draws development attention (flaws found in the
code, MP-ification needs) and it is best not to continue to maintain this
driver.
Proposed without objections on tech-kern.
where curcpu() is defined as curlwp->l_cpu:
- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.
- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.
- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.
- Remove some KERNEL_LOCK handling which hasn't been needed for years.
Introduce a simple COREDUMP_MACHDEP_LWP_NOTES logic to provide machdep
API for injecting per-LWP notes into coredumps, and use it to append
PT_GETXSTATE note.
Since the XSTATE block uses the same format on i386 and amd64, the code
does not have to conditionalize between 32-bit and 64-bit ELF format
on that. However, it does need to distinguish between 32-bit and 64-bit
PT_* values. In order to do that, it reuses PT32_* constant already
present for ptrace(), and adds a matching PT64_GETXSTATE to satisfy
the cpp logic.
- When handling the source-is-masked case in the interrupt vector, set the
interrupt bit in a new ci_imasked field and ensure the bit is cleared
from ci_ipending.
- In intr_unmask(), transfer the bit from ci_imasked to ci_ipending for
non-level-sensitive interrupts (the PIC does the work for us in the
level-sensitive case), and only force pending interrupts to be processed
in this case. (In all cases, make sure the now-unmasked bit is cleared
from ci_imasked.)
Before, the bit was left in ci_ipending so as not to use edge-triggered
interrupts while the source is masked, but Xspllower() relies on the
pending bits getting cleared.
Tested by forcing all wm(4) interrupts on my test system though an
intr_mask() / softint / intr_unmask() cycle and exercising the network
heavily.
Protect __lwp_getprivate_fast() with _RTLD_SOURCE, _LIBC_SOURCE and
__LIBPTHREAD_SOURCE__.
Include in this namespace <sys/tcl.h> and use __BEGIN_DECLS/__END_DECLS
for the sake of consistency.
iic_acquire_bus() / iic_release_bus(). "acquire" and "release" hooks
no longer need to be provided by back-end controller drivers (only if
they need special handling, e.g. powering on the i2c controller).
This results in the removal of a bunch of rendundant code from each
back-end controller driver.
Assert that we are not in hard interrupt context in iic_acquire_bus(),
iic_exec(), and iic_release_bus().
When booting sysinst from UEFI, it defaults to a GPT installation
where partition have no labels. Bootstrap used the NAME=label partition
anyway, with the result that both EFI and FFS root partition had
the same name "NAME=" and could not be distinguished. The first matching
partition for the name was used, and bootstrap looked for the kernel
in the EFI partition.
We fix that by not using NAME=label names for partition when label
is empty. In that case we revert to old syntax such as hd0b
The code to select boot partition in RAID assumed thet had a name,
which is true when there is a GPT inside the RAID, but not when there
is a disklabel inside the RAID. This caused a regression from behavior
of NetBSD 8.1.
We fix this by allowing nameless partition to be boot candidates.
This fixes misc/54748
While there, let raid device be used in the boot specification, like
raid0a:/netbsd.
rw_enter(lock, RW_READ);
Having instrumented it, it turns out that >99.5% of the time the lock is
completely unknowned. Make this assumption in the assembly stub for
rw_enter(), and avoid the initial read of the lock word. Where there are
existing read holds, we'll do an additional CMPXCHG but should already have
the cache line in the EXCLUSIVE state.
That implementation works either with BIOS or UEFI bootstrap
This requires the following kernel changes:
Add UEFI boot services and I/O method protoypes
src/sys/arch/x86/include/efi.h 1.8 - 1.9
Fix EFI system table mapping in virtual space
src/sys/arch/x86/x86/efi.c 1.19 - 1.20
Make sure no bioscall is issued when booting off UEFI system
src/sys/arch/i386/i386/machdep.c 1.821 - 1.822
src/sys/arch/i386/pci/piixpcib.c 1.22 - 1.23
And the following bootstrap changes:
Add kernel symbols for multiboot1
src/sys/arch/i386/stand/lib/exec_multiboot1.c 1.2 - 1.3
src/sys/arch/i386/stand/lib/libi386.h 1.45 - 1.47
Fix kernel symbols for multiboot2
src/sys/arch/i386/stand/lib/exec_multiboot2.c 1.2 - 1.3
Previous version just provided the ELF section table, which is correct
as far as the multiboot 2 specification is concerned.
But in order to retreive kernel symboles, the NetBSD kernelneeds symbol
table and string table sections to be loaded in memory, and have an
address set in the section table.
Requires change: Add kernel symbols for multiboot1
src/sys/arch/i386/stand/lib/exec_multiboot1.c 1.2 - 1.3
src/sys/arch/i386/stand/lib/libi386.h 1.45 - 1.46
This change allows to pick code paths in the kernel that are tuned for
alignment sensitive (and stricted in C meaning) code paths. In particular
the IPv6 code uses this heavily and skips whenever possible the process
of aligning of networking data.
With this modification all ATF tests are executed on amd64 without
triggering any UBSan reports in dmesg.
In theory __NO_STRICT_ALIGNMENT could be tuned for vax and m68k, however
these machines are still unsupported in LLVM sanitizers and syzkaller.
sys/netinet6/scope6.c:404:6, member access within misaligned address 0xfffffaea81276086 for type 'struct in6_addr' which requires 4 byte alignment
Reported-by: syzbot+a86f58d17685317b3df9@syzkaller.appspotmail.com
sys/net/rtsock_shared.c:629:41, member access within misaligned address 0xffffddb5db3ff04c for type 'struct rt_msghdr50' which requires 8 byte alignment
Reported-by: syzbot+0a3a022bc9d2b8880c16@syzkaller.appspotmail.com
multiboot 2 is required to boot Xen on an EFI system.
This also require a kernel patch for properly discovering
the ACPI RSDP, which is available after 20190912, in
src/sys/arch/x86/acpi/acpi_machdep.c 1.26-1.28
There are a few missing bit in this multiboot 2 implementation
(which are unused by Xen):
- Header tags Address, Freambuffer, and Relocatable are ignored
- Tags APM and Network are not provided
- Tags ACPI old and ACP new are only provided for ACPI boot
- Tag boot device does not provides the subpart (BSD disklabel partition)
Notes:
- multiboot2 is disabled in dosboot, otherwise the binary
gets too big and build fails.
- in src/sys/arch/i386/stand/efiboot, consinit() is renamed
as efi_consinit() to avoid prototype conflicts in src/sys/sys/systm.h
To be used with ALIGNED_POINTER(p,t) instead of writing *(const t *)p
directly. This way, on machines without strict alignment, we can use
memcpy to pacify sanitizers, while getting the same compiled code in
the end with a single (say) MOV instruction.
Classic BIOS (/boot) and EFI bootloaders can now name devices
using the NAME=gpt_label syntax, or using raid partitions. Here
are examples:
boot NAME=root:/netbsd
boot raid0e:/netbsd