Similar to arch_get_debug_cpu_state(), but the thread whose CPU state
to retrieve is specified. Works only for threads that aren't running,
and on x86-64 we can get the FPU state only when the thread was
interrupted in userland.
Not implemented for the incomplete architecture ports.
* New Intel SkyLake seems to have 9 mapped ranges
at boot. It seems like this define has been creeping
up for a while.
* Resolves the inital issue reported in #11377 on SkyLake
as well. Bonefish mentioned it might need to be raised
again... he had some good foresight there :-)
* I'm seeing the same no bootable partitions issue though
via USB after this raise. (maybe a USB 3.1 thing?)
Reduce duplication of code by
* Removing from elf_common.h definitions available in os/kernel/elf.h
* Deleting elf32.h and elf64.h
* Renaming elf_common.h to elf_private.h
* Updating source to build using public and private ELF header files
together
Signed-off-by: Jessica Hamilton <jessica.l.hamilton@gmail.com>
* Called via arm_mailbox_bcm2835 *and* arm_framebuffer_bcm2835
* This is a bit messy. We really should be getting these
chipset-centric bases from the provided FDT / DTB.
* I can't think of a way to redo this without undoing
work towards FDT.
* The Raspberry pi 2 uses a new SoC which differs slightly
from the Raspberry Pi 1.
* Someday these two board targets could go away when we get
FDT support.
* To while there was some compatibility between
BCM2708 and BCM2805, it makes the BCM2806 changes
more confusing. We don't have any valueable BCM2708
targets.
The BOOT_GDT_SEGMENT_COUNT was based on USER_DATA_SEGMENT on both
x86 and x86_64. However, on x86_64 the order of the segments is
different, leading to a too small gBootGDT array. Move the define to
the arch specific headers so they can be setup correctly in either case.
Also add a STATIC_ASSERT() to check that the descriptors fit into the
array.
Pointed out by CID 1210898.
This patch adds user_access() which can be used to gracefully handle
page faults that may happen when accessing user memory. It is used
by arch_cpu_user{memcpy, memset, strlcpy}() to allow using optimized
functions from the standard library.
Currently only x64 uses this, but nothing really is arch specific here.
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
The kernel is allowed to use fpu anywhere so we must make sure that
user state is not clobbered by saving fpu state at interrupt entry.
There is no need to do that in case of system calls since all fpu
data registers are caller saved.
We do not need, though, to save the whole fpu state at task swich
(again, thanks to calling convention). Only status and control
registers are preserved. This patch actually adds xmm0-15 register
to clobber list of task swich code, but the only reason of that is
to make sure that nothing bad happens inside the function that
executes that task swich. Inspection of the generated code shows
that no xmm registers are actually saved.
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
Enable SSE as a part of the "preparation of the environment to run any
C or C++ code" in the entry points of stage2 bootloader.
SSE2 is going to be used by memset() and memcpy().
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
The possibility to specify custom memcpy and memset implementations
in cpu modules is currently unused and there is generally no point
in such feature.
There are only 2 x86 vendors that really matter and there isn't
very big difference in performance of the generic optmized versions
of these funcions across different models. Even if we wanted different
versions of memset and memcpy depending on the processor model or
features much better solution would be to use STT_GNU_IFUNC and save
one indirect call.
Long story short, we don't really benefit in any way from
get_optimized_functions and the feature it implements and it only adds
unnecessary complexity to the code.
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
* Removes default mapping of a portion of the RAM (will be done
as needed)
* Passes on the page directory area to kernel, so on early vm init
the kernel can use the area for pagetable allocation.
* Leaves it to the platform to pass in physical memory range(s). This
will ultimately come from FDT.
* Fix long standing issue with allocation of the heap, potentially
causing other part of the bootloader to overwrite the heap.
* Implements pagetable allocator in kernel for early vm mapping.
This fixes the first PANIC seen, we now just get the same one later
on when the VM is up... more to come...
This reverts commit 3fbb24680c.
As I mentioned in #11131, this fix is not correct, and works around
the problem. The real reason was that arch_debug_call_with_fault_handler
was not working properly, so the fault handler went crazy.
With commit eb92810 that is fixed so this can be reverted.
If GCC knows what these functions are actually doing the resulting
code can be optimized better what is especially noticeable in case of
invocations of atomic_{or,and}() that ignore the result. Obviously,
everything is inlined what also improves performance.
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
When an ARMv7 CPU is detected, immediately turn on the FPU. This allows
us to use vsnprintf in the TRACE call in that function, as our libc is
compiled with floating point support and will trigger a fault if the FPU
is not available.
This lets the boot go further, and crash in mmu_init. Next steps:
* Find why mmu_init is crashing
* Setup some fault handlers, otherwise we call uboot ones, and they are
not very helpful. They will also probably not work once the mmu is
enabledvery helpful. They will also probably not work once the mmu is
enabledvery helpful. They will also probably not work once the mmu is
enabled...
This patch makes it possible to inline rdmsr and wrmsr instruction. The
performance impact shouldn't be significant since they are used relatively
rarely and wrmsr is usually a serializing instruction, but there is no reason
not to do so.
The goal of this patch is to amortize the cost of context switch by making
the compiler aware that context switch clobbers all registers. Because all
register need to be saved anyway there is no additional cost of using
callee saved register in the function that does the context switch.
Similarly to previous patch regarding GDT this is mostly a rewrite of
IDT handling code from C to C++. Thanks to constexpr IDT is now entirely
generated at compile-time.
Virtually no functional change, just rewriting the code from
"C in *.cpp files" to C++. Use of constexpr may be advantageous but
that code is not performance critical anyway.
* Add isb just because.
* pdziepak pointed out that ARMv5 and before
had different barrier support.
* pdziepak also mentioned that dsb was too strong
for __sync_synchronize
* On ARMv6 or older, we do a simulated dsb.
* Move __sync_synchronize into thread.c in libroot
and use the new arch_atomic.h dsb/dmb defines.
* Gets arm @bootstrap-raw to end of bootstrap.
* Don't assume verdex as it isn't clear this was
occurring.
* Make an educated guess on HAIKU_BOOT_PLATFORM
based on provided board (but still allow it to
be overridden)
* Error out if user doesn't populate
HAIKU_BOOT_PLATFORM or enters an unknown board
name.
* You need to add "-sHAIKU_BOOT_BOARD=xxx" to
your jam to build for the proper ARM device.
* Rename beagle to beagleboneblk as per the
documentation.
* Use atomic_get_and_set for return value
* Atomics are no longer volatile
* Add missing arch_cpu_pause stub
* Move arch_cpu_idle to arch_cpu header to match
other architectures
* Set max cpu to 1 for PPC until atomic functions are finished
* We have atomic functions inline in the kernel and assembly
code in libroot post-scheduler merge... isn't that a lot of
duplication?
Apparently, reading from dr3 is slower than reading from memory
with cache hit.
Also, depending on hypervisor configuration, accessing dr3 may cause
a VM exit (and, at least on kvm, it does), what makes it much slower
than a memory access even when there is a cache miss.
* Create new interface for cpuidle modules (similar to the cpufreq
interface)
* Generic cpuidle module is no longer needed
* Fix and update Intel C-State module
This adds the -mapcs-frame compiler flag for ARM to have "stable"
stack frames, adds support to the kernel for dumping stack crawls,
and initial support for iframes. There' much more functionality
to unlock in KDL, but this makes debugging already a lot more
comfortable.....
* Mostly useful for virtualization at the moment. Works in QEmu.
* Can be enabled by safemode settings/menu.
* Please note that x2APIC normally requires use of VT-d interrupt remapping feature
on real hardware, which we don't support yet.
Placing commpage and team user data somewhere at the top of the user accessible
virtual address space prevents these areas from conflicting with elf images
that require to be mapped at exact address (in most cases: runtime_loader).
Set execute disable bit for any page that belongs to area with neither
B_EXECUTE_AREA nor B_KERNEL_EXECUTE_AREA set.
In order to take advanage of NX bit in 32 bit protected mode PAE must be
enabled. Thus, from now on it is also enabled when the CPU supports NX bit.
vm_page_fault() takes additional argument which indicates whether page fault
was caused by an illegal instruction fetch.
The physical memory map area was not included in the kernel virtual
address space range (it was below KERNEL_BASE). This caused problems
if an I/O operation took place on physical memory mapped there (the
bad address error seen in #9547 was occurring in lock_memory_etc()).
Changed KERNEL_BASE and KERNEL_SIZE to cover the area and add a null
area that covers all of it. Also changed X86VMTranslationMap64Bit to
handle large pages in Query(), as the physical map area uses large
pages.
This detects everything up to ARMv6 right now. Need to check more
recent ARM ARMs for ARMv7 detection.
The detected details get passed on to the kernel, which can use
the pre-detected info for selecting right pagetable format and such.
Copyright removal of Axel done after agreement with Axel @ BeGeistert
that for files that were copy/pasted from x86 arch and then fully
replaced the implementation, removal of original copyright holder is
allowed, since their actual code is gone ;)
This is to make sure all ARM platforms will benefit from planned work on this
MMU/CPU code. The less code duplicated, the better.
Compile-tested for all supported ARM platforms
This also implements the fault handler correctly now, and cleans up the
exception handling. Seems a lot more stable now, no unexpected panics or
faults happening anymore.
The lowest 4 bits of the MSR serves as a hint to the hardware to
favor performance or energy saving. 0 means a hint preference for
highest performance while 15 corresponds to the maximum energy
savings. A value of 7 translates into a hint to balance performance
with energy savings.
The default reset value of the MSR is 0. If BIOS doesn't intialize
the MSR, the hardware will run in performance state. This patch
initialize the MSR with value of 7 for balance between performance
and energy savings
Signed-off-by: Fredrik Holmqvist <fredrik.holmqvist@gmail.com>
Renamed {32,64}/int.cpp to {32,64}/descriptors.cpp, which now contain
functions for GDT and TSS setup that were previously in arch_cpu.cpp,
as well as the IDT setup code. These get called from the init functions
in arch_cpu.cpp, rather than having a bunch of ifdef'd chunks of code
for 32/64.
Reused x86 arch_user_debugger.cpp, with a few minor changes to make
the code work for both 32 and 64 bit. Something isn't quite working
right, if a breakpoint is hit the kernel will hang. Other than that
everything appears to work correctly.
* Changed IS_USER_ADDRESS to check an address using USER_BASE and
USER_SIZE, rather than just !IS_KERNEL_ADDRESS. The old check would
allow user buffers to point into the physical memory map area.
* Added an unmapped hole at the end of the bottom half of the address
space which catches buffers that cross into the uncanonical address
region. This also removes the need to check for uncanonical return
addresses in the syscall handler, it is no longer possible for the
return address to be uncanonical under normal circumstances. All
cases in which the return address might be changed by the kernel
are still handled via the IRET path.
Userland switch is implemented, as is basic system call support (using
SYSCALL/SYSRET). The system call handler is not yet complete: it doesn't
handle more than 6 arguments, and does not perform all the necessary kernel
entry/exit work (neither does the interrupt handler). However, this is
sufficient for runtime_loader to start and print some debug output.
No major changes to the kernel: just compiled in arch_smp.cpp and fixed the
IDT load in arch_cpu_init_percpu to use the correct limit for x86_64 (uses
sizeof(interrupt_descriptor)). In the boot loader, changed smp_boot_other_cpus
to construct a temporary GDT and get the page directory address from CR3, as
what's in kernel_args will be 64-bit stuff and will not work to switch the
CPUs into 32-bit mode in the trampoline code. Refactored 64-bit kernel entry
code to not use the stack after disabling paging, as the secondary CPUs are
given a 32-bit virtual stack address by the SMP trampoline code which will
no longer work.
A proper page fault handler was required for areas that were not locked
into the kernel address space. This enables the boot process to get
up to the point of trying to find the boot volume.
* Thread creation and switching is working fine, however threads do not yet
get interrupted because I've not implemented hardware interrupt handling
yet (I'll do that next).
* I've made some changes to struct iframe: I've removed the e/r prefixes
from the member names for both 32/64, so now they're just named ip, ax,
bp, etc. This makes it easier to write code that works with both 32/64
without having to deal with different iframe member names.
This has been done by adding typedefs in elf_common.h to the correct ELF
structures for the architecture, and changing all Elf32_* uses to those
types. I don't know whether image loading works as I cannot test it yet,
there may be some 64-bit safety issues around. However, symbol lookup for
the kernel is working correctly.
* Uses 64-bit multiplication, special handling for CPUs clocked < 1 GHz
in system_time_nsecs() not required like on x86.
* Tested against a straight conversion of the x86 version, noticably
faster with a large number of system_time() calls.
* Added empty source files for all the 64-bit paging method code, and a
stub implementation of X86PagingMethod64Bit.
* arch_vm_translation_map.cpp has been modified to use X86PagingMethod64Bit
on x86_64.
* Some things are currently ifndef'd out completely for x86_64 because
they aren't implemented, there's a few other ifdef's to handle x86_64
differences but most of the code works unchanged.
* Renamed some i386_* functions to x86_*.
* Added a temporary method for setting the current thread on x86_64
(a global variable, not SMP safe). This will be changed to be done
via the GS segment but I've not implemented that yet.
For now I've just put all the stub functions that are needed to link the
kernel into a file called stubs.cpp. I've not yet moved across the interrupt
handling code or the ELF64 relocation code to the x86 directory. Once those
have been moved I can get rid of the x86_64 headers/source directories.
Not many changes seeing as there's not much x86_64 stuff done yet. Small
differences are handled with ifdefs, large differences (descriptors.h,
struct iframe) have separate headers under arch/x86/32 and arch/x86/64.
The setup procedure is fairly simple: create a 64-bit GDT and 64-bit page
tables that include all kernel mappings from the 32-bit address space, but at
the correct 64-bit address, then go through kernel_args and changes all virtual
addresses to 64-bit addresses, and finally switch to long mode and jump to the
kernel.
* platform_allocate_elf_region() is removed, it is implemented in platform-
independent code now (ELF*Class::AllocateRegion). For ELF64 it is now
assumed that 64-bit addresses are mapped in the loader's 32-bit address space
as (address - KERNEL_BASE_64BIT + KERNEL_BASE).
* mapped_delta field from preloaded_*_image removed, now handled compile-time
using the ELF*Class::Map method.
* Also link the kernel with -z max-page-size=0x1000, removes the need for
2MB alignment on the data segment (not going to map the kernel with large
pages for the time being).
* Added a FixedWidthPointer template class which uses 64-bit storage to hold
a pointer. This is used in place of raw pointers in kernel_args.
* Added __attribute__((packed)) to kernel_args and all structures contained
within it. This is necessary due to different alignment behaviour for
32-bit and 64-bit compilation with GCC.
* With these changes, kernel_args will now come out the same size for both
the x86_64 kernel and the loader, excluding the preloaded_image structure
which has not yet been changed.
* Tested both an x86 GCC2 and GCC4 build, no problems caused by these changes.
The whole kernel now builds and there are no undefined references when
linking, I just need to fix some strange relocation errors I'm getting
(probably a problem with the linker script) and then I'll have a kernel
image.
* x86_64 is using the existing *_ia32 boot platforms.
* Special flags are required when compiling the loader to get GCC to compile
32-bit code. This adds a new set of rules for compiling boot code rather
than using the kernel rules, which compile using the necessary flags.
* Some x86_64 private headers have been stubbed by #include'ing the x86
versions. These will be replaced later.
* gPeripheralBase keeps track of the device
peripherals before and after mmu_init
* Add ability to disable mmu for troubleshooting
* Remove static FB_BASE, we actually don't know
where the FB is yet. (depends on firmware used)
* BCM2708 defines no longer assume 0x20 address
We will be throwing away the blob memory mapping
and using our own.
* Use existing blob mapping to turn GPIO led on pre mmu_init
* Remap MMU hardware addresses from 0x7E. We could map each device,
however the kernel will throw away the mappings again anyway. For
now we just map the whole range and use offsets.
* Serial uart no longer works, however at least
we know why now :). Serial driver now needs to
use mapped address.
* Use U-Boot mmu code as base
* This will be factored out someday into common arch mmu
code when we can read Flattened Device Trees
* Move mmu_init after serial_init.
Temporary change as we will want serial_init to use
memory mapped addresses... for debugging.