The kernel is allowed to use fpu anywhere so we must make sure that
user state is not clobbered by saving fpu state at interrupt entry.
There is no need to do that in case of system calls since all fpu
data registers are caller saved.
We do not need, though, to save the whole fpu state at task swich
(again, thanks to calling convention). Only status and control
registers are preserved. This patch actually adds xmm0-15 register
to clobber list of task swich code, but the only reason of that is
to make sure that nothing bad happens inside the function that
executes that task swich. Inspection of the generated code shows
that no xmm registers are actually saved.
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
Enable SSE as a part of the "preparation of the environment to run any
C or C++ code" in the entry points of stage2 bootloader.
SSE2 is going to be used by memset() and memcpy().
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
The possibility to specify custom memcpy and memset implementations
in cpu modules is currently unused and there is generally no point
in such feature.
There are only 2 x86 vendors that really matter and there isn't
very big difference in performance of the generic optmized versions
of these funcions across different models. Even if we wanted different
versions of memset and memcpy depending on the processor model or
features much better solution would be to use STT_GNU_IFUNC and save
one indirect call.
Long story short, we don't really benefit in any way from
get_optimized_functions and the feature it implements and it only adds
unnecessary complexity to the code.
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
This patch makes it possible to inline rdmsr and wrmsr instruction. The
performance impact shouldn't be significant since they are used relatively
rarely and wrmsr is usually a serializing instruction, but there is no reason
not to do so.
The goal of this patch is to amortize the cost of context switch by making
the compiler aware that context switch clobbers all registers. Because all
register need to be saved anyway there is no additional cost of using
callee saved register in the function that does the context switch.
* Create new interface for cpuidle modules (similar to the cpufreq
interface)
* Generic cpuidle module is no longer needed
* Fix and update Intel C-State module
* Mostly useful for virtualization at the moment. Works in QEmu.
* Can be enabled by safemode settings/menu.
* Please note that x2APIC normally requires use of VT-d interrupt remapping feature
on real hardware, which we don't support yet.
Set execute disable bit for any page that belongs to area with neither
B_EXECUTE_AREA nor B_KERNEL_EXECUTE_AREA set.
In order to take advanage of NX bit in 32 bit protected mode PAE must be
enabled. Thus, from now on it is also enabled when the CPU supports NX bit.
vm_page_fault() takes additional argument which indicates whether page fault
was caused by an illegal instruction fetch.
The lowest 4 bits of the MSR serves as a hint to the hardware to
favor performance or energy saving. 0 means a hint preference for
highest performance while 15 corresponds to the maximum energy
savings. A value of 7 translates into a hint to balance performance
with energy savings.
The default reset value of the MSR is 0. If BIOS doesn't intialize
the MSR, the hardware will run in performance state. This patch
initialize the MSR with value of 7 for balance between performance
and energy savings
Signed-off-by: Fredrik Holmqvist <fredrik.holmqvist@gmail.com>
Renamed {32,64}/int.cpp to {32,64}/descriptors.cpp, which now contain
functions for GDT and TSS setup that were previously in arch_cpu.cpp,
as well as the IDT setup code. These get called from the init functions
in arch_cpu.cpp, rather than having a bunch of ifdef'd chunks of code
for 32/64.
Reused x86 arch_user_debugger.cpp, with a few minor changes to make
the code work for both 32 and 64 bit. Something isn't quite working
right, if a breakpoint is hit the kernel will hang. Other than that
everything appears to work correctly.
Userland switch is implemented, as is basic system call support (using
SYSCALL/SYSRET). The system call handler is not yet complete: it doesn't
handle more than 6 arguments, and does not perform all the necessary kernel
entry/exit work (neither does the interrupt handler). However, this is
sufficient for runtime_loader to start and print some debug output.
A proper page fault handler was required for areas that were not locked
into the kernel address space. This enables the boot process to get
up to the point of trying to find the boot volume.
* Thread creation and switching is working fine, however threads do not yet
get interrupted because I've not implemented hardware interrupt handling
yet (I'll do that next).
* I've made some changes to struct iframe: I've removed the e/r prefixes
from the member names for both 32/64, so now they're just named ip, ax,
bp, etc. This makes it easier to write code that works with both 32/64
without having to deal with different iframe member names.
* Uses 64-bit multiplication, special handling for CPUs clocked < 1 GHz
in system_time_nsecs() not required like on x86.
* Tested against a straight conversion of the x86 version, noticably
faster with a large number of system_time() calls.
* Some things are currently ifndef'd out completely for x86_64 because
they aren't implemented, there's a few other ifdef's to handle x86_64
differences but most of the code works unchanged.
* Renamed some i386_* functions to x86_*.
* Added a temporary method for setting the current thread on x86_64
(a global variable, not SMP safe). This will be changed to be done
via the GS segment but I've not implemented that yet.
Not many changes seeing as there's not much x86_64 stuff done yet. Small
differences are handled with ifdefs, large differences (descriptors.h,
struct iframe) have separate headers under arch/x86/32 and arch/x86/64.
AMD C1E is a BIOS controlled C3 state. Certain processors families
may cut off TSC and the lapic timer when it is in a deep C state,
including C1E state, thus the cpu can't be waken up and system will hang.
This patch firstly adds the support of idle selection during boot. Then
it implements amdc1e_noarat_idle() routine which checks the MSR which
contains the C1eOnCmpHalt (bit 28) and SmiOnCmpHalt (bit 27) before
executing the halt instruction, then clear them once set.
However intel C1E doesn't has such problem. AMD C1E is a BIOS controlled
C3 state. The difference between C1E and C3 is that transition into C1E
is not initiated by the operating system. System will enter C1E state
automatically when both cores enters C1 state. As for intel C1E, it
means "reduce CPU voltage before entering corresponding Cx-state".
This patch may fix#8111, #3999, #7562, #7940 and #8060
Copied from the description of #3999:
>but for some reason I hit the power button instead of the reset one. And
>the boot continued!!
The reason is CPUs are waken up once power button is hit.
Signed-off-by: Fredrik Holmqvist <fredrik.holmqvist@gmail.com>
* Prepend x86_ to non-static x86 code
* Add x86_init_fpu function to kernel header
* Don't init fpu multiple times on smp systems
* Verified fpu is still started on smp and non-smp
* SSE code still generates general protection faults
on smp systems though
* Rename init_sse to init_fpu and handle FPU setup.
* Stop trying to set up FPU before VM init.
We tried to set up the FPU before VM init, then
set it up again after VM init with SSE extensions,
this caused SSE and MMX applications to crash.
* Be more logical in FPU setup by detecting CPU flag prior
to enabling FPU. (it's unlikely Haiku will run on
a processor without a fpu... but lets be consistant)
* SSE2 gcc code now runs (faster even) without GPF
* tqh confirms his previously crashing mmx code now works
* The non-SSE FPU enable after VM init needs tested!
* Reorganized the kernel locking related to threads and teams.
* We now discriminate correctly between process and thread signals. Signal
handlers have been moved to teams. Fixes#5679.
* Implemented real-time signal support, including signal queuing, SA_SIGINFO
support, sigqueue(), sigwaitinfo(), sigtimedwait(), waitid(), and the addition
of the real-time signal range. Closes#1935 and #2695.
* Gave SIGBUS a separate signal number. Fixes#6704.
* Implemented <time.h> clock and timer support, and fixed/completed alarm() and
[set]itimer(). Closes#5682.
* Implemented support for thread cancellation. Closes#5686.
* Moved send_signal() from <signal.h> to <OS.h>. Fixes#7554.
* Lots over smaller more or less related changes.
git-svn-id: file:///srv/svn/repos/haiku/haiku/trunk@42116 a95241bf-73f2-0310-859d-f6bbb57e9c96
* Renamed i386_context_switch() to x86_context_switch().
* x86_context_switch() no longer sets the page directory.
arch_thread_context_switch() does that explicitly, now. This allows to solve
the TODO by reordering releasing the previous paging structures reference and
setting the new page directory.
git-svn-id: file:///srv/svn/repos/haiku/haiku/trunk@37024 a95241bf-73f2-0310-859d-f6bbb57e9c96
* Renamed vm_translation_map_arch_info to X86PagingStructures, and all
members and local variables of that type accordingly.
* arch_thread_context_switch(): Added TODO: The still active paging structures
can indeed be deleted before we stop using them.
git-svn-id: file:///srv/svn/repos/haiku/haiku/trunk@37022 a95241bf-73f2-0310-859d-f6bbb57e9c96
where appropriate.
* Typedef'ed page_num_t to phys_addr_t and used it in more places in
vm_page.{h,cpp}.
git-svn-id: file:///srv/svn/repos/haiku/haiku/trunk@36937 a95241bf-73f2-0310-859d-f6bbb57e9c96
Added fields for temporary storage of the debug registers dr6 and dr7 to the
arch_cpu_info structure. The actual registers are stored at the beginning of
x86_exit_user_debug_at_kernel_entry() and read in
x86_handle_debug_exception().
The problem was that x86_exit_user_debug_at_kernel_entry() itself overwrote
dr7 and, if kernel breakpoints were enabled, dr6 could be overwritten anytime
after. So x86_handle_debug_exception() would find incorrect values in the
registers (definitely in dr7) and thus interpret the detected debug condition
incorrectly. Usually watchpoints were recognized as breakpoints.
git-svn-id: file:///srv/svn/repos/haiku/haiku/trunk@35951 a95241bf-73f2-0310-859d-f6bbb57e9c96
needs to be or'ed to the address specification), "uncached" is assumed.
* Set the memory type for the "BIOS" and "DMA" areas to write-back. Not sure, if
that's correct, but that's what was effectively used on my machines before.
* Changed x86_set_mtrrs() and the CPU module hook to also set the default memory
type.
* Rewrote the MTRR computation once more:
- Now we know all used memory ranges, so we are free to extend used ranges
into unused ones in order to simplify them for MTRR setup.
- Leverage the subtractive properties of uncached and write-through ranges to
simplify ranges of any other respectively write-back type.
- Set the default memory type to write-back, so we don't need MTRRs for the
RAM ranges.
- If a new range intersects with an existing one, we no longer just fail.
Instead we use the strictest requirements implied by the ranges. This fixes
#5383.
Overall the new algorithm should be sufficient with far less MTRRs than before
(on my desktop machine 4 are used at maximum, while 8 didn't quite suffice
before). A drawback of the current implementation is that it doesn't deal with
the case of running out of MTRRs at all, which might result in some ranges
having weaker caching/memory ordering properties than requested.
git-svn-id: file:///srv/svn/repos/haiku/haiku/trunk@35515 a95241bf-73f2-0310-859d-f6bbb57e9c96
system_time_nsecs(), returning the system time in nanoseconds. The function
is only really implemented for x86. For the other architectures
system_time() * 1000 is returned.
git-svn-id: file:///srv/svn/repos/haiku/haiku/trunk@34543 a95241bf-73f2-0310-859d-f6bbb57e9c96
all MTRRs at once.
* Added a respective x86_set_mtrrs() kernel function.
* x86 CPU module:
- Implemented the new hook.
- Prefixed most debug output with the CPU index. Otherwise it gets quite
confusing with multiple CPUs.
- generic_init_mtrrs(): No longer clear all MTRRs, if they are already
enabled. This lets us benefit from the BIOS's setup until we install our
own -- otherwise with caching disabled things are *really* slow.
* arch_vm.cpp: Completely rewrote the MTRR handling as the old one was not
only slow (O(2^n)), but also broken (resulting in incorrect setups (e.g.
with cachable ranges larger than requested)), and not working by design for
certain cases (subtractive setups intersecting ranges added later).
Now we maintain an array with the successfully set ranges. When a new range
is added, we recompute the complete MTRR setup as we need to. The new
algorithm analyzing the ranges has linear complexity and also handles range
base addresses with an alignment not matching the range size (e.g. a range
at address 0x1000 with size 0x2000) and joining of adjacent/overlapping
ranges of the same type.
This fixes the slow graphics on my 4 GB machine (though unfortunately the
8 MTRRs aren't enough to fully cover the complete frame buffer (about 35
pixel lines remain uncachable), but that can't be helped without rounding up
the frame buffer size, for which we don't have enough information). It might
also fix#1823.
git-svn-id: file:///srv/svn/repos/haiku/haiku/trunk@34197 a95241bf-73f2-0310-859d-f6bbb57e9c96
ROUNDUP to use '*' and '/' -- the compiler will optimize that for powers of
two anyway and this implementation works for other numbers as well.
* The thread::fault_handler use in C[++] code was broken with gcc 4. At least
when other functions were invoked. Trying to trick the compiler wasn't a
particularly good idea anyway, since the next compiler version could break
the trick again. So the general policy is to use the fault handlers only in
assembly code where we have full control. Changed that for x86 (save for the
vm86 mode, which has a similar mechanism), but not for the other
architectures.
* Introduced fault_handler, fault_handler_stack_pointer, and fault_jump_buffer
fields in the cpu_ent structure, which must be used instead of
thread::fault_handler in the kernel debugger. Consequently user_memcpy() must
not be used in the kernel debugger either. Introduced a debug_memcpy()
instead.
* Introduced debug_call_with_fault_handler() function which calls a function
in a setjmp() and fault handler context. The architecture specific backend
arch_debug_call_with_fault_handler() has only been implemented for x86 yet.
* Introduced debug_is_kernel_memory_accessible() for use in the kernel
debugger. It determines whether a range of memory can be accessed in the
way specified. The architecture specific back end
arch_vm_translation_map_is_kernel_page_accessible() has only been implemented
for x86 yet.
* Added arch_debug_unset_current_thread() (only implemented for x86) to unset
the current thread pointer in the kernel debugger. When entering the kernel
debugger we do some basic sanity checks of the currently set thread structure
and unset it, if they fail. This allows certain commands (most importantly
the stack trace command) to avoid accessing the thread structure.
* x86: When handling a double fault, we do now install a special handler for
page faults. This allows us to gracefully catch faulting commands, even if
e.g. the thread structure is toast.
We are now in much better shape to deal with double faults. Hopefully avoiding
the triple faults that some people have been experiencing on their hardware
and ideally even allowing to use the kernel debugger normally.
git-svn-id: file:///srv/svn/repos/haiku/haiku/trunk@32073 a95241bf-73f2-0310-859d-f6bbb57e9c96
* SMP:
- Added smp_send_broadcast_ici_interrupts_disabled(), which is basically
equivalent to smp_send_broadcast_ici(), but is only called with interrupts
disabled and gets the CPU index, so it doesn't have to use
smp_get_current_cpu() (which dereferences the current thread).
- Added cpu index parameter to smp_intercpu_int_handler().
* x86:
- arch_int.c -> arch_int.cpp
- Set up an IDT per CPU. We were using a single IDT for all CPUs, but that
can't work, since we need different tasks for the double fault interrupt
vector.
- Set the per CPU double fault task gates correctly.
- Renamed set_intr_gate() to set_interrupt_gate and set_system_gate() to
set_trap_gate() and documented them a bit.
- Renamed double_fault_exception() x86_double_fault_exception() and fixed
it not to use smp_get_current_cpu(). Instead we have the new
x86_double_fault_get_cpu() that deducts the CPU index from the used stack.
- Fixed the double_fault interrupt handler: It no longer calls int_bottom to
avoid accessing the current thread.
* debug.cpp:
- Introduced explicit debug_double_fault() to enter the kernel debugger from
a double fault handler.
- Avoid using smp_get_current_cpu().
- Don't use kprintf() before sDebuggerOnCPU is set. Otherwise
acquire_spinlock() is invoked by arch_debug_serial_puts().
Things look a bit better when the current thread pointer is broken -- we run
into kernel_debugger_loop() and successfully print the "Welcome to KDL"
message -- but we still dereference the thread pointer afterwards, so that we
don't get a usable kernel debugger yet.
git-svn-id: file:///srv/svn/repos/haiku/haiku/trunk@32050 a95241bf-73f2-0310-859d-f6bbb57e9c96
* Added x86_double_fault_get_cpu(), a save way to get the CPU index when in
the double fault handler. smp_get_current_cpu() requires at least a somewhat
intact thread structure, so we rather want to avoid it when handling a double
fault. There are a lot more of those dependencies in the KDL entry code.
Working on it...
git-svn-id: file:///srv/svn/repos/haiku/haiku/trunk@32028 a95241bf-73f2-0310-859d-f6bbb57e9c96