If GCC knows what these functions are actually doing the resulting
code can be optimized better what is especially noticeable in case of
invocations of atomic_{or,and}() that ignore the result. Obviously,
everything is inlined what also improves performance.
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
This patch makes it possible to inline rdmsr and wrmsr instruction. The
performance impact shouldn't be significant since they are used relatively
rarely and wrmsr is usually a serializing instruction, but there is no reason
not to do so.
The goal of this patch is to amortize the cost of context switch by making
the compiler aware that context switch clobbers all registers. Because all
register need to be saved anyway there is no additional cost of using
callee saved register in the function that does the context switch.
Similarly to previous patch regarding GDT this is mostly a rewrite of
IDT handling code from C to C++. Thanks to constexpr IDT is now entirely
generated at compile-time.
Virtually no functional change, just rewriting the code from
"C in *.cpp files" to C++. Use of constexpr may be advantageous but
that code is not performance critical anyway.
Apparently, reading from dr3 is slower than reading from memory
with cache hit.
Also, depending on hypervisor configuration, accessing dr3 may cause
a VM exit (and, at least on kvm, it does), what makes it much slower
than a memory access even when there is a cache miss.
* Create new interface for cpuidle modules (similar to the cpufreq
interface)
* Generic cpuidle module is no longer needed
* Fix and update Intel C-State module
* Mostly useful for virtualization at the moment. Works in QEmu.
* Can be enabled by safemode settings/menu.
* Please note that x2APIC normally requires use of VT-d interrupt remapping feature
on real hardware, which we don't support yet.
Placing commpage and team user data somewhere at the top of the user accessible
virtual address space prevents these areas from conflicting with elf images
that require to be mapped at exact address (in most cases: runtime_loader).
Set execute disable bit for any page that belongs to area with neither
B_EXECUTE_AREA nor B_KERNEL_EXECUTE_AREA set.
In order to take advanage of NX bit in 32 bit protected mode PAE must be
enabled. Thus, from now on it is also enabled when the CPU supports NX bit.
vm_page_fault() takes additional argument which indicates whether page fault
was caused by an illegal instruction fetch.
The physical memory map area was not included in the kernel virtual
address space range (it was below KERNEL_BASE). This caused problems
if an I/O operation took place on physical memory mapped there (the
bad address error seen in #9547 was occurring in lock_memory_etc()).
Changed KERNEL_BASE and KERNEL_SIZE to cover the area and add a null
area that covers all of it. Also changed X86VMTranslationMap64Bit to
handle large pages in Query(), as the physical map area uses large
pages.
The lowest 4 bits of the MSR serves as a hint to the hardware to
favor performance or energy saving. 0 means a hint preference for
highest performance while 15 corresponds to the maximum energy
savings. A value of 7 translates into a hint to balance performance
with energy savings.
The default reset value of the MSR is 0. If BIOS doesn't intialize
the MSR, the hardware will run in performance state. This patch
initialize the MSR with value of 7 for balance between performance
and energy savings
Signed-off-by: Fredrik Holmqvist <fredrik.holmqvist@gmail.com>
Renamed {32,64}/int.cpp to {32,64}/descriptors.cpp, which now contain
functions for GDT and TSS setup that were previously in arch_cpu.cpp,
as well as the IDT setup code. These get called from the init functions
in arch_cpu.cpp, rather than having a bunch of ifdef'd chunks of code
for 32/64.
Reused x86 arch_user_debugger.cpp, with a few minor changes to make
the code work for both 32 and 64 bit. Something isn't quite working
right, if a breakpoint is hit the kernel will hang. Other than that
everything appears to work correctly.
* Changed IS_USER_ADDRESS to check an address using USER_BASE and
USER_SIZE, rather than just !IS_KERNEL_ADDRESS. The old check would
allow user buffers to point into the physical memory map area.
* Added an unmapped hole at the end of the bottom half of the address
space which catches buffers that cross into the uncanonical address
region. This also removes the need to check for uncanonical return
addresses in the syscall handler, it is no longer possible for the
return address to be uncanonical under normal circumstances. All
cases in which the return address might be changed by the kernel
are still handled via the IRET path.
Userland switch is implemented, as is basic system call support (using
SYSCALL/SYSRET). The system call handler is not yet complete: it doesn't
handle more than 6 arguments, and does not perform all the necessary kernel
entry/exit work (neither does the interrupt handler). However, this is
sufficient for runtime_loader to start and print some debug output.
No major changes to the kernel: just compiled in arch_smp.cpp and fixed the
IDT load in arch_cpu_init_percpu to use the correct limit for x86_64 (uses
sizeof(interrupt_descriptor)). In the boot loader, changed smp_boot_other_cpus
to construct a temporary GDT and get the page directory address from CR3, as
what's in kernel_args will be 64-bit stuff and will not work to switch the
CPUs into 32-bit mode in the trampoline code. Refactored 64-bit kernel entry
code to not use the stack after disabling paging, as the secondary CPUs are
given a 32-bit virtual stack address by the SMP trampoline code which will
no longer work.
A proper page fault handler was required for areas that were not locked
into the kernel address space. This enables the boot process to get
up to the point of trying to find the boot volume.
* Thread creation and switching is working fine, however threads do not yet
get interrupted because I've not implemented hardware interrupt handling
yet (I'll do that next).
* I've made some changes to struct iframe: I've removed the e/r prefixes
from the member names for both 32/64, so now they're just named ip, ax,
bp, etc. This makes it easier to write code that works with both 32/64
without having to deal with different iframe member names.
* Uses 64-bit multiplication, special handling for CPUs clocked < 1 GHz
in system_time_nsecs() not required like on x86.
* Tested against a straight conversion of the x86 version, noticably
faster with a large number of system_time() calls.
* Added empty source files for all the 64-bit paging method code, and a
stub implementation of X86PagingMethod64Bit.
* arch_vm_translation_map.cpp has been modified to use X86PagingMethod64Bit
on x86_64.
* Some things are currently ifndef'd out completely for x86_64 because
they aren't implemented, there's a few other ifdef's to handle x86_64
differences but most of the code works unchanged.
* Renamed some i386_* functions to x86_*.
* Added a temporary method for setting the current thread on x86_64
(a global variable, not SMP safe). This will be changed to be done
via the GS segment but I've not implemented that yet.