* Migrate some platform-agnostic architecture code into
boot/arch from efi/arch. This also helps to avoid conflicts
between kernel and boot sources.
* Conflicts between arch_cpu in efi and kernel code mean
boot code really should *never* directly use kernel arch
headers. (Other platforms don't, which is why they don't
have this same issue.)
* We carefully thread any needed kernel headers (namely
assembly helper macros) into the bootloader headers without
mixing in the whole conflicting kernel/arch headers.
* ARM now properly gets its cpu init code called, and we
progress further into the EFI bootloader.
Change-Id: If67ec9758b5ce68563ebd9eb45d5196401911c67
Reviewed-on: https://review.haiku-os.org/c/haiku/+/2975
Reviewed-by: waddlesplash <waddlesplash@gmail.com>
xsave or xsavec are supported.
Breaks vregs compatibility.
Change the thread structure object cache alignment to 64.
The xsave fpu_state size isn't defined; it is for instance 832 here, thus I picked 1024.
Change-Id: I4a0cab0bc42c1d37f24dcafb8259f8ff24a330d2
Reviewed-on: https://review.haiku-os.org/c/haiku/+/2849
Reviewed-by: Adrien Destugues <pulkomandy@gmail.com>
On modern x86, one can use __rdtscp to get the current cpu in userland.
Change-Id: I1767e379606230a75e4622637c7a5aed9cdf9ab0
Reviewed-on: https://review.haiku-os.org/c/haiku/+/2248
Reviewed-by: Adrien Destugues <pulkomandy@gmail.com>
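As a rough illustration of the __rdtscp approach mentioned above, the sketch below reads the time stamp counter together with the auxiliary value the instruction returns. The struct and function names are hypothetical, and how the kernel encodes the CPU number in that auxiliary value (IA32_TSC_AUX) is an assumption left to the operating system; the code only compiles for x86 targets.

```cpp
#include <cassert>
#include <cstdint>
#include <x86intrin.h>

// Hypothetical pair holding what a single rdtscp returns.
struct TscSample {
    uint64_t timestamp;  // time stamp counter value
    uint32_t aux;        // IA32_TSC_AUX contents, as programmed by the kernel
};

// Read the TSC and the auxiliary value in one instruction. If the kernel
// stores the CPU index in IA32_TSC_AUX (an assumption here), `aux`
// identifies the CPU the instruction executed on.
static inline TscSample
read_tsc_and_aux()
{
    unsigned int aux = 0;
    uint64_t tsc = __rdtscp(&aux);
    return {tsc, aux};
}
```

Because both values come from the same instruction, the CPU identifier is consistent with the timestamp even if the thread migrates right afterwards.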
The only errata patched are the AMD ones that FreeBSD also
patches (it seems there are no Intel errata that can be patched
this way; they are all fixed in microcode updates, or can't
be patched in the CPU at all).
This also seems to be roughly the point in the boot that
FreeBSD patches these, too, despite how "critical" some
of them seem.
Change-Id: I9065f8d025332418a21c2cdf39afd7d29405edcc
Reviewed-on: https://review.haiku-os.org/c/haiku/+/1740
Reviewed-by: Jessica Hamilton <jessica.l.hamilton@gmail.com>
Even on 64bit CPUs it's a 32bit register.
Change-Id: I9a4de6eec225de19a90d70fae1382b662e530629
Reviewed-on: https://review.haiku-os.org/c/1625
Reviewed-by: waddlesplash <waddlesplash@gmail.com>
This reverts commit c558f9c8fe.
This reverts commit 44f24718b1.
This reverts commit a69cb33030.
This reverts commit 951182620e.
There have been multiple reports that these changes break mounting NTFS partitions
(on all systems, see #14204), and shutting down (on certain systems, see #12405.)
Until they can be fixed, they are being backed out.
* also adjust BOOT_GDT_SEGMENT_COUNT for x86, the definition is used by the
boot loader.
* add some 32-bit definitions.
* add a UserTLSDescriptor class, this will be used by 32-bit threads.
Change-Id: I5b1d978969a1ce97091a16c9ec2ad7c0ca831656
SMAP will generate page faults when the kernel tries to access user pages unless overridden.
If SMAP is enabled, the override instructions are written where needed in memory with
binary "altcodepatches".
Support is enabled by default and can be disabled through a safemode setting.
Change-Id: Ife26cd765056aeaf65b2ffa3cadd0dcf4e273a96
* New Intel SkyLake seems to have 9 mapped ranges
at boot. It seems like this define has been creeping
up for a while.
* Resolves the initial issue reported in #11377 on SkyLake
as well. Bonefish mentioned it might need to be raised
again... he had some good foresight there :-)
* I'm seeing the same no bootable partitions issue though
via USB after this raise. (maybe a USB 3.1 thing?)
The BOOT_GDT_SEGMENT_COUNT was based on USER_DATA_SEGMENT on both
x86 and x86_64. However, on x86_64 the order of the segments is
different, leading to a gBootGDT array that is too small. Move the define to
the arch specific headers so it can be set up correctly in either case.
Also add a STATIC_ASSERT() to check that the descriptors fit into the
array.
Pointed out by CID 1210898.
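The idea of sizing the array per architecture and checking it at compile time can be sketched as follows. The segment indices, the descriptor type, and the names kBootGDTSegmentCount and set_boot_descriptor are hypothetical stand-ins, not the values from the real arch headers; the layout merely imitates a case where the user data segment is not the last entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical segment indices for one architecture. Note that the user
// data segment is NOT the highest index in this layout.
enum : size_t {
    kNullSegment = 0,
    kKernelCodeSegment = 1,
    kKernelDataSegment = 2,
    kUserDataSegment = 3,
    kUserCodeSegment = 4,
};

// Size the array from the highest index this architecture uses, rather
// than from one segment whose position differs between architectures.
constexpr size_t kBootGDTSegmentCount = kUserCodeSegment + 1;

struct SegmentDescriptor {
    uint64_t raw;
};

static SegmentDescriptor sBootGDT[kBootGDTSegmentCount];

// Compile-time checks: every descriptor index must fit into the array.
static_assert(kUserDataSegment < kBootGDTSegmentCount,
    "user data descriptor does not fit into the boot GDT");
static_assert(kUserCodeSegment < kBootGDTSegmentCount,
    "user code descriptor does not fit into the boot GDT");

// Store a descriptor, refusing to write past the end of the array.
bool
set_boot_descriptor(size_t index, uint64_t raw)
{
    if (index >= kBootGDTSegmentCount)
        return false;
    sBootGDT[index].raw = raw;
    return true;
}
```

With the count derived from a segment that is not last, the second static_assert would fail at build time instead of the array being silently overrun.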
The kernel is allowed to use fpu anywhere so we must make sure that
user state is not clobbered by saving fpu state at interrupt entry.
There is no need to do that in case of system calls since all fpu
data registers are caller saved.
We do not need, though, to save the whole fpu state at task switch
(again, thanks to the calling convention). Only the status and control
registers are preserved. This patch actually adds the xmm0-15 registers
to the clobber list of the task switch code, but the only reason for that
is to make sure that nothing bad happens inside the function that
executes that task switch. Inspection of the generated code shows
that no xmm registers are actually saved.
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
Enable SSE as a part of the "preparation of the environment to run any
C or C++ code" in the entry points of stage2 bootloader.
SSE2 is going to be used by memset() and memcpy().
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
The possibility to specify custom memcpy and memset implementations
in cpu modules is currently unused and there is generally no point
in such a feature.
There are only 2 x86 vendors that really matter and there isn't
a very big difference in performance of the generic optimized versions
of these functions across different models. Even if we wanted different
versions of memset and memcpy depending on the processor model or
features, a much better solution would be to use STT_GNU_IFUNC and save
one indirect call.
Long story short, we don't really benefit in any way from
get_optimized_functions and the feature it implements and it only adds
unnecessary complexity to the code.
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
If GCC knows what these functions are actually doing, the resulting
code can be optimized better, which is especially noticeable in the case of
invocations of atomic_{or,and}() that ignore the result. Obviously,
everything is inlined, which also improves performance.
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
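A minimal sketch of what such inline atomics could look like, assuming they are built on the GCC/Clang __atomic builtins (the commit does not say which mechanism is used). The function names with the _sketch suffix and set_flag are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Atomically OR `mask` into `*value` and return the previous value.
static inline int32_t
atomic_or_sketch(int32_t* value, int32_t mask)
{
    return __atomic_fetch_or(value, mask, __ATOMIC_SEQ_CST);
}

// Atomically AND `mask` into `*value` and return the previous value.
static inline int32_t
atomic_and_sketch(int32_t* value, int32_t mask)
{
    return __atomic_fetch_and(value, mask, __ATOMIC_SEQ_CST);
}

// The result is ignored here. Since the compiler sees the builtin, it is
// free to pick a cheaper instruction sequence than one that must produce
// the old value.
void
set_flag(int32_t* flags, int32_t flag)
{
    atomic_or_sketch(flags, flag);
}
```

When the old value is needed the compiler has to materialize it; when it is discarded, as in set_flag, an opaque out-of-line function would hide that fact from the optimizer.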
This patch makes it possible to inline the rdmsr and wrmsr instructions. The
performance impact shouldn't be significant since they are used relatively
rarely and wrmsr is usually a serializing instruction, but there is no reason
not to do so.
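For illustration, inline wrappers around these instructions might look like the sketch below. The function names are hypothetical and the code assumes GCC-style inline assembly on an x86 target. Both instructions are privileged, so the wrappers can only be executed in kernel mode; only the helper that joins the two register halves is safe to call from userland.

```cpp
#include <cassert>
#include <cstdint>

// rdmsr returns the value split across edx:eax; join the two halves.
static inline uint64_t
combine_msr_halves(uint32_t low, uint32_t high)
{
    return (static_cast<uint64_t>(high) << 32) | low;
}

// Read the model-specific register selected by `msr` (kernel mode only).
static inline uint64_t
read_msr_sketch(uint32_t msr)
{
    uint32_t low, high;
    asm volatile("rdmsr" : "=a"(low), "=d"(high) : "c"(msr));
    return combine_msr_halves(low, high);
}

// Write `value` to the model-specific register `msr` (kernel mode only).
static inline void
write_msr_sketch(uint32_t msr, uint64_t value)
{
    asm volatile("wrmsr"
        :
        : "a"(static_cast<uint32_t>(value)),
          "d"(static_cast<uint32_t>(value >> 32)), "c"(msr)
        : "memory");
}
```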
The goal of this patch is to amortize the cost of a context switch by making
the compiler aware that the context switch clobbers all registers. Because all
registers need to be saved anyway, there is no additional cost to using
callee-saved registers in the function that does the context switch.
Similarly to the previous patch regarding the GDT, this is mostly a rewrite of
the IDT handling code from C to C++. Thanks to constexpr, the IDT is now entirely
generated at compile-time.
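The general technique of building such a table with constexpr can be sketched as below. This is a simplified stand-in, not the actual IDT code: the Gate layout, the fixed stub size, the selector value, and the choice to make vector 3 callable from userland are all assumptions made for the example, and it needs C++14 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint8_t kGatePresent = 0x80;
constexpr uint8_t kInterruptGateType = 0x0e;
constexpr uint8_t kUserPrivilege = 3 << 5;      // DPL 3
constexpr uint16_t kKernelCodeSelector = 0x08;  // hypothetical selector
constexpr size_t kVectorCount = 256;
constexpr size_t kStubSize = 16;                // hypothetical stub size

// Simplified gate: handler position as an offset into a stub array.
struct Gate {
    uint32_t stubOffset;
    uint16_t selector;
    uint8_t flags;
};

struct GateTable {
    Gate gates[kVectorCount];
};

// Assumption: only the breakpoint vector may be raised from userland.
constexpr uint8_t
gate_flags(size_t vector)
{
    return vector == 3
        ? (kGatePresent | kUserPrivilege | kInterruptGateType)
        : (kGatePresent | kInterruptGateType);
}

// Fill the whole table inside a constexpr function, so the result can be
// placed in read-only data without any runtime initialization.
constexpr GateTable
generate_gate_table()
{
    GateTable table{};
    for (size_t vector = 0; vector < kVectorCount; vector++) {
        table.gates[vector].stubOffset
            = static_cast<uint32_t>(vector * kStubSize);
        table.gates[vector].selector = kKernelCodeSelector;
        table.gates[vector].flags = gate_flags(vector);
    }
    return table;
}

constexpr GateTable kGateTable = generate_gate_table();

// These checks run at compile time, proving the table is a constant.
static_assert(kGateTable.gates[3].flags == 0xee, "breakpoint gate flags");
static_assert(kGateTable.gates[14].flags == 0x8e, "page fault gate flags");
```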
Virtually no functional change, just rewriting the code from
"C in *.cpp files" to C++. Use of constexpr may be advantageous but
that code is not performance critical anyway.
Apparently, reading from dr3 is slower than reading from memory
with a cache hit.
Also, depending on the hypervisor configuration, accessing dr3 may cause
a VM exit (and, at least on kvm, it does), which makes it much slower
than a memory access even when there is a cache miss.