The possibility to specify custom memcpy and memset implementations
in cpu modules is currently unused and there is generally no point
in such feature.
There are only 2 x86 vendors that really matter and there isn't
very big difference in performance of the generic optmized versions
of these funcions across different models. Even if we wanted different
versions of memset and memcpy depending on the processor model or
features much better solution would be to use STT_GNU_IFUNC and save
one indirect call.
Long story short, we don't really benefit in any way from
get_optimized_functions and the feature it implements and it only adds
unnecessary complexity to the code.
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
* Before, it would just overwrite the previous name, leaving extra
bytes from the previous name (they wouldn't become part of the
host name, but it just didn't look that nice).
long_mmu_init() prepares initial paging structures for 64 bit kernel.
Once that function completes bootloader cannot allocate any memory
that needs to be passed to the kernel.
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
* VT100 is much more common than VT52 which the u-boot port was
previously using (a legacy of the Atari m68k port)
* Implement serial_getc (again, code is identical to raspberry port...)
so the boot menu can be used over the serial port. The enter key is
recognized, arrows currently aren't.
* Cursor coordinates are 1-based, not 0-based
* Color change was disabled and broken
This implementation of our console over VT100 is generic and should be
moved out of the raspberry-pi specific folder. However, leaving it there
for now as we will have some bigger reorganization a swe add FDT support
here.
No big functional reason for this, but rather keep it in sync now
then have to do lots of work later on, when there are major changes.
Once I have it fully fleshed out for ARM, I might take a look if
we can generalise it a little more, as there's lots of code
_exactly_ the same for both platforms (and other platforms in
progress using the same code).
* Removes default mapping of a portion of the RAM (will be done
as needed)
* Passes on the page directory area to kernel, so on early vm init
the kernel can use the area for pagetable allocation.
* Leaves it to the platform to pass in physical memory range(s). This
will ultimately come from FDT.
* Fix long standing issue with allocation of the heap, potentially
causing other part of the bootloader to overwrite the heap.
* Implements pagetable allocator in kernel for early vm mapping.
This fixes the first PANIC seen, we now just get the same one later
on when the VM is up... more to come...
We have _start/_end symbols to mark our start and end, use those
to determine where we are loaded. We're slowly getting closer to
a fully dynamic handling of our memory map!
Let the platform mmu_map_physical_memory the initrd region, and
reserve it before calling mmu_init. This removes another hardcoded
address, since e.g. U-Boot gets the address from the uImage file.
This reverts commit 3fbb24680c.
As I mentioned in #11131, this fix is not correct, and works around
the problem. The real reason was that arch_debug_call_with_fault_handler
was not working properly, so the fault handler went crazy.
With commit eb92810 that is fixed so this can be reverted.
This fixes the problem with KDL freaking out when doing a stacktrace
and having its fault handler triggered. Have no clue how this could
have worked before, but it did :P
As discussed on the ML the limitation of the gap between segments
imposed by this check is completely artifical and pointless.
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
cpuid is available in user mode as well and it doesn't look like there
are going to be any x86 platforms with significantly different CPUs anytime
soon.
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
GCC knows whether these functions need to be implemented using syscalls
(or more clever solutions like in Linux) and calls libgcc in such case.
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
The less assembler in our sources the better. These functions wouldn't
be used very much since SupportDef.h inlines them, but the symbols should
be available.
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
Time to get rid of some asm code. Surprisingly, it appears that
on x86[_64] the emitted code for atomic_test_and_get() isn't as efficient
as it could be, even with -O2, but cmpxchg is so expensive that this slight
difference shouldn't matter much.
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
* Restore missing definitions of std::nothrow and mynothrow that are
required for the debug build.
* Additionally, cleanup function overrides provided by kernel_cpp,
such that any exceptions in kernel, bootloader or runtime_loader will
trigger a panic.
* From now on, the gcc-specific system libraries (libgcc, libsupc++ and
libstdc++) are provided by separate packages built along with gcc:
- gcc_syslibs contains the shared libraries (libgcc_s.so, libsupc++.so and
libstdc++.so)
- gcc_syslibs_devel contains the static libraries and both c++ and gcc
headers
The shared libraries now make proper use of symbol versioning and there
are version-specific symlinks
* The buildsystem has been adjusted to no longer use the libraries and
headers from the cross-compiler, but use the ones provided by the
above-mentioned packages. The only exception is that the 32-bit libraries
required for the bootloader of the x86_64 architecture are still taken
from the cross-compiler.
* <stubbed>libroot.so is a shared library which contains all the symbols
from libroot, but without any code. This library will be required by
the (to be introduced) stage0 of the bootstrap process, in order to
be able to link the shared gcc syslibs (libstdc++.so, libsupc++.so
and libgcc_s.so).
* The new libstdc++.so contains program headers of type PT_RELRO (for
making segments read-only after relocation). While the actual feature
has not been implemented, the runtime_loader should now silently
accept (and ignore) those program headers.
* Instead of faking libstdc++.so from libstdc++.a, use libstdc++.so
from the gcc_syslibs build feature for everything except x86_gcc2.
* Use libgcc_s.so from the gcc_syslibs build feature for everything but
x86_gcc2 (which still carries libgcc as part of libroot.so).
* Drop filtering of libgcc objects for libroot, as that is no longer
necessary since we're only using libgcc-as-single-object for libroot
with x86_gcc2, where the filtered object file doesn't exist. Should
the objects that used to be filtered cause any problems as part of
libgcc_s.so, we can always filter them as part of the gcc build.
* Use libsupc++.so from the gcc_syslibs build feature for everything but
x86_gcc2.
* Adjust all Jamfiles accordingly.
* Deactivate building of faked libstdc++.so for non-x86-gcc2. For
x86_gcc2, we still build libstdc++.so from the sources in the Haiku
source tree as part of the Haiku build .
* Put gcc_syslibs package onto the image, when needed.
instead or additionally to string.h, in preparation for functions move.
* moves str[n]casecmp() functions and others to strings.h.
* strings.h doesn't include string.h anymore.
* this solves #10949
in arch_vm_translation_map_is_kernel_page_accessible. Fix borrowed from x86
(commit 428b9e758c).
Signed-off-by: Adrien Destugues <pulkomandy@pulkomandy.tk>
Fixes#11107.
It only makes sense to add the ABI directories to library paths
created by Haiku itself. E.g. on a gcc2h build, appending x86.
This also fixes build issues where LIBRARY_PATH is amended, and
the target binaries and libraries are in different locations.
Note: the develop lib directories were excluded, as runtime_loader
shouldn't be looking at these in the first place.
* Prefix lock functions with __ to mark them as private. Add
forwarding macros to keep existing code working.
* Avoids symbol name clashes with kernel lock APIs, occuring when
using kernellandemu-lib in userlandfs. Thanks to Ingo for the
suggestion.
* PackageFileHeap{Reader,Writer} as well as Package{Reader,Writer} and
their implementation and super classes do now internally use a
BPositionIO instead of a FD to access the package file. This provides
more flexibility needed for features to come.
* BPackageReader has already grown a new Init() version with a
BPositionIO* parameter.
* Ingo copied the methods into a shared location, and then obviously
"forgot" to let BFS use them. As a side note for Ingo: the complete
error GCC reported was "std::fssh_size_t" not defined with the macro
wrapper as code location. The actual problem was a "using std::size_t"
in some C++ header that accidentally got included after the wrapper.
* The shared Query code is not yet used. That'll be done another time.
* Renamed BFS_SHELL define to FS_SHELL, such that QueryParserUtils can be
used in any file system shell, not just the bfs_shell.
* The number of IO vectors is not 256 on x86, but rather 224 as set by
NUM_IO_VECTORS in "arch_int.h".
* Jessicah mentioned hearing about MSI crashes before, but that was a
few weeks ago.
* These were the only CIDs in the MSI code.
Signed-off-by: Michael Lotz <mmlr@mlotz.ch>
* I tried having this test in KDiskDeviceManager.cpp, but it
failed booting in one case and did not solve the problem in another.
I think this is because there is an Open() call here, and that rereads
the blocksize.
* Tested and it solved the problem for me.
* Should fix#10717 and #9489 at least.
Signed-off-by: Jessica Hamilton <jessica.l.hamilton@gmail.com>
* Currently, no command line options are being passed via u-boot
to haiku. However, the comparison doesn't ensure that cmdline
is not an empty string - it merely ensures cmdline is not null.
Signed-off-by: Ithamar R. Adema <ithamar@upgrade-android.com>
* After initializing the page table and enabling MMU,
the pre-MMU stack becomes invalid leading to a fault.
This was fixed by moving the stack to SDRAM as specified
in LOADER_MEMORYMAP before ARM entry point start_netbsd.
Signed-off-by: Ithamar R. Adema <ithamar@upgrade-android.com>
When an ARMv7 CPU is detected, immediately turn on the FPU. This allows
us to use vsnprintf in the TRACE call in that function, as our libc is
compiled with floating point support and will trigger a fault if the FPU
is not available.
This lets the boot go further, and crash in mmu_init. Next steps:
* Find why mmu_init is crashing
* Setup some fault handlers, otherwise we call uboot ones, and they are
not very helpful. They will also probably not work once the mmu is
enabledvery helpful. They will also probably not work once the mmu is
enabledvery helpful. They will also probably not work once the mmu is
enabled...
Loading of haiku_loader from an uImage is a 2-step process:
* First, the uImage is loaded (in our case from SD card using fatload)
to RAM at a temporary address.
* Then (using bootm), it is unpacked. The uImage is a container format
and can hold several files, with a load and execution address. The files
are copied from the uImage to their final location, and it's better if
that doesn't overlap with the uImage content
When this loading is done, bootm jumps to the entry point found in the
uImage.
We now actually execute our code from haiku_loader. This crashes with
the following call stack:
* vsnprintf
* dprintf
* boot_arch_cpu_init
* cpu_init
It seems vsnprintf is trying to use VFP instructions (probably from the
libgcc) but that triggers some kind of fault, and the handler (setup by
uboot?) ends up crashing the system by jumping to unmapped memory at 0.
* Cleanup the SD card image building to allow jam -q @bootstrap-mmc to
work.
There are a few remaining tricks before you can safely build an image:
* This uses a non-POSIX du option, and is only tested with Linux du
only (Linux is the only supported system to run bootstrap builds,
anyway)
* The Python recipe in haikuports.cross is known to not build on
Debian/Ubuntu, but work fine on OpenSuse. There is a patch available in
haikuports bugtracker to allow the reverse.
* You need to populate the haikuports repo package list with some
packages (which don't exist yet) to make the build system happy. But our
git hook to generate the repositories is preventnig me to share this
hack.
Once built, the image currently crashes early in the kernel execution.
On to debug that!
* thanks to Ingo for suggesting the idea, quoting him:
"by holding sVnodeLock read-locked, get_mount() ensures that fs_unmount() can't
process the nodes. If it is already past that point, the root node check
(not NULL, not busy, ref count > 0) is supposed to detect that. But it doesn't
look like this can work. fs_unmount() doesn't set the root node to NULL (the
root node field is NULL only during a short period in fs_mount()), but it just
frees the nodes after releasing sVnodeLock. So the not busy and ref count > 0
checks could already access freed memory".
* tested OK, this fixes#10522.
* replaced mount->root_vnode by the local variable with the same value.
The way we handle paging is very wasteful and relies heavily on virtual
funcions even if there is absolutely no reason to do so. The proper
solution would be to do a major rework of paging code (including
arch-independent parts).
On x86_64 physical page mapper is very simple what makes the overhead
resulting from the desing of paging interface very expensive. This
patch attempts to make things a bit better by helping GCC with
devirtualization and allowing inlining physical page mapper impementation
(well, only when it is devirtualized).
This fixes the problem with find_unique_check_sums() taking a very long
time to complete when one or more drives report read errors.
Fixes#10880.
[Paweł: minor style issue]
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
* A problem with our gcc requires adding casts for gcc4 when
the __builtin_bswap functions are used with a format string
* Unlike gcc2, the __builtin_bswap functions do not get disabled
despite using -fno-builtins, hence added compiler check in
runtime_loader/utility.cpp
This patch makes it possible to inline rdmsr and wrmsr instruction. The
performance impact shouldn't be significant since they are used relatively
rarely and wrmsr is usually a serializing instruction, but there is no reason
not to do so.
The goal of this patch is to amortize the cost of context switch by making
the compiler aware that context switch clobbers all registers. Because all
register need to be saved anyway there is no additional cost of using
callee saved register in the function that does the context switch.
Similarly to previous patch regarding GDT this is mostly a rewrite of
IDT handling code from C to C++. Thanks to constexpr IDT is now entirely
generated at compile-time.
Virtually no functional change, just rewriting the code from
"C in *.cpp files" to C++. Use of constexpr may be advantageous but
that code is not performance critical anyway.
While resolving TLS related relocations it is necessary to know the DSO
that defines the symbol. Without proper support in caching that information
is available only when the symbol is resolved first time. That works well
for TLS since TLS_DTPMOD is guaranteed to be before TLS_DTPOFF relocation.
This patch makes the newly introduced parts of the interface work in a
general case.
Previously TLS_DTPMOD relocation blindly returned ID of the current DSO.
This patch does proper symbol lookup if there is a symbol assigned to the
relocation and uses ID of the DSO in which the symbol is defined.
This patch introduces support of ELF based TLS handling with lazy allocation
and initalization of TLS block for each DSO and thread. The implementation
generally follows the official ABI except that generation counter in dtv
is in fact a pointer to Generation object that contains both generation
counter and size of the dtv. That simplified the implementation a bit, but
could be changed later. The ABI requirements regariding in memory position
of TLS block is not honoured what results in static TLS model being
unsupported. However, that should not be a problem as long as
"executables" in Haiku are in fact shared objects and optimizations which
require specific TLS block in memory layout are not possible anyway.
* Fixes missing atomic stuff that gcc requires
* The gcc build still fails further down, because of a mixup of
VFP/nonVFP objects (at least for beagle build).
The scheduler expects that all threads expect the initial idle threads
have priority in range [THREAD_MIN_SET_PRIORITY, THREAD_MAX_SET_PRIORITY].
If the requested pririty is out of range the value is clamped. Failing
with B_BAD_VALUE is probably an overkill since there isn't any real
change in the guarantees provided by the scheduler about the behavior
of such thread. Also, BeBook suggests that spawn_thread() can specify
priority 0.
For potential boot volumes with older packages states the respective
item in the boot volume menu now has a sub menu for selecting a state.
The boot loader functionality for this feature is complete -- i.e. the
respective kernel is loaded and the name of the old state is added to
the kernel args -- but kernel packagefs and package daemon support is
still missing.
... in filenames. Replace the existing Unicode conversion functions
with UTF conversion functions from js that he relicensed MIT for us.
Put the UTF conversion functions in a private but shared code location
so that they can be accessed throughout the kernel.
Right now we only provide functions to convert between UTF-8 and UTF-16.
At some point we should also add functions to convert between UTF-8 and
UTF-32 and UTF-16 and UTF-32 but these aren't needed by exfat.
Remove the old Unicode conversion functions from exfat as they assumed
UCS-2 characters and don't work with UTF-16 used by exfat.
Rename most variables with the term length with code unit where code units
are intended. The term length, when used, means length in bytes while code
units represent either a full 2-byte UTF-16 character or half a 4-byte
surrogate pair.
This patch remove the old thread migration logic which used few special
cases and (broken) general check that attempted to balance threads.
The new logic is pretty straightforward and seems perform well without
any additional special cases. Current core is compared with the least loaded
one and the thread is migrated if that would result in estimated loads of
both cores (i.e. the current one and the least loaded one) to become closer
to the average load (i.e. average of that two cores).
Currently, ThreadData::ShouldRebalance() (and mode specific functions
it calls) only decides whether to migrate thread to another core or not.
However, in most cases it actually needs to find the best candidate for
new core so it could as well return that information.
After load_image() the child thread is suspended and the parent is
expected to resume it later. However, it is possible that the parent
attempts to resume its child after it has been notified that the image
had been loaded but before the child managed to suspend itself. In such
case the child would suspends itself after that wake up attempt and,
consequently will not be ever resumed.
To mitigate that problem flag Thread::going_to_suspend has been added
which helps synchronizing thread suspension and continuation in a similar
way that "traditional" thread blocking is performed. This means that
the child should behave in a following manner: set its going_to_suspend flag,
notify the parent (i.e. any thread that may want to resume it), acquire
its scheduler_lock and suspend itself if the going_to_suspend flag is set.
The parent should follow pattern: clear going_to_suspend flag of the thread
that is about to be resumed, acquire that thread scheduler_lock and enqueue
it in a run queue if it is suspended.
Thanks Oliver for reporting the bug and identifying what causes it.
Most of the actual UserEvent work is done in DPC so that we don't have
to care about the limitations of the context in which UserEvent::Fire()
is invoked. This requires appropriate management of lifetime of UserEvent
instances to make sure that DoDPC() method is always called on a valid
object.
- POSIX says the behavior for pthread_equal is undefined for
uninitialized arguments.
- However, gcc C++11 threads supports expects C++-compatible behavior,
that is, two uninitialized pthread_t should compare equal.
Avoids some runtime asserts in latest WebKit version.
In low latency mode the scheduler would not attempt to balance load
on not heavily loaded cores unless difference in load exceeded
kLoadDifference * 2 (i.e. 40 percentage points), which does not seem
to be good enough.
To make sure that load statistics are accurate on idle cores each time
idle thread is scheduled a timer is set to update load when current
load measurement interval elapses. However, core load is defined as the
average load during last measurement interval and idle core may be still
considered busy if it was not idle during entire measurement interval.
Since, load update timer is a one shot timer that information will not be
updated until the core becomes active again.
To mitigate that issue load update timer is set to fire after two load
measurement intervals had elapsed.
Should fix#10628. If there is a race condition with a writer getting
minimum or maximum from double ended heap may incorrectly result NULL.
Which is not expected in the most of the thread migration logic. Apart
from that, because of the race condition heap state may be observed as
inconsistent thus failing assertions.
ended heap
* Align all allocations of more than 8 bytes to 8-byte.
* Avoids hitting ASSERTs in WebKit when built in debug mode (it assumes
at least 8 byte alignment)
The main purpose of using atomic_get() was the necessity of a compiler
barrier to prevent the compiler from optimizing busy loops. However,
each such loop contains in its body at least one statement that acts
as a compiler barrier (namely, cpu_wait() or cpu_pause()) making
atomic_get() redundant (well, atomic_get() is stronger - it also issues
a load barrier but in these particular cases we do not need it).
If the initial attempt to acquire read spinlock fails we use more relaxed
loop (which doesn't require CPU to lock the bus). However, check in that
loop, incorrectly, didn't allow a lock to be acquired when there was at
least one other reader.
* Add isb just because.
* pdziepak pointed out that ARMv5 and before
had different barrier support.
* pdziepak also mentioned that dsb was too strong
for __sync_synchronize
* On ARMv6 or older, we do a simulated dsb.
* Move __sync_synchronize into thread.c in libroot
and use the new arch_atomic.h dsb/dmb defines.
* Gets arm @bootstrap-raw to end of bootstrap.
GCC doesn't provide an ARM implementation of it. It's easy to write one
for ARMv6 and above, while older archs will need this implemented as a
syscall just like other atomics.
We have the same problem as on x86_64: posiiton dependant code isn't
allowed in shared libraries. Since Kernel.so is not used at runtime,
we can use the same hack as on x86_64, and use elfedit to make the
linker think our kernel is a shared library.
* As per the ML discussions. Bumps MIPS to tier 3.
* We've reached a unanimous descision that MIPS doesn't
target any real / valid hardware Haiku wants to pursue
at the moment. In the event that anyone wants to pursue
MIPS, feel free to fork Haiku into your own repository
(and we'll even link to it on the website ports page)
* If someone develops a viable plan for MIPS (and gets the
port working, it can be readded at a later date)
This is required to use some SSE instructions, which are generated by
gcc 4.8, most notably when compiling WebKit code (but it may happen
elsewhere as well).
Fixes about 900 crashes and 10000 test failures in WebKit, so this must
be working. Fixes#10509 for x86.
* Use atomic_get_and_set for return value
* Atomics are no longer volatile
* Add missing arch_cpu_pause stub
* Move arch_cpu_idle to arch_cpu header to match
other architectures
* This will be used to implement compressed http streams
* Remove the custom BDataOutput class, and use BDataIO instead, for
easier integration with existing code.
The initial core assignment has to be done without any knowledge about
the thread behaviour. Moreover, short lived tasks may spent most of their
time executing on that initial core. This patch attempts to improve the
qualiti of that initial decision.
The main purpose of this patch is to eliminate the delay between thread
migration and result of that migration being visible in load statistics.
Such delay, in certain circumstances, may cause some cores to become
overloaded because the scheduler migrates too many threads to them before
the effect of migration becomes apparent.
In order to keep the scheduler tickless core load is computed and updated
only during various scheduler events (i.e. thread enqueue, reschedule, etc).
The problem it creates is that if a core becomes idle its load may remain
outdated for an extended period of time thus resulting in suboptimal thread
migration decisions.
The solution to this problem is to add a timer each time an idle thread is
scheudled which, after kLoadMeasureInterval, would fire and force load
update.
Priority penalties were made more strict in order to prevent situation
when two or more high priority threads uses up all available CPU time
in such manner that they do not receive a penalty but starve low priority
threads.
However, a significant change to thread priorites has been made since and
now priority of all non real time threads varies in a range from 1 to
static priority minus penalty. This means that the scheduler is able to
prevent thread starvation without any complex penalty policies.
Originially, core load was a sum of eastimated loads of all currently
running or ready threads on a given core. Such value is changing very
rapidly preventing the thread migration logic from making any reasonable
decisions.
This patch changes the way core load is computed to make it more stable
thus improving the qualitiy of decisions made by the thread migration logic.
Currently core load is a sum of estimated loads of all threads that have been
ready during last load measurement interval and haven't been migrated or
killed.
The main reason for this patch is to fix gcc 4.8.2 warning about
hierarchyLevels possibly being used not initialized. Such thing
actually can not happen since all x2APIC CPUs are aware of at least
3 topology levels. However, once more topology levels are introduced
we will have to deal with CPUs that do not report information about all
of them.
User timers may cause another thread to become ready in which case we would
like this to happen before scheduler_reschedule() chooses next thread to
be executed.
UserEvent can be fired from scheduler_reschedule() i.e. while holding current
thread scheduler_lock. If the current thread goes sleep and during reschedule
one of its timers sends a signel to it, then scheduler_enqueue_in_run_queue()
attempts to acquire again its scheduler_lock resulting in a deadlock.
There was also a minor issue with both scheduler_reschedule() and
scheduler_enqueue_in_run_queue() acquiring current CPU scheduler mode lock.
* Fix incorrect cpu vendor name mapping
* Add additional CPU architectures
* Add additional CPU vendors
* Rework PowerPC arch_system_info passing
PVR back for cpu model
On multisocket systems as well as under virtual machines logical CPUs
may use separate TSC. We could attempt to synchronize them what probably
would solve problems on multisocket systems. Unfortunately, when running
under hypervisor there is still a chance that TSC will get out of sync
again (e.g. cpufreq enabled on host when there is no invariant TSC). As
long as we use RDTSC as our main time source the scheduler must accept the
fact that time may go backwards (what isn't really a serious problem).
Add boot loader debug menu option "Save syslog from previous session
during boot". If enabled (defaults to true), the previous session's
debug syslog data is copy to a separate buffer and passed to the
kernel, which writes it back to the file /var/log/previous_syslog.
As long as Haiku still boots, this should now be the most convenient way
to retrieve the output from a kernel crash.
This reverts commit 667617ad043a4587d8d366d5192d9ad291cfa37a.
Scheduler profiler uses CPU local data to store function information, hence
arch_thread_context_switch() usually is not a problem. However, when
we switch to a new thread we end up scheduler_new_thread_entry() instead
of scheduler_reschedule() what may corrupt data collected by the profiler.
The symbol is needed for global objects. Usually, GCC also requires
this, but for some reason, the linking error only occurs when using
Clang.
Signed-off-by: Jérôme Duval <jerome.duval@gmail.com>