When an ARMv7 CPU is detected, immediately turn on the FPU. This allows
us to use vsnprintf in the TRACE call in that function, as our libc is
compiled with floating point support and will trigger a fault if the FPU
is not available.
This lets the boot go further, and crash in mmu_init. Next steps:
* Find why mmu_init is crashing
* Setup some fault handlers, otherwise we call uboot ones, and they are
not very helpful. They will also probably not work once the mmu is
enabledvery helpful. They will also probably not work once the mmu is
enabledvery helpful. They will also probably not work once the mmu is
enabled...
Loading of haiku_loader from an uImage is a 2-step process:
* First, the uImage is loaded (in our case from SD card using fatload)
to RAM at a temporary address.
* Then (using bootm), it is unpacked. The uImage is a container format
and can hold several files, with a load and execution address. The files
are copied from the uImage to their final location, and it's better if
that doesn't overlap with the uImage content
When this loading is done, bootm jumps to the entry point found in the
uImage.
We now actually execute our code from haiku_loader. This crashes with
the following call stack:
* vsnprintf
* dprintf
* boot_arch_cpu_init
* cpu_init
It seems vsnprintf is trying to use VFP instructions (probably from the
libgcc) but that triggers some kind of fault, and the handler (setup by
uboot?) ends up crashing the system by jumping to unmapped memory at 0.
* Cleanup the SD card image building to allow jam -q @bootstrap-mmc to
work.
There are a few remaining tricks before you can safely build an image:
* This uses a non-POSIX du option, and is only tested with Linux du
only (Linux is the only supported system to run bootstrap builds,
anyway)
* The Python recipe in haikuports.cross is known to not build on
Debian/Ubuntu, but work fine on OpenSuse. There is a patch available in
haikuports bugtracker to allow the reverse.
* You need to populate the haikuports repo package list with some
packages (which don't exist yet) to make the build system happy. But our
git hook to generate the repositories is preventnig me to share this
hack.
Once built, the image currently crashes early in the kernel execution.
On to debug that!
* thanks to Ingo for suggesting the idea, quoting him:
"by holding sVnodeLock read-locked, get_mount() ensures that fs_unmount() can't
process the nodes. If it is already past that point, the root node check
(not NULL, not busy, ref count > 0) is supposed to detect that. But it doesn't
look like this can work. fs_unmount() doesn't set the root node to NULL (the
root node field is NULL only during a short period in fs_mount()), but it just
frees the nodes after releasing sVnodeLock. So the not busy and ref count > 0
checks could already access freed memory".
* tested OK, this fixes#10522.
* replaced mount->root_vnode by the local variable with the same value.
The way we handle paging is very wasteful and relies heavily on virtual
funcions even if there is absolutely no reason to do so. The proper
solution would be to do a major rework of paging code (including
arch-independent parts).
On x86_64 physical page mapper is very simple what makes the overhead
resulting from the desing of paging interface very expensive. This
patch attempts to make things a bit better by helping GCC with
devirtualization and allowing inlining physical page mapper impementation
(well, only when it is devirtualized).
This fixes the problem with find_unique_check_sums() taking a very long
time to complete when one or more drives report read errors.
Fixes#10880.
[Paweł: minor style issue]
Signed-off-by: Paweł Dziepak <pdziepak@quarnos.org>
* A problem with our gcc requires adding casts for gcc4 when
the __builtin_bswap functions are used with a format string
* Unlike gcc2, the __builtin_bswap functions do not get disabled
despite using -fno-builtins, hence added compiler check in
runtime_loader/utility.cpp
This patch makes it possible to inline rdmsr and wrmsr instruction. The
performance impact shouldn't be significant since they are used relatively
rarely and wrmsr is usually a serializing instruction, but there is no reason
not to do so.
The goal of this patch is to amortize the cost of context switch by making
the compiler aware that context switch clobbers all registers. Because all
register need to be saved anyway there is no additional cost of using
callee saved register in the function that does the context switch.
Similarly to previous patch regarding GDT this is mostly a rewrite of
IDT handling code from C to C++. Thanks to constexpr IDT is now entirely
generated at compile-time.
Virtually no functional change, just rewriting the code from
"C in *.cpp files" to C++. Use of constexpr may be advantageous but
that code is not performance critical anyway.
While resolving TLS related relocations it is necessary to know the DSO
that defines the symbol. Without proper support in caching that information
is available only when the symbol is resolved first time. That works well
for TLS since TLS_DTPMOD is guaranteed to be before TLS_DTPOFF relocation.
This patch makes the newly introduced parts of the interface work in a
general case.
Previously TLS_DTPMOD relocation blindly returned ID of the current DSO.
This patch does proper symbol lookup if there is a symbol assigned to the
relocation and uses ID of the DSO in which the symbol is defined.
This patch introduces support of ELF based TLS handling with lazy allocation
and initalization of TLS block for each DSO and thread. The implementation
generally follows the official ABI except that generation counter in dtv
is in fact a pointer to Generation object that contains both generation
counter and size of the dtv. That simplified the implementation a bit, but
could be changed later. The ABI requirements regariding in memory position
of TLS block is not honoured what results in static TLS model being
unsupported. However, that should not be a problem as long as
"executables" in Haiku are in fact shared objects and optimizations which
require specific TLS block in memory layout are not possible anyway.
* Fixes missing atomic stuff that gcc requires
* The gcc build still fails further down, because of a mixup of
VFP/nonVFP objects (at least for beagle build).
The scheduler expects that all threads expect the initial idle threads
have priority in range [THREAD_MIN_SET_PRIORITY, THREAD_MAX_SET_PRIORITY].
If the requested pririty is out of range the value is clamped. Failing
with B_BAD_VALUE is probably an overkill since there isn't any real
change in the guarantees provided by the scheduler about the behavior
of such thread. Also, BeBook suggests that spawn_thread() can specify
priority 0.
For potential boot volumes with older packages states the respective
item in the boot volume menu now has a sub menu for selecting a state.
The boot loader functionality for this feature is complete -- i.e. the
respective kernel is loaded and the name of the old state is added to
the kernel args -- but kernel packagefs and package daemon support is
still missing.
... in filenames. Replace the existing Unicode conversion functions
with UTF conversion functions from js that he relicensed MIT for us.
Put the UTF conversion functions in a private but shared code location
so that they can be accessed throughout the kernel.
Right now we only provide functions to convert between UTF-8 and UTF-16.
At some point we should also add functions to convert between UTF-8 and
UTF-32 and UTF-16 and UTF-32 but these aren't needed by exfat.
Remove the old Unicode conversion functions from exfat as they assumed
UCS-2 characters and don't work with UTF-16 used by exfat.
Rename most variables with the term length with code unit where code units
are intended. The term length, when used, means length in bytes while code
units represent either a full 2-byte UTF-16 character or half a 4-byte
surrogate pair.
This patch remove the old thread migration logic which used few special
cases and (broken) general check that attempted to balance threads.
The new logic is pretty straightforward and seems perform well without
any additional special cases. Current core is compared with the least loaded
one and the thread is migrated if that would result in estimated loads of
both cores (i.e. the current one and the least loaded one) to become closer
to the average load (i.e. average of that two cores).
Currently, ThreadData::ShouldRebalance() (and mode specific functions
it calls) only decides whether to migrate thread to another core or not.
However, in most cases it actually needs to find the best candidate for
new core so it could as well return that information.
After load_image() the child thread is suspended and the parent is
expected to resume it later. However, it is possible that the parent
attempts to resume its child after it has been notified that the image
had been loaded but before the child managed to suspend itself. In such
case the child would suspends itself after that wake up attempt and,
consequently will not be ever resumed.
To mitigate that problem flag Thread::going_to_suspend has been added
which helps synchronizing thread suspension and continuation in a similar
way that "traditional" thread blocking is performed. This means that
the child should behave in a following manner: set its going_to_suspend flag,
notify the parent (i.e. any thread that may want to resume it), acquire
its scheduler_lock and suspend itself if the going_to_suspend flag is set.
The parent should follow pattern: clear going_to_suspend flag of the thread
that is about to be resumed, acquire that thread scheduler_lock and enqueue
it in a run queue if it is suspended.
Thanks Oliver for reporting the bug and identifying what causes it.
Most of the actual UserEvent work is done in DPC so that we don't have
to care about the limitations of the context in which UserEvent::Fire()
is invoked. This requires appropriate management of lifetime of UserEvent
instances to make sure that DoDPC() method is always called on a valid
object.
- POSIX says the behavior for pthread_equal is undefined for
uninitialized arguments.
- However, gcc C++11 threads supports expects C++-compatible behavior,
that is, two uninitialized pthread_t should compare equal.
Avoids some runtime asserts in latest WebKit version.
In low latency mode the scheduler would not attempt to balance load
on not heavily loaded cores unless difference in load exceeded
kLoadDifference * 2 (i.e. 40 percentage points), which does not seem
to be good enough.
To make sure that load statistics are accurate on idle cores each time
idle thread is scheduled a timer is set to update load when current
load measurement interval elapses. However, core load is defined as the
average load during last measurement interval and idle core may be still
considered busy if it was not idle during entire measurement interval.
Since, load update timer is a one shot timer that information will not be
updated until the core becomes active again.
To mitigate that issue load update timer is set to fire after two load
measurement intervals had elapsed.
Should fix#10628. If there is a race condition with a writer getting
minimum or maximum from double ended heap may incorrectly result NULL.
Which is not expected in the most of the thread migration logic. Apart
from that, because of the race condition heap state may be observed as
inconsistent thus failing assertions.
ended heap
* Align all allocations of more than 8 bytes to 8-byte.
* Avoids hitting ASSERTs in WebKit when built in debug mode (it assumes
at least 8 byte alignment)
The main purpose of using atomic_get() was the necessity of a compiler
barrier to prevent the compiler from optimizing busy loops. However,
each such loop contains in its body at least one statement that acts
as a compiler barrier (namely, cpu_wait() or cpu_pause()) making
atomic_get() redundant (well, atomic_get() is stronger - it also issues
a load barrier but in these particular cases we do not need it).
If the initial attempt to acquire read spinlock fails we use more relaxed
loop (which doesn't require CPU to lock the bus). However, check in that
loop, incorrectly, didn't allow a lock to be acquired when there was at
least one other reader.
* Add isb just because.
* pdziepak pointed out that ARMv5 and before
had different barrier support.
* pdziepak also mentioned that dsb was too strong
for __sync_synchronize
* On ARMv6 or older, we do a simulated dsb.
* Move __sync_synchronize into thread.c in libroot
and use the new arch_atomic.h dsb/dmb defines.
* Gets arm @bootstrap-raw to end of bootstrap.
GCC doesn't provide an ARM implementation of it. It's easy to write one
for ARMv6 and above, while older archs will need this implemented as a
syscall just like other atomics.
We have the same problem as on x86_64: posiiton dependant code isn't
allowed in shared libraries. Since Kernel.so is not used at runtime,
we can use the same hack as on x86_64, and use elfedit to make the
linker think our kernel is a shared library.
* As per the ML discussions. Bumps MIPS to tier 3.
* We've reached a unanimous descision that MIPS doesn't
target any real / valid hardware Haiku wants to pursue
at the moment. In the event that anyone wants to pursue
MIPS, feel free to fork Haiku into your own repository
(and we'll even link to it on the website ports page)
* If someone develops a viable plan for MIPS (and gets the
port working, it can be readded at a later date)
This is required to use some SSE instructions, which are generated by
gcc 4.8, most notably when compiling WebKit code (but it may happen
elsewhere as well).
Fixes about 900 crashes and 10000 test failures in WebKit, so this must
be working. Fixes#10509 for x86.
* Use atomic_get_and_set for return value
* Atomics are no longer volatile
* Add missing arch_cpu_pause stub
* Move arch_cpu_idle to arch_cpu header to match
other architectures
* This will be used to implement compressed http streams
* Remove the custom BDataOutput class, and use BDataIO instead, for
easier integration with existing code.
The initial core assignment has to be done without any knowledge about
the thread behaviour. Moreover, short lived tasks may spent most of their
time executing on that initial core. This patch attempts to improve the
qualiti of that initial decision.
The main purpose of this patch is to eliminate the delay between thread
migration and result of that migration being visible in load statistics.
Such delay, in certain circumstances, may cause some cores to become
overloaded because the scheduler migrates too many threads to them before
the effect of migration becomes apparent.
In order to keep the scheduler tickless core load is computed and updated
only during various scheduler events (i.e. thread enqueue, reschedule, etc).
The problem it creates is that if a core becomes idle its load may remain
outdated for an extended period of time thus resulting in suboptimal thread
migration decisions.
The solution to this problem is to add a timer each time an idle thread is
scheudled which, after kLoadMeasureInterval, would fire and force load
update.
Priority penalties were made more strict in order to prevent situation
when two or more high priority threads uses up all available CPU time
in such manner that they do not receive a penalty but starve low priority
threads.
However, a significant change to thread priorites has been made since and
now priority of all non real time threads varies in a range from 1 to
static priority minus penalty. This means that the scheduler is able to
prevent thread starvation without any complex penalty policies.
Originially, core load was a sum of eastimated loads of all currently
running or ready threads on a given core. Such value is changing very
rapidly preventing the thread migration logic from making any reasonable
decisions.
This patch changes the way core load is computed to make it more stable
thus improving the qualitiy of decisions made by the thread migration logic.
Currently core load is a sum of estimated loads of all threads that have been
ready during last load measurement interval and haven't been migrated or
killed.
The main reason for this patch is to fix gcc 4.8.2 warning about
hierarchyLevels possibly being used not initialized. Such thing
actually can not happen since all x2APIC CPUs are aware of at least
3 topology levels. However, once more topology levels are introduced
we will have to deal with CPUs that do not report information about all
of them.
User timers may cause another thread to become ready in which case we would
like this to happen before scheduler_reschedule() chooses next thread to
be executed.
UserEvent can be fired from scheduler_reschedule() i.e. while holding current
thread scheduler_lock. If the current thread goes sleep and during reschedule
one of its timers sends a signel to it, then scheduler_enqueue_in_run_queue()
attempts to acquire again its scheduler_lock resulting in a deadlock.
There was also a minor issue with both scheduler_reschedule() and
scheduler_enqueue_in_run_queue() acquiring current CPU scheduler mode lock.
* Fix incorrect cpu vendor name mapping
* Add additional CPU architectures
* Add additional CPU vendors
* Rework PowerPC arch_system_info passing
PVR back for cpu model
On multisocket systems as well as under virtual machines logical CPUs
may use separate TSC. We could attempt to synchronize them what probably
would solve problems on multisocket systems. Unfortunately, when running
under hypervisor there is still a chance that TSC will get out of sync
again (e.g. cpufreq enabled on host when there is no invariant TSC). As
long as we use RDTSC as our main time source the scheduler must accept the
fact that time may go backwards (what isn't really a serious problem).
Add boot loader debug menu option "Save syslog from previous session
during boot". If enabled (defaults to true), the previous session's
debug syslog data is copy to a separate buffer and passed to the
kernel, which writes it back to the file /var/log/previous_syslog.
As long as Haiku still boots, this should now be the most convenient way
to retrieve the output from a kernel crash.
This reverts commit 667617ad043a4587d8d366d5192d9ad291cfa37a.
Scheduler profiler uses CPU local data to store function information, hence
arch_thread_context_switch() usually is not a problem. However, when
we switch to a new thread we end up scheduler_new_thread_entry() instead
of scheduler_reschedule() what may corrupt data collected by the profiler.
The symbol is needed for global objects. Usually, GCC also requires
this, but for some reason, the linking error only occurs when using
Clang.
Signed-off-by: Jérôme Duval <jerome.duval@gmail.com>
* Displays standard CPUID, and shows what the
internal CPUID used by OS.h *should* be.
* Should help out in identifying new CPU's
as all end users have to do is run sysinfo
to get the CPU info + value for OS.h
Nested functions are a (again, broken) GNU extension which is not
supported by Clang. It has been replaced by a bunch of gotos and a
variable that works as a return address.
Previous implementation based on the actual load of each core and share
each thread has in that load turned up to be very problematic when
balancing load on very heavily loaded systems (i.e. more threads
consuming all available CPU time than there is logical CPUs).
The new approach is to estimate how much load would a thread produce
if it had all CPU time only for itself. Summing such load estimations
of each thread assigned to a given core we get a rank that contains
much more information than just simple actual core load.
* Previously PE binaries would trigger the "incorrectly
executable" dialog. Now we get a special message for
B_LEGACY_EXECUTABLE and B_UNKNOWN_EXECUTABLE
* Legacy at the moment is a R3 x86 PE binary. This could
be extended to gcc2 binaries someday far, far, down the
road though
* The check for legacy is based on a PE flag I see
set on every R3 binary (that isn't set on dos ones)
* Unknown is something we know *is* an executable, but
can't do anything with (such as an MSDOS or Windows
application)
* No performance drops as we do the PE scan last
* Tested on x86 and x86_gcc2
This field forces kernel to track each CPU load all the time. It is not
a problem with the current scheduler on a multicore systems, but on
single core machnies or with any other future scheduler this field may
become just an unnecessary burden. It isn't difficult for an application
to compute CPU load by itself when it needs it.