The longer a core is idle, the deeper the idle state it has entered. That's
why the scheduler should always choose the core that has gone idle
most recently (both for performance and power saving reasons).
Moreover, if there is more than one package, the scheduler should
minimize the number of packages with at least one active core when
power saving is the priority. Conversely, as many packages as possible
should be used when aiming for high performance.
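A minimal sketch of the "most recently idle" rule, assuming a hypothetical per-core record that stores the time the core went idle (`CoreInfo`, `idle_since` and `pick_most_recently_idle()` are illustrative names, not scheduler code). Package awareness is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-core record, for illustration only.
struct CoreInfo {
    int id;
    bool idle;
    int64_t idle_since;  // time at which the core went idle
};

// Returns the id of the idle core that went idle most recently, or -1 if
// no core is idle. Such a core is assumed to be in the shallowest idle
// state, so waking it costs the least latency and energy.
int pick_most_recently_idle(const std::vector<CoreInfo>& cores)
{
    const CoreInfo* best = nullptr;
    for (const CoreInfo& core : cores) {
        if (core.idle && (best == nullptr || core.idle_since > best->idle_since))
            best = &core;
    }
    return best != nullptr ? best->id : -1;
}
```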
There is a global heap of cores, where the key is the highest priority
of the threads running on that core. Moreover, for each core there is
a heap of the logical processors on that core, where the key is the priority
of the currently running thread.
The per-core heap is used for load balancing among the logical processors
of that core. The global heap is used in the initial decision of where to put
the thread (note that the algorithm that makes this decision is not
complete yet).
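One way the per-core heap could be used, sketched with the standard heap algorithms: the comparator makes the front the logical processor running the lowest-priority thread, which is where a new thread disturbs the least. `LogicalCpu` and `assign_thread()` are hypothetical, and the placement policy is an assumption, not the scheduler's actual algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical entry of the per-core heap: one per logical processor.
struct LogicalCpu {
    int id;
    int running_priority;  // priority of the thread currently running on it
};

// Comparator that turns make_heap/pop_heap/push_heap into a min-heap:
// the front is the logical CPU running the lowest-priority thread.
bool runs_higher_priority(const LogicalCpu& a, const LogicalCpu& b)
{
    return a.running_priority > b.running_priority;
}

// Places a thread on the logical CPU running the least important thread.
// If the new thread has higher priority it is assumed to preempt, so the
// CPU's key is updated; otherwise the key stays as is.
int assign_thread(std::vector<LogicalCpu>& heap, int thread_priority)
{
    assert(!heap.empty());
    // pop_heap moves the front (lowest running priority) to the back.
    std::pop_heap(heap.begin(), heap.end(), runs_higher_priority);
    LogicalCpu& target = heap.back();
    if (thread_priority > target.running_priority)
        target.running_priority = thread_priority;
    int id = target.id;
    std::push_heap(heap.begin(), heap.end(), runs_higher_priority);
    return id;
}
```

The heap has to be built once with `std::make_heap(heap.begin(), heap.end(), runs_higher_priority)` before the first call.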
The scheduler is at a very early stage. There is no thread migration and
the algorithms choosing a CPU for a thread are very simple.
Since the affine scheduler is going to use one run queue per core, on
single-core machines it will work exactly the same as the simple scheduler.
That would allow us to have only one scheduler implementation usable
on all kinds of machines.
The simple scheduler is used when we do not have to worry about cache affinity
(i.e. single core with or without SMT, or multicore with all cache levels
shared).
When we replace gSchedulerLock with more fine-grained locking, the affine
scheduler should also be chosen when the logical CPU count is high (regardless
of cache).
In SMP systems the simple scheduler will be used only when all logical
processors share all levels of cache and the number of CPUs is low.
In such systems we do not have to care about cache affinity, and
the contention on the lock protecting the shared run queue is low. A single
run queue makes load balancing very simple.
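A sketch of the SMP selection rule above. `CpuTopology`, `choose_scheduler()` and the CPU-count cutoff are hypothetical; the text does not give a threshold for "low".

```cpp
#include <cassert>

enum class SchedulerMode { Simple, Affine };

// Hypothetical summary of the CPU topology.
struct CpuTopology {
    int logical_cpus;
    bool all_caches_shared;  // every logical CPU shares every cache level
};

// Hypothetical cutoff for "the number of CPUs is low".
constexpr int kMaxCpusForSimpleScheduler = 8;

SchedulerMode choose_scheduler(const CpuTopology& topology)
{
    assert(topology.logical_cpus > 0);
    // Nothing to gain from cache affinity and little contention on the
    // single run-queue lock: one shared run queue is enough.
    if (topology.all_caches_shared
        && topology.logical_cpus <= kMaxCpusForSimpleScheduler)
        return SchedulerMode::Simple;
    return SchedulerMode::Affine;
}
```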
Kernel support for yielding to all (including lower priority) threads
has been removed. POSIX sched_yield() remains unchanged.
If a thread really needs to yield to everyone, it can reduce its priority
to the lowest possible and then yield (it will then need to manually
return to its previous priority upon continuing).
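A sketch of the "lower, yield, restore" sequence using standard POSIX calls (`pthread_getschedparam()`, `sched_get_priority_min()`, `pthread_setschedparam()`, `sched_yield()`). On Haiku, set_thread_priority() would be the native route; the POSIX version is shown because it is portable. Note that restoring a raised priority may need privileges under real-time policies.

```cpp
#include <cassert>
#include <pthread.h>
#include <sched.h>

// Drops the calling thread to the lowest priority of its current policy,
// yields, then restores the previous priority. Returns 0 on success or an
// error number from the pthread calls.
int yield_to_everyone()
{
    pthread_t self = pthread_self();
    int policy;
    sched_param saved;
    int err = pthread_getschedparam(self, &policy, &saved);
    if (err != 0)
        return err;

    sched_param lowest = saved;
    lowest.sched_priority = sched_get_priority_min(policy);
    err = pthread_setschedparam(self, policy, &lowest);
    if (err != 0)
        return err;

    sched_yield();

    // The kernel no longer does this for us: restore the old priority.
    return pthread_setschedparam(self, policy, &saved);
}
```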
Each thread has a minimal priority that depends on its static priority.
However, such a thread is still able to starve threads with even lower priority
(e.g. CPU-bound threads with lower static priority). To prevent this,
another penalty is introduced. When the minimal priority is reached, a
penalty of (count mod minimal_priority) is added, where count is the number
of time slices since the thread reached its minimal priority. This prevents
starvation of lower priority threads (since all CPU-bound threads may have
their priority temporarily reduced to 1) but preserves the relation between
static priorities: when there are two CPU-bound threads, the one with
higher static priority would get more CPU time.
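The penalty formula as a tiny function, assuming that "adding" the penalty means lowering the priority by that amount: with minimal priority m, the effective priority cycles through m, m-1, ..., 1 and starts over. `effective_priority()` is an illustrative name.

```cpp
#include <cassert>

// Priority of a CPU-bound thread that has already reached its minimal
// priority; count is the number of time slices since that happened.
int effective_priority(int minimal_priority, int count)
{
    assert(minimal_priority > 0 && count >= 0);
    return minimal_priority - count % minimal_priority;
}
```

Over one full cycle a thread with minimal priority m averages (m + 1) / 2, so of two CPU-bound threads the one with the higher static priority still spends more of its time at higher priorities.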
The maximum penalty the thread can receive is now limited depending on
the real thread priority. However, this makes it possible to starve
threads with a priority lower than that limit. To prevent that, threads
that have already earned the maximum penalty are periodically forced
to yield the CPU to all other threads.
Until now, when a thread was preempted by a higher priority
thread, it was placed at the end of its priority FIFO and given
a new time slice. This patch changes that, allowing the thread to
complete its time slice (once the higher priority threads are done),
unless very little time was left, in which case the remaining time is added
to the next time slice.
Apart from making the algorithm fairer, this change makes it easier to
identify CPU-bound threads. (Earlier they could 'hide' by being
preempted by higher priority threads and consequently never using
their whole time slice.)
This patch appears to fix #8007.
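A sketch of the new time slice accounting. The quantum length and the "very little time left" threshold are hypothetical values (the text gives none), and `ThreadTimes`, `charge_run_time()` and `next_quantum()` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical values in microseconds.
constexpr int64_t kBaseQuantum = 3000;
constexpr int64_t kMinUsefulRemainder = 500;

struct ThreadTimes {
    int64_t time_left;  // unused part of the current time slice
};

// Called whenever the thread stops running (preempted or slice expired)
// after running for `ran` microseconds.
void charge_run_time(ThreadTimes& t, int64_t ran)
{
    t.time_left -= ran;
    if (t.time_left < 0)
        t.time_left = 0;
}

// Called when the thread is picked to run again; returns its quantum.
int64_t next_quantum(ThreadTimes& t)
{
    int64_t quantum;
    if (t.time_left >= kMinUsefulRemainder)
        quantum = t.time_left;                 // finish the interrupted slice
    else
        quantum = kBaseQuantum + t.time_left;  // fold the leftover into a new slice
    t.time_left = quantum;
    return quantum;
}
```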
A thread that consumes its whole quantum has its priority reduced. The penalty
is cancelled when the thread voluntarily gives up the CPU. Real-time threads
are not affected.
The problem of thread starvation is not solved completely. The worst-case
latency is still unbounded (even in systems with a bounded number of threads).
When a middle priority thread is constantly preempted by high priority
threads, it does not earn the penalty, so lower priority threads
can still be starved. Moreover, the punishment is probably too aggressive,
as it reduces the priority of virtually all CPU-bound threads to 1.
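A sketch of the penalty rule described above. The first real-time priority is assumed to be 100 (Haiku's B_REAL_TIME_DISPLAY_PRIORITY), the step of 1 per expired quantum is an assumption, and `SchedThread` and the two hooks are hypothetical.

```cpp
#include <cassert>

// Assumption: threads at or above this priority are real-time.
constexpr int kFirstRealTimePriority = 100;

struct SchedThread {
    int base_priority;
    int penalty;

    int effective() const
    {
        int priority = base_priority - penalty;
        return priority < 1 ? 1 : priority;  // never below 1
    }
};

// The thread consumed its whole quantum: it looks CPU bound.
void quantum_expired(SchedThread& t)
{
    assert(t.penalty >= 0);
    if (t.base_priority < kFirstRealTimePriority)
        t.penalty++;
}

// The thread blocked or yielded voluntarily: cancel the penalty.
void gave_up_cpu(SchedThread& t)
{
    t.penalty = 0;
}
```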
Since we are using libraries originally intended for user mode in kernel
mode, providing them with some userland functions is inevitable. This
particular patch makes zlib happy by letting it call exit() when
its debug assertions fail.
Latest gcc converts the old ones to the new ones anyway...
including when passing to gas, which of course is not new enough,
so we have to also force gcc to pass the old one around in one case.
jam fails in execve() when trying to run the command because the
argument list is too large, due to the many objects in libgcc.
We split them into two intermediate objects,
which we then link into libroot.
* __pthread_destroy_thread() will in turn free the pthread_thread object.
* this fixes a leak of 2072 bytes on each thread construction/destruction
and #9945. MediaExtractor spawns a thread on construction, which leaked
its pthread_thread object on destruction.
If the alternate signal stack is used, randomize the initial stack
pointer in the same way it is randomized on "normal" thread stacks.
Also, update the MINSIGSTKSZ value so that, regardless of where the new
stack pointer points, there is at least 4k of stack left.
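A sketch of the arithmetic behind the MINSIGSTKSZ change: if the random offset is always below some bound R, a minimum stack size of R plus 4 KiB guarantees 4 KiB of usable stack. The bound, the 16-byte alignment and all names are assumptions for illustration; the random value is passed in so the result is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical values; the text only says 4k must remain.
constexpr uintptr_t kMaxRandomOffset = 0x1000;
constexpr uintptr_t kMinUsableStack = 0x1000;
constexpr size_t kSketchMinSigStackSize = kMaxRandomOffset + kMinUsableStack;

// Initial stack pointer for a stack occupying [base, base + size), moved
// down by a random, 16-byte aligned offset below kMaxRandomOffset.
uintptr_t randomized_stack_pointer(uintptr_t base, size_t size, uint32_t rnd)
{
    assert(size >= kSketchMinSigStackSize);
    uintptr_t top = (base + size) & ~uintptr_t(0xf);
    uintptr_t offset = (rnd % kMaxRandomOffset) & ~uintptr_t(0xf);
    return top - offset;  // at least kMinUsableStack bytes above base
}
```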
Support for 64-bit atomic operations for ARMv7+ is currently stubbed
out in libroot, but our current targets do not use it anyway.
We now select atomics-as-syscalls automatically based on the ARM
architecture we're building for. The intent is to do away with
most of the board specifics (at the very least on the kernel side)
and just specify the lowest ARMvX version you want to build for.
This gives flexibility both to distribute a single
image for a wide range of devices and to build a tuned system
for one specific core type.
Now we check whether the virtual address corresponding to the PTE lies
in an allocated virtual address range. This fixes a cause of #8345:
The assertion would trigger when such an entry was encountered. There
might be other causes that trigger the same assertion, though.
This adds the -mapcs-frame compiler flag for ARM to have "stable"
stack frames, adds support to the kernel for dumping stack crawls,
and adds initial support for iframes. There's much more functionality
to unlock in KDL, but this already makes debugging a lot more
comfortable.
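A sketch of what APCS frames buy: with -mapcs-frame every frame stores the caller's fp and the return address at fixed offsets from fp, so a stack crawl is a simple chain walk. The stack here is simulated (addresses are word indices, not real pointers), and `apcs_stack_crawl()` is a hypothetical helper, not the KDL code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// In an APCS frame, fp points at the saved pc; the saved lr (return
// address) is one word below it and the caller's fp three words below.
std::vector<uint32_t> apcs_stack_crawl(const std::vector<uint32_t>& stack,
    uint32_t fp, size_t max_frames)
{
    std::vector<uint32_t> return_addresses;
    while (fp >= 3 && fp < stack.size()
        && return_addresses.size() < max_frames) {
        return_addresses.push_back(stack[fp - 1]);  // saved lr
        uint32_t caller_fp = stack[fp - 3];          // saved fp
        // The stack grows down, so caller frames sit at higher addresses;
        // anything else (including 0) ends the crawl.
        if (caller_fp <= fp)
            break;
        fp = caller_fp;
    }
    return return_addresses;
}
```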
This helps when debugging, since when a driver/module causes a crash
while registering with the device manager, you can actually look at
the device manager state ;-)
The previously used method for programming the timer did not take
into account that our timespec is 64-bit while the register we poke
it into is 32-bit. Since the PXA (the SoC in the Verdex target) supports
only a limited set of resolutions (us, ms, second), we dynamically
determine the one that most closely matches, and set that.
However, for snooze (for example) to work, we also need system_time to work.
The current implementation uses a system timer at microsecond
resolution to keep track of time.
Although the code is far from perfect, I'm committing it now before
it gets lost, since I'm working on the infrastructure code
to properly factor the SoC-specific code out of the core
ARM architecture code (so the kernel can support more than
our poor old Verdex QEMU target ;))
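A sketch of picking the timer resolution: use the finest scale (us, ms, s) whose count still fits into the 32-bit register. `choose_timer_setting()` and the round-up policy are assumptions, not the Verdex timer code.

```cpp
#include <cassert>
#include <cstdint>

// Resolutions the text says the PXA timer offers.
enum class TimerScale { Microseconds, Milliseconds, Seconds };

struct TimerSetting {
    TimerScale scale;
    uint32_t count;  // value for the 32-bit timer register
};

// Rounds n / d up without risking overflow in n + d - 1.
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return n / d + (n % d != 0);
}

// Picks the finest resolution at which the 64-bit timeout (in
// microseconds) fits into 32 bits. Rounding up means it never fires early.
TimerSetting choose_timer_setting(uint64_t timeout_us)
{
    if (timeout_us <= UINT32_MAX)
        return {TimerScale::Microseconds, static_cast<uint32_t>(timeout_us)};
    uint64_t ms = div_round_up(timeout_us, 1000);
    if (ms <= UINT32_MAX)
        return {TimerScale::Milliseconds, static_cast<uint32_t>(ms)};
    uint64_t s = div_round_up(timeout_us, 1000000);
    if (s > UINT32_MAX)
        s = UINT32_MAX;  // clamp absurdly long timeouts
    return {TimerScale::Seconds, static_cast<uint32_t>(s)};
}
```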
The "blobs" in a U-Boot uimage are aligned at 4 bytes, which we
did not take into account. Found this when adding a 3rd blob
containing the Flattened Device Tree for ARM.
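A sketch of the blob offsets in a multi-file uImage, assuming the legacy U-Boot layout: a 64-byte header, a list of 32-bit sizes terminated by a zero word, then the blobs, each starting on a 4-byte boundary. `multi_image_offsets()` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kUImageHeaderSize = 64;  // legacy uImage header size

constexpr uint32_t align4(uint32_t n) { return (n + 3u) & ~3u; }

// Returns the file offset of each blob, given the blob sizes.
std::vector<uint32_t> multi_image_offsets(const std::vector<uint32_t>& sizes)
{
    std::vector<uint32_t> offsets;
    // Header, then one size word per blob plus the terminating zero word.
    uint32_t offset = kUImageHeaderSize
        + 4u * static_cast<uint32_t>(sizes.size() + 1);
    for (uint32_t size : sizes) {
        offsets.push_back(offset);
        offset += align4(size);  // the next blob starts 4-byte aligned
    }
    return offsets;
}
```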
It turns out dd on MacOS does not like '1M' as a size descriptor, but
wants '1m'. To avoid breaking Linux builds (Linux dd does not
accept 1m), just use the actual number of bytes explicitly instead.
Fix for the autoconf test that attempts to create a file in place of an
already existing symlink. On the error path, put_vnode() was called
explicitly before returning the error. The second, implicit call to
put_vnode() was issued when destroying the VNodePutter instance that
references the same vnode. At that time the vnode's reference count was 0,
so the corresponding panic was triggered. Great thanks to Ingo for
pointing it out!
Fixes #9140.
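A sketch of the RAII pattern involved, with hypothetical stand-ins for the vnode and its putter (`Vnode`, `VnodePutter`, `fail_create()` are illustrative, not the VFS code): on the error path the putter alone releases the reference, and Detach() is the way to keep it. The assert plays the role of the kernel panic.

```cpp
#include <cassert>

// Hypothetical stand-in for a vnode and its reference count.
struct Vnode { int ref_count; };

void put_vnode(Vnode* vnode)
{
    assert(vnode->ref_count > 0);  // the kernel panics here instead
    vnode->ref_count--;
}

// Releases the vnode reference when it goes out of scope, unless detached.
class VnodePutter {
public:
    explicit VnodePutter(Vnode* vnode) : fVnode(vnode) {}
    ~VnodePutter() { if (fVnode != nullptr) put_vnode(fVnode); }
    VnodePutter(const VnodePutter&) = delete;
    VnodePutter& operator=(const VnodePutter&) = delete;

    Vnode* Detach() { Vnode* vnode = fVnode; fVnode = nullptr; return vnode; }

private:
    Vnode* fVnode;
};

// Error path: no explicit put_vnode() here, the putter does it exactly once.
int fail_create(Vnode* vnode)
{
    VnodePutter putter(vnode);
    return -1;  // hypothetical error code
}
```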
* adding zlib to the kernel unfortunately introduces a cyclic dependency
with respect to the zlib, haiku and haiku_devel packages (AFAICS)
* circumvent this by building kernel_zlib as a static library again,
this time with PIC, such that it can be used by kernel add-ons
This patch fixes one of the compatibility issues mentioned in #3255. It
allows applications to call bind() or connect() passing a sockaddr_un
structure with a pathname that is not null-terminated.
Some systems do not require the pathname in sockaddr_un::sun_path to be
null-terminated; instead, the end of the string is determined by the size
of the structure passed as an argument to bind() or connect().
The standard is a bit vague in this matter but suggests that the path
should be null-terminated and that bind() and connect() should
be given sizeof(sockaddr_un) as the structure size.
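A sketch of the length handling this enables: the path ends at the first NUL or at the end of the address length passed by the caller, whichever comes first. `unix_socket_path()` is a hypothetical helper, not the network stack's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/un.h>

// Extracts the socket path from a sockaddr_un passed to bind()/connect().
// Accepts both null-terminated and non-terminated paths; returns false
// for an invalid address length.
bool unix_socket_path(const sockaddr_un* address, socklen_t length,
    std::string& path)
{
    const size_t path_offset = offsetof(sockaddr_un, sun_path);
    if (length <= path_offset || length > sizeof(sockaddr_un))
        return false;
    const size_t max_length = length - path_offset;
    const void* nul = std::memchr(address->sun_path, '\0', max_length);
    const size_t path_length = nul != nullptr
        ? static_cast<size_t>(static_cast<const char*>(nul) - address->sun_path)
        : max_length;
    path.assign(address->sun_path, path_length);
    return true;
}
```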
* Both filesystems used to link to a static kernel-zlib, which
was being built with -fno-pic. This doesn't work on x86_64 as the
filesystem add-ons are meant to be relocatable, which requires their
code to be compiled as position independent.
Solve that by moving zlib into the kernel, so any add-on can just use
it from there (packagefs is mandatory, so we can't really do without
zlib anyway).
* the Virtio RNG PCI device has class 0, so it can't be found using the usual
paths. Add 0 to _AlwaysRegisterDynamic() and "busses/virtio" in _GetNextDriverPath()
for non-generic drivers to help find virtio_pci.
* The RNG Virtio device is generic and needs "busses/random" to find virtio_rng.
* Mostly useful for virtualization at the moment. Works in QEmu.
* Can be enabled by safemode settings/menu.
* Please note that x2APIC normally requires use of VT-d interrupt remapping feature
on real hardware, which we don't support yet.
* fix uninitialized variables in __printf_fphex() in case of architectures
without support for long double; this caused unreliable results
or crashes when using %La or %La on x86
* activate long double implementation in use for x86_64 for x86, too,
as they share the long double format
(cherry picked from commit d1716b277c)
* For the comparison, cast the character parameter to char, as required
by the spec.
* Fix broken handling of strrchr(..., 0). It is supposed to return a
pointer to the end of the string (the terminating null character), but
it returned a pointer to the start.
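A sketch of strrchr() semantics covering both points: the character is converted to char before comparing, and searching for 0 returns a pointer to the terminator. `sketch_strrchr()` is an illustrative name, not libroot's code.

```cpp
#include <cassert>

// Returns a pointer to the last occurrence of c (converted to char) in s,
// or nullptr. Searching for '\0' yields the terminator, i.e. the end of s.
const char* sketch_strrchr(const char* s, int c)
{
    assert(s != nullptr);
    const char ch = static_cast<char>(c);
    const char* last = nullptr;
    for (;; s++) {
        if (*s == ch)
            last = s;       // checked before the end test, so '\0' matches
        if (*s == '\0')
            return last;
    }
}
```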
* All packaging-architecture-dependent variables now have a
respective suffix and are set up for each configured packaging
architecture, except for the kernel and boot loader variables, which
are still only set up for the primary architecture.
For convenience TARGET_PACKAGING_ARCH, TARGET_ARCH, TARGET_LIBSUPC++,
and TARGET_LIBSTDC++ are set to the respective values for the primary
packaging architecture by default.
* Introduce a set of MultiArch* rules to help with building targets for
multiple packaging architectures. Generally the respective targets are
(additionally) gristed with the packaging architecture. For libraries
the additional grist is usually omitted for the primary architecture
(e.g. libroot.so and <x86>libroot.so for x86_gcc2/x86 hybrid), so that
Jamfiles for targets built only for the primary architecture don't
need to be changed.
* Add multi-arch build support for all targets needed for the stage 1
cross devel package as well as for libbe (untested).
devfs_io() can't fall back to calling vfs_synchronous_io() if the
device driver doesn't support handling requests asynchronously. The
presence of the io() hook leads the VFS (do_iterative_fd_io()) to
believe that asynchronous handling is supported, so it sets a
finished-callback on the request which calls the io() hook to start the
next chunk. Thus, instead of iterating through the request in a loop,
the iteration happens recursively. For sufficiently fragmented requests
the stack may overflow (ticket #9900).
* Introduce a new vnode operation supports_operation(). It can be called
by the VFS to determine whether a present hook is actually currently
supported for a given vnode.
* devfs: implement the new hook and remove the fallback handling in
devfs_io().
* vfs_request_io.cpp: use the new hook to determine whether the io()
hook is really supported.
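A sketch of the decision the new hook enables. The operation table, the operation code and `io_hook_usable()` are hypothetical and much simpler than the real vnode operation table; the point is that a present io() hook is only trusted if supports_operation(), when provided, agrees.

```cpp
#include <cassert>

// Hypothetical, heavily simplified vnode operation table.
struct VnodeOps {
    bool (*io)(int chunk);                      // asynchronous I/O hook, may be null
    bool (*supports_operation)(int operation);  // new hook, may be null
};

constexpr int kOpIo = 1;  // hypothetical operation code

// Returns true if the VFS may drive this vnode through io(); otherwise it
// should fall back to iterating synchronously in a loop.
bool io_hook_usable(const VnodeOps& ops)
{
    if (ops.io == nullptr)
        return false;
    if (ops.supports_operation == nullptr)
        return true;  // no way to say otherwise: trust the present hook
    return ops.supports_operation(kOpIo);
}
```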
Although syscalls are done through SYSCALL and therefore don't actually
have an interrupt number, set it to 99 (the syscall vector on 32-bit)
in the iframe so that a syscall frame can be identified. Also added
vector/error_code to x86_64_debug_cpu_state for Debugger to use, not
sure why I didn't put them there in the first place.
* Add a VMArea* version of AddArea().
* AddAreaCacheAndLock(): Use the new AddArea() version. This not only
saves the ID hash table lookup, but also fixes a race condition with
delete_area(). delete_area() removes the area from the hash before
removing it from its cache, so iterating through the cache's areas
can turn up an area that is no longer in the hash. In that case we
would fail immediately. The new AddArea() won't fail in this
situation, though.
Fixes #9686: vm_copy_area() could fail for the "commpage" area. That's
an area all teams share, so any team terminating while another one was
fork()ing could trigger it.
If the metadata area couldn't be allocated in any of the supported
(reattachable) places, just use a static allocation. Otherwise the tracing
feature wouldn't be available at all in such a case.
In debug_cleanup(), if the debug syslog buffer is disabled (the default when
KDEBUG_LEVEL is 0), then a new buffer is allocated with kernel_args_malloc().
This is done after kernel_args addresses have been converted to 64-bit, so
the address the kernel gets will be 32-bit, resulting in the page fault seen
in #9842. Fixed by moving the call to debug_cleanup() to before
convert_kernel_args().
... in case of team creation error. Once assigned to Team::io_context,
the Team object takes responsibility for the I/O context object and
releases the reference on destruction. load_image_internal() and
fork_team() were thus releasing one reference too many.
Fixes #9851.
* In case the locale backend could not be loaded, these functions (and
their reentrant counterparts) just returned an error. So we reactivate
parts of the BSD-/Olson-implementation in localtime_fading_out.c in
order to use them as fallback.
* Clean up localtime_fading_out.c (remove a lot of unused cruft).
* all those functions need to return the given wc unchanged in case of
error, not 0
* towctrans() didn't actually look at the requested transition, but
always acted as if _ISlower was given
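A sketch of the fixed towctrans() behavior, with hypothetical transition descriptors (a real libc returns opaque wctrans_t values from wctrans()): the requested transition is honored, and an unknown one returns wc unchanged instead of 0. `sketch_towctrans()` is an illustrative name.

```cpp
#include <cassert>
#include <cwctype>

// Hypothetical transition descriptors, for illustration only.
enum SketchTrans { kTransNone = 0, kTransToLower = 1, kTransToUpper = 2 };

// Applies the requested transition; on error returns wc unchanged.
std::wint_t sketch_towctrans(std::wint_t wc, int trans)
{
    switch (trans) {
        case kTransToLower:
            return std::towlower(wc);
        case kTransToUpper:
            return std::towupper(wc);
        default:
            return wc;  // unknown descriptor: leave the character alone
    }
}
```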