The maximum penalty the thread can receive is now limited depending on
the real thread priority. However, since it make it possible to starve
threads with priority lower than that limit. To prevent that threads
that have already earned the maximum penalty are periodically forced
to yield CPU to all other threads.
Until now, when the thread has been preempted by higher priority
thread it was then placed at the end of its priority FIFO and given
a new time slice. This patch changes it allowing the thread to
complete its time slice (when the higher priority threads are done),
unless there was very little time left in which case this time is added
to the next time slice.
Apart from making the algorithm more fair this change allows to identify
CPU bound threads more easily. (Earlier they could 'hide' by being
preempted by higher priority thread and consequently never using
their whole time slice).
This patch appears to fix#8007.
Thread that consume its whole quantum has its priority reduced. The penalty
is cancelled when the thread voluntarily gives up CPU. Real-time threads
are not affected.
The problem of thread starvation is not solved completely. The worst case
latency is still unbounded (even in systems with bounded number of threads).
When a middle priority thread is constantly preempted by high priority
threads it would not earn the penalty, thus the lower priority threads
still can be starved. Moreover, the punishment is probably too aggressive
as it reduces priority of virtually all CPU bound threads to 1.
Since we are using libraries originally intendent for user mode in kernel
mode providing them with some userland functions is inevitable. This
particular patch is to make zlib happy and able to call exit() when
its debug assertions fails.
Latest gcc converts the old ones to the new ones anyway...
including when passing to gas, which of course is not new enough,
so we have to also force gcc to pass the old one around in one case.
If the alternate signal stack is used randomize the initial stack
pointer in the same way it is randomized on "normal" thread stacks.
Also, update MINSIGSTKSZ value so that regardless of where the new
stack pointer points to there is at least 4k of stack left.
Support for 64-bit atomic operations for ARMv7+ is currently stubbed
out in libroot, but our current targets do not use it anyway.
We now select atomics-as-syscalls automatically based on the ARM
architecture we're building for. The intent is to do away with
most of the board specifics (at the very least on the kernel side)
and just specify the lowest ARMvX version you want to build for.
This will give flexibility in being able to distribute a single
image for a wide range of devices, and building a tuned system
for one specific core type.
Now we check whether the virtual address corresponding to the PTE lies
in an allocated virtual address range. This fixes a cause of #8345:
The assertion would trigger when such an entry was encountered. There
might be other causes that trigger the same assertion, though.
This adds the -mapcs-frame compiler flag for ARM to have "stable"
stack frames, adds support to the kernel for dumping stack crawls,
and initial support for iframes. There' much more functionality
to unlock in KDL, but this makes debugging already a lot more
comfortable.....
This helps when debugging, since when a driver/module causes a crash
while registering with the device manager, you can actually look at
the device manager state ;-)
The previously used method for programming the timer did not take
into account that our timespec is 64bit while the register we poke
it into is 32 bit. Since the PXA (SoC in Verdex target) has a limited
scale of resolution (us,ms,second) we dynamicly determine the one
that we can most closely match, and set that.
For f.ex. snooze to work however, we also need system_time to work.
The current implementation uses a system timer at microsecond
resolution to keep track of time.
Although the code is far from perfect, committing it now before
it gets lost, since I'm working on the infrastructure code
to properly factor out the SoC specific code out of the core
ARM architecture code (so the kernel can support more then
our poor old Verdex QEMU target ;))
This helps when debugging, since when a driver/module causes a crash
while registering with the device manager, you can actually look at
the device manager state ;-)
The previously used method for programming the timer did not take
into account that our timespec is 64bit while the register we poke
it into is 32 bit. Since the PXA (SoC in Verdex target) has a limited
scale of resolution (us,ms,second) we dynamicly determine the one
that we can most closely match, and set that.
For f.ex. snooze to work however, we also need system_time to work.
The current implementation uses a system timer at microsecond
resolution to keep track of time.
Although the code is far from perfect, committing it now before
it gets lost, since I'm working on the infrastructure code
to properly factor out the SoC specific code out of the core
ARM architecture code (so the kernel can support more then
our poor old Verdex QEMU target ;))
Fixing the autoconf test: attempt to create file in place of already
existing symlink. On error exit put_vnode was called explicitly before
returning error. The second, implicit call to put_vnode was issued on
destroying the VNodePutter instance that references the same vnode. At
this time the vnode has references count equal to 0 so corresponding
panic was executed. Great thanks to Ingo for pointing it out!
Fixes#9140.
* adding zlib to the kernel unfortunately introduces a cyclic dependency
with respect to the zlib, haiku and haiku_devel packages (AFAICS)
* circumvent this by building kernel_zlib as a static library again,
this time with PIC, such that it can be used by kernel add-ons
This patch fix one of the compatibility issues mentioned in #3255. It
allows applications to call bind() or connect() passing an sockaddr_un
structure with a pathname that is not null-terminated.
Some systems did not require pathname in sockaddr_un::sun_path to be
null-terminated, instead the end of the string is determined by the size
of the structure passed as an argument of bind() or connect().
The standard is a bit vague in this matter but suggest that the path
should be null-terminated and the functions bind() and connect() should
be given sizeof(sockaddr_un) as a structure size.
* Both filesystems used to link to a static kernel-zlib, which
was being built with -fno-pic. This doesn't work on x86_64 as the
filesystem add-ons are meant to be relocatable, which requires their
code to be compiled as position independent.
Solve that by moving zlib into the kernel, so any add-on can just use
it from there (packagefs is mandatory, so we can't really do without
zlib anyway).
* the Virtio RNG PCI device has the class 0, so can't be found using usual
paths. Add 0 to _AlwaysRegisterDynamic() and "busses/virtio" in _GetNextDriverPath()
for non generic drivers to help finding virtio_pci.
* The RNG Virtio device is generic and needs "busses/random" to find virtio_rng.
* Mostly useful for virtualization at the moment. Works in QEmu.
* Can be enabled by safemode settings/menu.
* Please note that x2APIC normally requires use of VT-d interrupt remapping feature
on real hardware, which we don't support yet.
* All packaging architecture dependent variables do now have a
respective suffix and are set up for each configured packaging
architecture, save for the kernel and boot loader variables, which
are still only set up for the primary architecture.
For convenience TARGET_PACKAGING_ARCH, TARGET_ARCH, TARGET_LIBSUPC++,
and TARGET_LIBSTDC++ are set to the respective values for the primary
packaging architecture by default.
* Introduce a set of MultiArch* rules to help with building targets for
multiple packaging architectures. Generally the respective targets are
(additionally) gristed with the packaging architecture. For libraries
the additional grist is usually omitted for the primary architecture
(e.g. libroot.so and <x86>libroot.so for x86_gcc2/x86 hybrid), so that
Jamfiles for targets built only for the primary architecture don't
need to be changed.
* Add multi-arch build support for all targets needed for the stage 1
cross devel package as well as for libbe (untested).
devfs_io() can't fall back to calling vfs_synchronous_io(), if the
device driver doesn't support handling requests asynchronously. The
presence of the io() hook leads the VFS (do_iterative_fd_io()) to
believe that asynchronous handling is supported and set a
finished-callback on the request which calls the io() hook to start the
next chunk. Thus, instead of iterating through the request in a loop
the iteration happens recursively. For sufficiently fragmented requests
the stack may overflow (ticket #9900).
* Introduce a new vnode operation supports_operation(). It can be called
by the VFS to determine whether a present hook is actually currently
supported for a given vnode.
* devfs: implement the new hook and remove the fallback handling in
devfs_io().
* vfs_request_io.cpp: use the new hook to determine whether the io()
hook is really supported.
Although syscalls are done through SYSCALL and therefore don't actually
have an interrupt number, set it to 99 (the syscall vector on 32-bit)
in the iframe so that a syscall frame can be identified. Also added
vector/error_code to x86_64_debug_cpu_state for Debugger to use, not
sure why I didn't put them there in the first place.
* Add a VMArea* version of AddArea().
* AddAreaCacheAndLock(): Use the new AddArea() version. This not only
saves the ID hash table lookup, but also fixes a race condition with
delete_area(). delete_area() removes the area from the hash before
removing it from its cache, so iterating through the cache's areas
can turn up an area that no longer is in the hash. In that case we
would fail immediately. The new AddArea() won't fail in this
situation, though.
Fixes#9686: vm_copy_area() could fail for the "commpage" area. That's
an area all teams share, so any team terminating while another one was
fork()ing could trigger it.
We the meta data area couldn't be allocated in any of the supported
(reattachable) places, just use a static allocation. The tracing feature
wouldn't be available at all in such a case.
... in case of team creation error. Once assigned to Team::io_context
the Team object takes responsibility of the I/O context object and
releases the reference on destruction. load_image_internal() and
fork_team() were thus releasing one reference too many.
Fixes#9851.
* When looking for a place for new area the size of the area to be
inserted instead of the next area size was used to check whether
we are already past the upper bound.
* There was an attempt to insert area even if we were past the
upper bound.
* Add optional packages Zlib and Zlib-devel.
* Simplify the build feature section for zlib and also extract the
source package.
* Replace all remaining references to the zlib instance in the tree and
remove it.
Casting the difference of the two off_t values to size_t may truncate
the result. Doing so before the comparison will therefore break it.
Instead cast the size to off_t to get around the signed versus unsigned
integer expression comparison and then cast the result of the comparison
to size_t again. Should fix#9714.
This reverts commit f7176b0ee5. Citing Ingo:
"off_t is the correct type to use for addressing pages in a cache/file,
which page_num_t should only be used for physical pages." I'll see how to
fix the GCC 4.7 warnings differently :)
* error message: error: cannot bind packed field
'args->kernel_args::platform_args.platform_kernel_args::apm' to 'apm_info&'
* the reason would be that the reference doesn't have alignment information anymore.
* changed the reference to const for read access, and use the long form for setting a field.
- Instead of implicitly registering and unregistering a service
instance on construction/destruction, DefaultNotificationService
now exports explicit Register()/Unregister() calls, which subclasses
are expected to call when they're ready.
- Adjust all implementing subclasses. Resolves an issue with deadlocks
when booting a DEBUG=1 build.
* Add "bool kernel" parameter to vfs_entry_ref_to_path(), so it can be
specified for which I/O context the entry ref shall be translated.
* _user_entry_ref_to_path(): Use the calling team's I/O context instead
of the kernel's. Fixes the bug that in a chroot the syscall would
return a path for outside the chroot.
* make runtime_loader a dynammically linked object
* add kernel support for loading user images that need to be relocated
* load runtime_loader at random address
Currently there are two generators. The fast one is the same one the scheduler
is using. The standard one is the same algorithm libroot's rand() uses. Should
there be a need for more cryptographically PRNG MD4 or MD5 might be a good
candidates.
This address specification is actually not needed since PIC images can be
located anywhere. Only their size is restriced but that is the compiler and
linker concern. Thanks to Alex Smith for pointing that out.
B_ALREADY_WIRED, which was erroneously passed for the area protection
parameter to map_backing_store(), has the value 7 which implies user
readable and writable. Hence the address ranges around 0xdeadbeef and
0xcccccccc could actually be read and written from anywhere.
On some 64 bit architectures program and library images have to be mapped in
the lower 2 GB of the address space (due to instruction pointer relative
addressing). Address specification B_RANDOMIZED_IMAGE_ADDRESS ensures that
created area satisfies that requirement.
Placing commpage and team user data somewhere at the top of the user accessible
virtual address space prevents these areas from conflicting with elf images
that require to be mapped at exact address (in most cases: runtime_loader).
This patch introduces randomization of commpage position. From now on commpage
table contains offsets from begining to of the commpage to the particular
commpage entry. Similary addresses of symbols in ELF memory image "commpage"
are just offsets from the begining of the commpage.
This patch also updates KDL so that commpage entries are recognized and shown
correctly in stack trace. An update of Debugger is yet to be done.
Set execute disable bit for any page that belongs to area with neither
B_EXECUTE_AREA nor B_KERNEL_EXECUTE_AREA set.
In order to take advanage of NX bit in 32 bit protected mode PAE must be
enabled. Thus, from now on it is also enabled when the CPU supports NX bit.
vm_page_fault() takes additional argument which indicates whether page fault
was caused by an illegal instruction fetch.
x86_userspace_thread_exit() is a stub originally placed at the bottom of
each thread user stack that ensures any thread invokes exit_thread() upon
returning from its main higher level function.
Putting anything that is expected to be executed on a stack causes problems
when implementing data execution prevention. Code of x86_userspace_thread_exit()
is now moved to commpage which seems to be much more appropriate place for it.
When forking a process team user data area is not cloned but a new one is
created instead. However, the new one has to be at exactly the same address
parent's team user data area is. When process is exec then team user data
area may be recreated at random position.
This patch also make sure that instances of struct user_thread in team user
data are each in separate cache line in order to prevent false sharing since
these data are very likely to be accessed simultaneously from threads executing
on different CPUs. This change however reduces the number of threads process
can create. It is fixed by reserving 512kB of address space in case team user
data area needs to grow.
Randomized equivalent of B_ANY_ADDRESS. When a free space is found (as in
B_ANY_ADDRESS) the base adress is then randomized using _RandomizeAddress
pretty much like it is done in B_RANDOMIZED_BASE_ADDRESS.
B_RAND_BASE_ADDRESS is basically B_BASE_ADDRESS with non-deterministic created
area's base address.
Initial start address is randomized and then the algorithm looks for a large
enough free space in the interval [randomized start, end]. If it fails then
the search is repeated in the interval [original start, randomized start]. In
case it also fails the algorithm falls back to B_ANY_ADDRESS
(B_RANDOMIZED_ANY_ADDRESS when it is implemented) just like B_BASE_ADDRESS does.
Randomization range is limited by kMaxRandomize and kMaxInitialRandomize.
Inside the page randomization of initial user stack pointer is not only a part
of ASLR implementation but also a performance improvement that helps
eliminating aligned 64 kB data access.
Minimal user stack size is increased to 8 kB in order to ensure that regardless
of initial stack pointer value there is still enough space on stack.
The physical memory map area was not included in the kernel virtual
address space range (it was below KERNEL_BASE). This caused problems
if an I/O operation took place on physical memory mapped there (the
bad address error seen in #9547 was occurring in lock_memory_etc()).
Changed KERNEL_BASE and KERNEL_SIZE to cover the area and add a null
area that covers all of it. Also changed X86VMTranslationMap64Bit to
handle large pages in Query(), as the physical map area uses large
pages.
The standard states that F_GETLK should check whether given lock would be
blocked by another one and return description of the conflicting one (or
set l_type to F_UNLCK if there is no collision).
Current implementation of F_GETLK performs completely different actions, it
"Retrieves the first lock that has been set by the current team". Moreover,
if there are no locks (advisory_locking == NULL) an error is returned
instead of l_type set to F_UNLCK.
* Added the aforementioned functions.
* create_area_etc() now takes a guard size parameter.
* The thread_info::stack_base/end range now refers to the usable range
only.
As there are only 8 bits for the index in the coarse page table entries
the maximum index is 256. This makes us correctly move to the next page
directory once we've run through all entries. Fixes missing unmap of
pages that crossed that boundary and consequent panic "page still has
mappings" when the page was removed from a cache.
These can be used for on-screen debug output with relatively little
effort, as they just need a plain framebuffer definition to work.
Some stubs are added to not clutter up the kernel sources with too
many ifdefs.
When iterator->current is NULL, hash_next() assumes we've reached the
end of a bucket (linked list) and moves to the next one. Wehn the first
element of a linked list was removed in hash_remove_current()
iterator->current was set to NULL, causing the next call to hash_next()
to skip over the rest of the list of that bucket.
To fix this we now decrement iterator->bucket by one, so the next call
to hash_next() correctly arrives at the new first element of the same
bucket. Doing it this way avoids having to search backwards through the
table to find the actual previous item.
This caused modules to be skipped in module_init_post_boot_device()
when normalizing module image paths so some of the module images ended
up non-normalized. This could then cause images to be loaded a second
time for modules that were part of an actually already loaded image.
This setup is present for the PCI module with the pci/x86 submodule
and would lead to a second copy of the PCI module image to be loaded
but without being initialized, eventually leading to #8684.
The affected module images were pretty much random, as it depended on
the order in which they were loaded from the file system, in this case
the boot floppy archive of the El-Torito boot part of ISO and anyboot
images. The r1alpha4 release images unfortunately had the module files
ordered in the archive just so that the PCI module image would be
skipped, allowing #8684 to happen on many systems with MSI support.
Since the block cache uses hash_remove_current() as well in some cases,
it is possible that transactions in its list could've been skipped.
Cursory testing didn't reveal this to be a usual case, and it is
possible that in the pattern it is used there, the bug wouldn't be
triggered at all. It's still possible that it caused rare misbehaviour
though.
Remove the dummies from the C code and implement them in assembly,
due to the label referencing issues with the fault handler.
This code is ripe for optimisation, my ARM assembly is pretty
basic ;)
Does work though, and gets us one step closer to a full arch.
This also implements the fault handler correctly now, and cleans up the
exception handling. Seems a lot more stable now, no unexpected panics or
faults happening anymore.
This will generate asm_offsets.h which makes our assembly code
easier to maintain by preventing hardcoded offsets for fields within structures.
(copied from X86 and removed the X86 specifics)