This helps when debugging, since when a driver/module causes a crash
while registering with the device manager, you can actually look at
the device manager state ;-)
The previously used method for programming the timer did not take
into account that our timespec is 64 bit while the register we poke
it into is 32 bit. Since the PXA (the SoC in the Verdex target) has a
limited scale of resolution (us, ms, second), we dynamically determine
the one that we can most closely match, and set that.
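A minimal sketch of that scale selection, with illustrative names (the
real PXA OS timer registers and enum differ):

    #include <stdint.h>

    enum pxa_timer_scale {
        PXA_TIMER_SCALE_US,
        PXA_TIMER_SCALE_MS,
        PXA_TIMER_SCALE_S
    };

    // Pick the finest resolution whose 32-bit match register can still
    // hold the 64-bit timeout; fall back to coarser units as it grows.
    static pxa_timer_scale
    pick_timer_scale(uint64_t timeoutMicroseconds)
    {
        if (timeoutMicroseconds <= UINT32_MAX)
            return PXA_TIMER_SCALE_US;
        if (timeoutMicroseconds / 1000 <= UINT32_MAX)
            return PXA_TIMER_SCALE_MS;
        return PXA_TIMER_SCALE_S;
    }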
For snooze() to work, for example, we also need system_time() to work.
The current implementation uses a system timer at microsecond
resolution to keep track of time.
Although the code is far from perfect, committing it now before
it gets lost, since I'm working on the infrastructure code
to properly factor the SoC-specific code out of the core
ARM architecture code (so the kernel can support more than
our poor old Verdex QEMU target ;))
Fixing the autoconf test: attempt to create a file in place of an
already existing symlink. On the error exit path, put_vnode() was called
explicitly before returning the error. A second, implicit call to
put_vnode() was then issued when the VNodePutter instance referencing
the same vnode was destroyed. At that point the vnode's reference count
was already 0, so the corresponding panic was triggered. Great thanks
to Ingo for pointing it out!
Fixes #9140.
* adding zlib to the kernel unfortunately introduces a cyclic dependency
with respect to the zlib, haiku and haiku_devel packages (AFAICS)
* circumvent this by building kernel_zlib as a static library again,
this time with PIC, such that it can be used by kernel add-ons
This patch fixes one of the compatibility issues mentioned in #3255. It
allows applications to call bind() or connect() passing a sockaddr_un
structure with a pathname that is not null-terminated.
Some systems do not require the pathname in sockaddr_un::sun_path to be
null-terminated; instead, the end of the string is determined by the
size of the structure passed as an argument to bind() or connect().
The standard is a bit vague in this matter, but suggests that the path
should be null-terminated and that the functions bind() and connect()
should be given sizeof(sockaddr_un) as the structure size.
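A minimal sketch of deriving the effective path length under those
semantics; the helper name is hypothetical, not Haiku's actual code:

    #include <stddef.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    // Determine the length of sun_path when the caller may have
    // omitted the terminating null.
    static size_t
    unix_address_path_length(const struct sockaddr_un* address,
        socklen_t addressLength)
    {
        // Everything after the header fields belongs to the path.
        size_t maxLength
            = addressLength - offsetof(struct sockaddr_un, sun_path);

        // If a null terminator appears earlier, the path ends there.
        const char* terminator
            = (const char*)memchr(address->sun_path, '\0', maxLength);
        return terminator != NULL
            ? (size_t)(terminator - address->sun_path) : maxLength;
    }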
* Both filesystems used to link to a static kernel-zlib, which
was being built with -fno-pic. This doesn't work on x86_64, as the
filesystem add-ons are meant to be relocatable, which requires their
code to be compiled as position-independent.
Solve that by moving zlib into the kernel, so any add-on can just use
it from there (packagefs is mandatory, so we can't really do without
zlib anyway).
* the Virtio RNG PCI device has class 0, so it can't be found using the
usual paths. Add 0 to _AlwaysRegisterDynamic() and "busses/virtio" in
_GetNextDriverPath() for non-generic drivers to help find virtio_pci.
* The RNG Virtio device is generic and needs "busses/random" to find virtio_rng.
* Mostly useful for virtualization at the moment. Works in QEMU.
* Can be enabled by safemode settings/menu.
* Please note that x2APIC normally requires use of the VT-d interrupt
remapping feature on real hardware, which we don't support yet.
* All packaging architecture dependent variables do now have a
respective suffix and are set up for each configured packaging
architecture, save for the kernel and boot loader variables, which
are still only set up for the primary architecture.
For convenience TARGET_PACKAGING_ARCH, TARGET_ARCH, TARGET_LIBSUPC++,
and TARGET_LIBSTDC++ are set to the respective values for the primary
packaging architecture by default.
* Introduce a set of MultiArch* rules to help with building targets for
multiple packaging architectures. Generally the respective targets are
(additionally) gristed with the packaging architecture. For libraries
the additional grist is usually omitted for the primary architecture
(e.g. libroot.so and <x86>libroot.so for x86_gcc2/x86 hybrid), so that
Jamfiles for targets built only for the primary architecture don't
need to be changed.
* Add multi-arch build support for all targets needed for the stage 1
cross devel package as well as for libbe (untested).
devfs_io() can't fall back to calling vfs_synchronous_io() if the
device driver doesn't support handling requests asynchronously. The
presence of the io() hook leads the VFS (do_iterative_fd_io()) to
believe that asynchronous handling is supported, and to set a
finished-callback on the request which calls the io() hook to start the
next chunk. Thus, instead of iterating through the request in a loop,
the iteration happens recursively. For sufficiently fragmented requests
the stack may overflow (ticket #9900).
* Introduce a new vnode operation supports_operation(). It can be called
by the VFS to determine whether a present hook is actually currently
supported for a given vnode.
* devfs: implement the new hook and remove the fallback handling in
devfs_io().
* vfs_request_io.cpp: use the new hook to determine whether the io()
hook is really supported.
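A hedged sketch of how the check in vfs_request_io.cpp might look;
HAS_FS_CALL/FS_CALL follow Haiku's VFS conventions, but the operation
constant and helper name here are assumptions:

    // Sketch only: whether the vnode's io() hook may actually be used.
    static bool
    vnode_supports_io(struct vnode* vnode)
    {
        if (!HAS_FS_CALL(vnode, io))
            return false;

        // Without a supports_operation() hook, the mere presence of
        // io() decides, as before.
        if (!HAS_FS_CALL(vnode, supports_operation))
            return true;

        // Ask the filesystem whether io() is currently supported (the
        // operation constant is illustrative).
        return FS_CALL(vnode, supports_operation, B_VNODE_OP_IO);
    }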
Although syscalls are done through SYSCALL and therefore don't actually
have an interrupt number, set it to 99 (the syscall vector on 32-bit)
in the iframe so that a syscall frame can be identified. Also added
vector/error_code to x86_64_debug_cpu_state for Debugger to use, not
sure why I didn't put them there in the first place.
* Add a VMArea* version of AddArea().
* AddAreaCacheAndLock(): Use the new AddArea() version. This not only
saves the ID hash table lookup, but also fixes a race condition with
delete_area(). delete_area() removes the area from the hash before
removing it from its cache, so iterating through the cache's areas
can turn up an area that no longer is in the hash. In that case we
would fail immediately. The new AddArea() won't fail in this
situation, though.
Fixes #9686: vm_copy_area() could fail for the "commpage" area. That's
an area all teams share, so any team terminating while another one was
fork()ing could trigger it.
When the metadata area couldn't be allocated in any of the supported
(reattachable) places, just use a static allocation. The tracing feature
wouldn't be available at all in such a case.
... in case of team creation error. Once assigned to Team::io_context,
the Team object takes ownership of the I/O context object and
releases the reference on destruction. load_image_internal() and
fork_team() were thus releasing one reference too many.
Fixes #9851.
* When looking for a place for a new area, the size of the area to be
inserted was used instead of the size of the next area to check
whether we were already past the upper bound.
* There was an attempt to insert the area even if we were past the
upper bound.
* Add optional packages Zlib and Zlib-devel.
* Simplify the build feature section for zlib and also extract the
source package.
* Replace all remaining references to the zlib instance in the tree and
remove it.
Casting the difference of the two off_t values to size_t may truncate
the result. Doing so before the comparison will therefore break it.
Instead, cast the size to off_t to get around the signed versus unsigned
integer expression comparison, and then cast the result of the comparison
to size_t again. Should fix #9714.
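An illustrative before/after of that cast, with made-up names:

    #include <stddef.h>
    #include <sys/types.h>

    static size_t
    bytes_to_transfer(off_t fileSize, off_t pos, size_t length)
    {
        off_t bytesLeft = fileSize - pos; // 64-bit arithmetic

        // Broken: casting first may truncate when size_t is 32 bit:
        //     size_t toTransfer = min_c((size_t)bytesLeft, length);
        // Fixed: compare in the wider off_t domain, then narrow:
        return (size_t)(bytesLeft < (off_t)length
            ? bytesLeft : (off_t)length);
    }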
This reverts commit f7176b0ee5. Citing Ingo:
"off_t is the correct type to use for addressing pages in a cache/file,
while page_num_t should only be used for physical pages." I'll see how to
fix the GCC 4.7 warnings differently :)
* error message: error: cannot bind packed field
'args->kernel_args::platform_args.platform_kernel_args::apm' to 'apm_info&'
* the reason is likely that the reference doesn't carry alignment
information anymore.
* changed the reference to const for read access, and used the long form
for setting a field.
- Instead of implicitly registering and unregistering a service
instance on construction/destruction, DefaultNotificationService
now exports explicit Register()/Unregister() calls, which subclasses
are expected to call when they're ready.
- Adjust all implementing subclasses. Resolves an issue with deadlocks
when booting a DEBUG=1 build.
* Add "bool kernel" parameter to vfs_entry_ref_to_path(), so it can be
specified for which I/O context the entry ref shall be translated.
* _user_entry_ref_to_path(): Use the calling team's I/O context instead
of the kernel's. Fixes the bug that, in a chroot, the syscall would
return a path as seen from outside the chroot.
* make runtime_loader a dynamically linked object
* add kernel support for loading user images that need to be relocated
* load runtime_loader at random address
Currently there are two generators. The fast one is the same one the scheduler
is using. The standard one is the same algorithm libroot's rand() uses. Should
there be a need for a more cryptographically secure PRNG, MD4 or MD5 might be
good candidates.
This address specification is actually not needed since PIC images can be
located anywhere. Only their size is restricted, but that is the compiler's
and linker's concern. Thanks to Alex Smith for pointing that out.
B_ALREADY_WIRED, which was erroneously passed for the area protection
parameter to map_backing_store(), has the value 7, which implies user
readable and writable. Hence the address ranges around 0xdeadbeef and
0xcccccccc could actually be read and written from anywhere.
On some 64 bit architectures program and library images have to be mapped in
the lower 2 GB of the address space (due to instruction pointer relative
addressing). The address specification B_RANDOMIZED_IMAGE_ADDRESS ensures
that the created area satisfies that requirement.
Placing the commpage and team user data somewhere at the top of the user
accessible virtual address space prevents these areas from conflicting with
ELF images that need to be mapped at an exact address (in most cases:
runtime_loader).
This patch introduces randomization of the commpage position. From now on
the commpage table contains offsets from the beginning of the commpage to
the particular commpage entry. Similarly, addresses of symbols in the ELF
memory image "commpage" are just offsets from the beginning of the commpage.
This patch also updates KDL so that commpage entries are recognized and shown
correctly in stack traces. An update of Debugger is yet to be done.
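A minimal sketch of resolving an entry under the new scheme; the names
are illustrative, not the actual commpage API:

    #include <stdint.h>

    // The table stores per-entry offsets rather than absolute
    // addresses, so entries stay valid wherever ASLR places the
    // commpage.
    static inline void*
    commpage_entry_address(uintptr_t commpageBase,
        const uintptr_t* offsetTable, int entryIndex)
    {
        return (void*)(commpageBase + offsetTable[entryIndex]);
    }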
Set the execute disable bit for any page that belongs to an area with
neither B_EXECUTE_AREA nor B_KERNEL_EXECUTE_AREA set.
In order to take advantage of the NX bit in 32 bit protected mode, PAE must
be enabled. Thus, from now on it is also enabled when the CPU supports the
NX bit.
vm_page_fault() takes an additional argument which indicates whether the
page fault was caused by an illegal instruction fetch.
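A hedged sketch of the per-page logic; bit 63 is the architectural
NX/XD bit in PAE and long-mode page table entries, and whether the area
carries one of the execute flags is passed in as a plain bool here:

    #include <stdint.h>

    static inline uint64_t
    apply_execute_protection(uint64_t pageTableEntry, bool areaExecutable)
    {
        // Areas with neither B_EXECUTE_AREA nor B_KERNEL_EXECUTE_AREA
        // get the execute-disable bit on all of their pages.
        if (!areaExecutable)
            pageTableEntry |= 1ull << 63; // NX/XD bit
        return pageTableEntry;
    }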
x86_userspace_thread_exit() is a stub originally placed at the bottom of
each thread's user stack that ensures any thread invokes exit_thread() upon
returning from its main higher level function.
Putting anything that is expected to be executed on a stack causes problems
when implementing data execution prevention. The code of
x86_userspace_thread_exit() is therefore moved to the commpage, which seems
to be a much more appropriate place for it.
When forking a process, the team user data area is not cloned; a new one is
created instead. However, the new one has to be at exactly the same address
as the parent's team user data area. When a process execs, the team user
data area may be recreated at a random position.
This patch also makes sure that instances of struct user_thread in team user
data are each in a separate cache line, in order to prevent false sharing,
since these data are very likely to be accessed simultaneously from threads
executing on different CPUs. This change, however, reduces the number of
threads a process can create. That is addressed by reserving 512kB of
address space in case the team user data area needs to grow.
Randomized equivalent of B_ANY_ADDRESS. When a free space is found (as in
B_ANY_ADDRESS), the base address is then randomized using _RandomizeAddress,
pretty much like it is done in B_RANDOMIZED_BASE_ADDRESS.
B_RANDOMIZED_BASE_ADDRESS is basically B_BASE_ADDRESS with a
non-deterministic base address for the created area.
The initial start address is randomized and then the algorithm looks for a
large enough free space in the interval [randomized start, end]. If it
fails, the search is repeated in the interval [original start, randomized
start]. In case that also fails, the algorithm falls back to B_ANY_ADDRESS
(B_RANDOMIZED_ANY_ADDRESS when it is implemented), just like B_BASE_ADDRESS
does. The randomization range is limited by kMaxRandomize and
kMaxInitialRandomize.
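A sketch of that search order under stated assumptions; find_free_range()
stands in for the real address space iteration, and rand() for the
kernel's _RandomizeAddress() helper:

    #include <stdint.h>
    #include <stdlib.h>

    // Stand-in for the real free-range search; returns 0 if nothing fits.
    extern uintptr_t find_free_range(uintptr_t start, uintptr_t end,
        size_t size);

    static uintptr_t
    find_randomized_base(uintptr_t start, uintptr_t end, size_t size)
    {
        // Pick a randomized start in [start, end - size].
        uintptr_t randomizedStart
            = start + (uintptr_t)rand() % (end - size - start + 1);

        // First try [randomized start, end] ...
        uintptr_t base = find_free_range(randomizedStart, end, size);
        // ... then [original start, randomized start] ...
        if (base == 0)
            base = find_free_range(start, randomizedStart, size);
        // ... and finally fall back to a B_ANY_ADDRESS style search.
        if (base == 0)
            base = find_free_range(0, end, size);
        return base;
    }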
Randomizing the initial user stack pointer within a page is not only part
of the ASLR implementation but also a performance improvement that helps
eliminate aligned 64 kB data accesses.
The minimal user stack size is increased to 8 kB in order to ensure that,
regardless of the initial stack pointer value, there is still enough space
on the stack.
The physical memory map area was not included in the kernel virtual
address space range (it was below KERNEL_BASE). This caused problems
if an I/O operation took place on physical memory mapped there (the
bad address error seen in #9547 was occurring in lock_memory_etc()).
Changed KERNEL_BASE and KERNEL_SIZE to cover the area and added a null
area that covers all of it. Also changed X86VMTranslationMap64Bit to
handle large pages in Query(), as the physical map area uses large
pages.
The standard states that F_GETLK should check whether the given lock would
be blocked by another one and return a description of the conflicting one
(or set l_type to F_UNLCK if there is no collision).
The current implementation of F_GETLK performs completely different
actions: it "retrieves the first lock that has been set by the current
team". Moreover, if there are no locks (advisory_locking == NULL), an error
is returned instead of l_type being set to F_UNLCK.
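For reference, the standard semantics from the caller's side (plain
POSIX usage, not Haiku-specific code):

    #include <fcntl.h>
    #include <stdio.h>

    // Ask whether a write lock on [start, start + length) would be
    // blocked by a lock held by another process.
    static int
    check_write_lock(int fd, off_t start, off_t length)
    {
        struct flock lock = {};
        lock.l_type = F_WRLCK;    // the lock we would like to take
        lock.l_whence = SEEK_SET;
        lock.l_start = start;
        lock.l_len = length;

        if (fcntl(fd, F_GETLK, &lock) < 0)
            return -1;

        if (lock.l_type == F_UNLCK)
            printf("no conflicting lock\n");
        else
            printf("conflicts with a lock held by pid %d\n",
                (int)lock.l_pid);
        return 0;
    }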
* Added the aforementioned functions.
* create_area_etc() now takes a guard size parameter.
* The thread_info::stack_base/end range now refers to the usable range
only.
As there are only 8 bits for the index in the coarse page table entries,
there are at most 256 entries. This makes us correctly move to the next
page directory once we've run through all entries. Fixes the missing unmap
of pages that crossed that boundary and the consequent "page still has
mappings" panic when such a page was removed from a cache.
These can be used for on-screen debug output with relatively little
effort, as they just need a plain framebuffer definition to work.
Some stubs are added to not clutter up the kernel sources with too
many ifdefs.
When iterator->current is NULL, hash_next() assumes we've reached the
end of a bucket (linked list) and moves to the next one. When the first
element of a linked list was removed in hash_remove_current(),
iterator->current was set to NULL, causing the next call to hash_next()
to skip over the rest of the list of that bucket.
To fix this we now decrement iterator->bucket by one, so the next call
to hash_next() correctly arrives at the new first element of the same
bucket. Doing it this way avoids having to search backwards through the
table to find the actual previous item.
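A minimal sketch of the fix; these simplified types stand in for the
kernel's hash.cpp internals:

    #include <stdint.h>

    struct hash_link {
        hash_link* next;
    };

    struct hash_iterator {
        hash_link* current; // NULL means "current bucket exhausted"
        int32_t bucket;     // index of the bucket being iterated
    };

    static void
    remove_first_of_bucket(hash_link** buckets, hash_iterator* iterator)
    {
        // Unlink the first element of the current bucket.
        buckets[iterator->bucket] = buckets[iterator->bucket]->next;

        // current == NULL makes hash_next() move on to bucket + 1, so
        // step bucket back by one; the next hash_next() then lands on
        // this bucket's new first element instead of skipping the rest.
        iterator->current = NULL;
        iterator->bucket--;
    }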
This caused modules to be skipped in module_init_post_boot_device()
when normalizing module image paths so some of the module images ended
up non-normalized. This could then cause images to be loaded a second
time for modules that were part of an actually already loaded image.
This setup is present for the PCI module with the pci/x86 submodule
and would lead to a second copy of the PCI module image to be loaded
but without being initialized, eventually leading to #8684.
The affected module images were pretty much random, as it depended on
the order in which they were loaded from the file system, in this case
the boot floppy archive of the El-Torito boot part of ISO and anyboot
images. The r1alpha4 release images unfortunately had the module files
ordered in the archive just so that the PCI module image would be
skipped, allowing #8684 to happen on many systems with MSI support.
Since the block cache uses hash_remove_current() as well in some cases,
it is possible that transactions in its list could've been skipped.
Cursory testing didn't reveal this to be a usual case, and it is
possible that in the pattern it is used there, the bug wouldn't be
triggered at all. It's still possible that it caused rare misbehaviour
though.
Remove the dummies from the C code and implement them in assembly,
due to the label referencing issues with the fault handler.
This code is ripe for optimisation, my ARM assembly is pretty
basic ;)
Does work though, and gets us one step closer to a full arch.
This also implements the fault handler correctly now, and cleans up the
exception handling. Seems a lot more stable now, no unexpected panics or
faults happening anymore.
This will generate asm_offsets.h which makes our assembly code
easier to maintain by preventing hardcoded offsets for fields within structures.
(copied from X86 and removed the X86 specifics)
* don't enforce a zero boundary or a zero alignment
* when going to the next range, take alignment into account. Previously
the boundary restriction could just be violated again by the alignment,
looping infinitely.
* it should help with some FreeBSD-based drivers
This contains both the common ARM(v5) vector handling as well as
the PXA (Verdex) specific interrupt controller code, to be separated
when ARM support for FDT is implemented.
Functional enough to handle interrupts; needs work on KDL support.
* The only implementation that would accept more than 2 TB was the one in
scsi_disk. But even that one was limited to 63 TB.
* Now there is a new utility function devfs_compute_geometry_size() which
does it correctly for sizes up to 2^64, which should be good enough for
quite some time :-) (see the sketch after this list)
* This fixes bug #8992.
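A hedged sketch of folding a 64-bit block count into 32-bit geometry
fields; the structure and the distribution strategy are illustrative,
not the literal devfs code:

    #include <stdint.h>

    struct geometry_example {
        uint32_t cylinder_count;
        uint32_t head_count;
        uint32_t sectors_per_track;
        uint32_t bytes_per_sector;
    };

    static void
    compute_geometry(geometry_example& geometry, uint64_t blockCount,
        uint32_t blockSize)
    {
        uint32_t heads = 1;
        uint32_t sectorsPerTrack = 1;

        // Grow heads (up to 2^16) and then sectors until the cylinder
        // count fits into 32 bits; 2^16 * 2^16 * 2^32 covers 2^64.
        while (blockCount / heads / sectorsPerTrack > UINT32_MAX) {
            if (heads < (1u << 16))
                heads <<= 1;
            else
                sectorsPerTrack <<= 1;
        }

        geometry.head_count = heads;
        geometry.sectors_per_track = sectorsPerTrack;
        geometry.cylinder_count
            = (uint32_t)(blockCount / heads / sectorsPerTrack);
        geometry.bytes_per_sector = blockSize;
    }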
The function fill_team_info() completely ignored the user id and the
group id of the process (fields info->uid and info->gid respectively).
Since the info structure was zeroed earlier, the ps output showed uid
and gid of each process equal to zero.
The patch fixes the problem by properly initializing the members with
effective uid and gid. Now the output is correct.
Fixes #8995.
Signed-off-by: Ryan Leavengood <leavengood@gmail.com>
* For now this allows linking the kernel and the pci bus manager.
* Could later on be turned into a wrapper to FDT methods since the
concepts are similar.
* When a block was only used in a sub-transaction, it was thrown away,
but the transaction::num_blocks field was not decremented.
* This caused transactions to never be considered finished, which
eventually led to bug #8942. This does not explain the disk corruption
occurring in #8969, though.
sFreeAreaCount wasn't decremented after removing an area from
sFreeAreas, thus causing the loop to continue until encountering and
crashing on a NULL pointer after removing the last area. Introduce
helper methods _PushFreeArea() and _PopFreeArea() to ensure this cannot
easily happen again.
Fixes ticket #8972.
* Avoid floating point numbers in the kernel
* Warning would always show if custom swap file in use
* Don't change a custom swap file size if low space occurs
* RAM > 1GB? Don't double the memory for the automatic size
* Heavily based on Hamish Morrison's GCI work with some
modified logic and cleanup. #3723
* Adds automatic swap as well as user specified swap
* Limits:
Automatic: (ram * 2) up to 25% of the disk
User: user specified up to 90% of the disk
* Supports changing the swap disk location
* The ASSERT() I introduced in r44585 was incorrect: when the sub transaction
used block_cache_get_empty() to get the block, there is no original_data for
a reason.
* Added a test case that reproduces this situation.
* The block must be moved to the unused list in this situation, though, or
else it might contain invalid data. Since the block can only be allocated in
the current transaction, this should not be a problem, AFAICT.
The lowest 4 bits of the MSR serve as a hint to the hardware to
favor performance or energy saving. 0 means a hint preference for
highest performance, while 15 corresponds to the maximum energy
savings. A value of 7 translates into a hint to balance performance
with energy savings.
The default reset value of the MSR is 0. If the BIOS doesn't initialize
the MSR, the hardware will run in the performance state. This patch
initializes the MSR with a value of 7 for a balance between performance
and energy savings.
Signed-off-by: Fredrik Holmqvist <fredrik.holmqvist@gmail.com>
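A hedged sketch of that MSR setup; 0x1b0 is the architectural
IA32_ENERGY_PERF_BIAS index, the read/write helpers are assumed to
exist as in Haiku's x86 code, and the check-before-write is an
assumption based on the wording above:

    #include <stdint.h>

    #define IA32_MSR_ENERGY_PERF_BIAS 0x1b0

    extern uint64_t x86_read_msr(uint32_t registerNumber);
    extern void x86_write_msr(uint32_t registerNumber, uint64_t value);

    static void
    init_energy_perf_bias()
    {
        uint64_t bias = x86_read_msr(IA32_MSR_ENERGY_PERF_BIAS);

        // Only touch the hint when it still holds the reset default of
        // 0 (maximum performance); a BIOS-programmed value stays.
        if ((bias & 0xf) == 0) {
            x86_write_msr(IA32_MSR_ENERGY_PERF_BIAS,
                (bias & ~(uint64_t)0xf) | 7); // 7 = balanced hint
        }
    }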
* In cache_abort_sub_transaction(), the original_data can already be freed
when the block is being removed from the transaction.
* block_cache::_GetUnusedBlock() no longer frees original/parent data - it
now requires them to be freed already (it makes no sense to have them still
around at this point).
* AFAICT the previous version did not have any negative consequences besides
freeing the original data late.
* cache_abort_sub_transaction() was setting the transaction_next pointer to
NULL in order to remove a block from a transaction -- however, it forgot to
actually remove it from the transaction's block list.
* Minor restructuring.
Renamed {32,64}/int.cpp to {32,64}/descriptors.cpp, which now contain
functions for GDT and TSS setup that were previously in arch_cpu.cpp,
as well as the IDT setup code. These get called from the init functions
in arch_cpu.cpp, rather than having a bunch of ifdef'd chunks of code
for 32/64.
- If a trace entry has a stack trace, attempt to demangle the associated symbols.
Could be enhanced further to also demangle the arguments but doesn't yet.
Interestingly there are some mangled symbols that our demangler appears to
not handle correctly (gcc4).
Kernel mode code on x86_64 needs to be built with -mno-red-zone as
interrupts would corrupt the red zone if it were in use. However, the
kernel is linked with libsupc++, which was not compiled with
-mno-red-zone. If an interrupt occurred in libsupc++ code the red zone
would get corrupted. This was causing random panics, particularly under
heavy system load. Therefore, on x86_64 a separate build of libsupc++
with -mno-red-zone is now done for the kernel to use. Note: this commit
will require a rerun of configure and rebuild of cross tools.
Initializing the IO-APIC will initialize the PCI module, which reads
the MSI config of the devices only when MSIs are available.
Since we initialized them only after that, that condition wasn't met.
Later, due to the uninitialized arch info, MSIs were still marked as
available (0xcc = 204 MSIs). Due to the also uninitialized configured
count, they were always deemed busy, however, in effect just breaking
MSI support wherever IO-APICs were available.
This bug was introduced by changing IS_USER_ADDRESS to check against
USER_BASE and USER_TOP rather than just !IS_KERNEL_ADDRESS. Faults
on addresses outside both the user and kernel address spaces (i.e. the
gap between user and kernel) would result in addressSpace being NULL,
but addressSpace was being used without checking for NULL at one point.
If an uncanonical address is accessed a general protection fault will
be raised. When in the debugger, uncanonical address faults should be
handled by the fault handler (if any).
Reused x86 arch_user_debugger.cpp, with a few minor changes to make
the code work for both 32 and 64 bit. Something isn't quite working
right, if a breakpoint is hit the kernel will hang. Other than that
everything appears to work correctly.
* Changed IS_USER_ADDRESS to check an address using USER_BASE and
USER_SIZE, rather than just !IS_KERNEL_ADDRESS. The old check would
allow user buffers to point into the physical memory map area (see
the sketch after this list).
* Added an unmapped hole at the end of the bottom half of the address
space which catches buffers that cross into the uncanonical address
region. This also removes the need to check for uncanonical return
addresses in the syscall handler, since it is no longer possible for
the return address to be uncanonical under normal circumstances. All
cases in which the return address might be changed by the kernel
are still handled via the IRET path.
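A sketch of the tightened check, with illustrative bounds (the real
USER_BASE/USER_SIZE come from the arch headers):

    #include <stdint.h>

    typedef uintptr_t addr_t; // stand-in for Haiku's addr_t

    static const addr_t kUserBase = 0x0;                  // illustrative
    static const addr_t kUserSize = 0x0000800000000000ul; // illustrative

    // Old check: !IS_KERNEL_ADDRESS(x), which also passed addresses
    // above the user range but below KERNEL_BASE, e.g. the physical
    // memory map area. New check: inside the user range proper.
    static inline bool
    is_user_address(addr_t address)
    {
        return address - kUserBase < kUserSize;
    }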
The USER_BASE_ANY definition exists to specify where to start searching
for B_ANY_ADDRESS allocations, but this was not being used correctly.
On x86_64, this was causing the runtime loader's heap to be allocated
at address 0 so NULL pointer accesses were not getting caught.
The cookie is used to store the base address of the area that was just
visited. On 64-bit systems, int32 is not sufficient. Therefore, changed
to ssize_t which retains compatibility on x86 while expanding to a
sufficient size on x86_64.
The interrupt and system call handlers now perform all the necessary
kernel entry/exit work, and the system call handler now handles calls
with more than 6 arguments. Debugging and system call tracing hooks
are not yet called, will be added when user debugging gets implemented.
Userland switch is implemented, as is basic system call support (using
SYSCALL/SYSRET). The system call handler is not yet complete: it doesn't
handle more than 6 arguments, and does not perform all the necessary kernel
entry/exit work (neither does the interrupt handler). However, this is
sufficient for runtime_loader to start and print some debug output.
Since the demangle debugger extension now gets loaded when booting
from an image, use it in stack traces. Can't print argument values
like on x86, however: since x86_64 uses registers to pass the first
6 arguments rather than the stack, we can't easily get to them.
Since the commpage is at a kernel address, changed 64-bit paging code
to match x86's behaviour of allowing user-accessible mappings to be
created in the kernel portion of the address space. This is also
required by some drivers.
Since this argument may be used to pass pointers, uint32 is not
correct for 64-bit. Effectively no change on 32-bit targets, both
size_t and uint32 are unsigned long there.
* cache_abort_transaction() left the block dirty which was causing bug
#8123 as well.
* cache_abort_sub_transaction() did, in addition to not clearing the dirty
flag, not reset the block's transaction member either if the block was
not part of the parent transaction.
Added the necessary build flags for modules, and added a module (dpc)
to the floppy image for x86_64 builds for testing purposes. The module
gets loaded correctly and its code runs without issue. Only non-trivial
addition is the different method for generating kernel.so, this is
explained in the kernel Jamfile.
Currently all debugger commands assume 32-bit pointers when formatting their
output. This means that on x86_64 the output is incorrectly formatted. Fixed
this by adding a B_PRINTF_POINTER_WIDTH definition (16 on 64-bit, 8 on
32-bit), and using this to correctly format the output. Not all commands have
been fixed yet, but all VM, slab, VFS, team, thread and image commands should
be correct.
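A hedged sketch of the definition and its use, shown with standard
printf formatting (the kernel's own kprintf and width/format macros
differ in detail):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    #if UINTPTR_MAX > 0xffffffffu
    #	define B_PRINTF_POINTER_WIDTH 16
    #else
    #	define B_PRINTF_POINTER_WIDTH 8
    #endif

    static void
    print_address(uintptr_t address)
    {
        // Zero-pad to the native pointer width so columns line up.
        printf("0x%0*" PRIxPTR "\n", B_PRINTF_POINTER_WIDTH, address);
    }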
No major changes to the kernel: just compiled in arch_smp.cpp and fixed the
IDT load in arch_cpu_init_percpu to use the correct limit for x86_64 (uses
sizeof(interrupt_descriptor)). In the boot loader, changed smp_boot_other_cpus
to construct a temporary GDT and get the page directory address from CR3, as
what's in kernel_args will be 64-bit stuff and will not work to switch the
CPUs into 32-bit mode in the trampoline code. Refactored 64-bit kernel entry
code to not use the stack after disabling paging, as the secondary CPUs are
given a 32-bit virtual stack address by the SMP trampoline code which will
no longer work.
This matches the layout in ACPICA and keeps a cleaner boundary between
Haiku and ACPICA code. The only Haiku-specific file in ACPICA is
achaiku.h and it will hopefully be included upstream soon.
Merging will be simpler as we can just replace the acpica contents and
fix the Jamfile and build errors in our code.