NetBSD/doc/roadmaps/system

$NetBSD: system,v 1.5 2009/01/26 05:09:25 agc Exp $

NetBSD System Roadmap
=====================

This is a small roadmap document, and deals with the main system
aspects of the operating system.

NetBSD 5.0 will ship with the following main changes to the system:

1. Modularized scheduler
2. Real-time scheduling classes and priorities
3. Processor sets, processor affinity and processor control
4. Multiprocessor optimized scheduler
5. High-performance 1:1 threading implementation
6. Pushback of the global kernel lock
7. New kernel concurrency model
8. Multiprocessor optimized memory allocators
9. POSIX asynchronous I/O and message queues
10. In-kernel linker
11. SysV IPC tuneables
12. Improved observability: minidumps, lockstat and tprof
13. Power management framework

The following element has been added to the NetBSD-current tree, and will be
in NetBSD 6.0:

14. 64-bit time_t support

The following projects are expected to be included in NetBSD 6.0:

15. Full kernel preemption for real-time threads
16. POSIX shared memory
17. Incremental namei improvements, Phase 1
18. Better resource controls
19. Improved observability: online crashdumps, remote debugging
20. Processor and cache topology aware scheduler

The timescales for 6.0 are not known at the present time, but we would
expect to branch 6.0 late in 2009, with a view to a 6.0 release in
early 2010.

We'll continue to update this roadmap as features and dates get firmed up.


Some explanations
=================

1. Modularized scheduler
------------------------

Traditionally the only method of control on process scheduling was the
'nice' value assigned to each process.  The scheduler interface has been
redesigned to allow for pluggable schedulers, selected at compile time.
At the current time, there are no plans to switch schedulers at run-time,
since there is little appreciable gain to be had from that, and the extra
performance hit to provide this functionality is thought not to be worth
it.

The in-kernel scheduler interface has been enhanced to provide a framework
for adding new schedulers, called the common scheduler framework - more
information can be found in the csf(9) manual page.
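
As a rough illustration of what "selected at compile time" means, the
sketch below compiles in exactly one of two toy policies using the
preprocessor.  The names toy_lwp, sched_pick and TOY_SCHED_FIFO are
hypothetical and are not part of csf(9); the real interface is the one
documented in that manual page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical runnable-thread record; not a NetBSD structure. */
struct toy_lwp {
	int id;		/* thread identifier */
	int priority;	/* larger value = more urgent */
};

/*
 * Exactly one body of sched_pick() is compiled in, so callers pay no
 * run-time indirection.  Build with -DTOY_SCHED_FIFO (a made-up
 * option) to get the alternative policy.
 */
#ifdef TOY_SCHED_FIFO
/* Policy A: always run the thread at the head of the queue. */
const struct toy_lwp *
sched_pick(const struct toy_lwp *q, size_t n)
{
	assert(q != NULL || n == 0);
	return n > 0 ? &q[0] : NULL;
}
#else
/* Policy B (default): run the highest-priority thread. */
const struct toy_lwp *
sched_pick(const struct toy_lwp *q, size_t n)
{
	const struct toy_lwp *best = NULL;

	assert(q != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (best == NULL || q[i].priority > best->priority)
			best = &q[i];
	return best;
}
#endif
```

Because the choice is made by the compiler, switching policy means
rebuilding, which matches the trade-off described above.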

Responsible: ad, dsieger, rmind, yamt

2. Real-time scheduling classes and priorities
----------------------------------------------

The scheduler has been extended to provide multiple new priority
bands, including real-time.  POSIX standard interfaces for controlling
thread priority and scheduling class have been implemented, along with
a command line tool to allow control by the system administrator.
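
A minimal sketch of the POSIX interfaces in question, assuming a
standard pthreads environment: it prepares thread attributes that
request the SCHED_FIFO real-time class at that class's lowest
priority.  The helper name rt_attr_init is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/*
 * Initialize `attr` to request SCHED_FIFO at the minimum priority of
 * that class.  Returns 0 on success or an errno-style code.  Actually
 * creating a thread with these attributes may require privileges.
 */
int
rt_attr_init(pthread_attr_t *attr)
{
	struct sched_param sp = { 0 };
	int error, prio;

	assert(attr != NULL);
	prio = sched_get_priority_min(SCHED_FIFO);
	if (prio == -1)
		return errno;
	if ((error = pthread_attr_init(attr)) != 0)
		return error;

	/* Use the values set here rather than inheriting the creator's. */
	error = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
	if (error == 0)
		error = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
	if (error == 0) {
		sp.sched_priority = prio;
		error = pthread_attr_setschedparam(attr, &sp);
	}
	if (error != 0)
		pthread_attr_destroy(attr);
	return error;
}
```

The resulting attributes would then be passed to pthread_create(); the
caller remains responsible for pthread_attr_destroy().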

3. Processor sets, processor affinity and processor control
-----------------------------------------------------------

A Solaris and HP-UX compatible interface for defining and controlling
processor sets has been added.  Processor sets allow applications and
the administrator complete flexibility in partitioning CPU resources
among applications, down to thread-level granularity.

A Linux-compatible interface for controlling processor affinity, similar
in spirit to processor sets, is also provided.

A new utility to control CPU status (cpuctl) is provided.  cpuctl
allows the administrator to enable and disable individual CPUs at
the software level, while the system is running.  It is expected that
this will in time be extended to support full dynamic reconfiguration,
in concert with a hypervisor such as Xen.

4. Multiprocessor optimized scheduler
-------------------------------------

An intelligent, pluggable scheduler named M2 has been added.  It is
optimized for multiprocessor systems, supports the POSIX real-time
extensions and the time-sharing class, and implements thread affinity.

5. High-performance 1:1 threading implementation
------------------------------------------------

A new lightweight 1:1 threading implementation has been added, replacing
the M:N based implementation found in NetBSD 4.0 and earlier.  The new
implementation is more correct according to POSIX thread standards, and
provides a massive performance boost to threaded workloads in both uni-
and multi-processor configurations.

6. Pushback of the global kernel lock
-------------------------------------

Previously, most access to the kernel was single threaded on multiprocessor
systems by the global kernel_lock.  The kernel_lock has been pushed back
to the device driver and wire-protocol layers, providing a significant
performance boost on heavily loaded multiprocessor systems.

7. New kernel concurrency model
-------------------------------

The non-preemptive spinlock and "interrupt priority level" synchronization
model has been replaced wholesale with a hybrid thread/interrupt model.  A
full range of new, lightweight synchronization primitives is available to
the kernel programmer, including: adaptive mutexes, reader/writer locks,
memory barriers, atomic operations, threaded soft interrupts, generic cross
calls, workqueues, priority inheritance, and per-CPU storage.

8. Multiprocessor optimized memory allocators
---------------------------------------------

The memory allocators in both the kernel and user space are now fully
optimized for multiprocessor systems and eliminate the performance
degradation typically associated with memory allocators in an MP setting.

9. POSIX asynchronous I/O and message queues
---------------------------------------------

A full implementation of the POSIX asynchronous I/O and message
queue facilities is now available.

10. In-kernel linker
--------------------

An in-kernel ELF object linker has been added, and a revamped kernel module
infrastructure developed to accompany it.  It is expected that the kernel
will become completely modular over time, while continuing to retain the
ability to link to a single binary image for embedded and hobby systems.

11. SysV IPC tuneables
----------------------

Parameters for the SVR3-compatible IPC mechanisms can now be tuned
completely at runtime.

12. Improved observability: minidumps, lockstat and tprof
---------------------------------------------------------

The x86 architecture now supports mini crash-dumps as a support aid for
kernel debugging. Only memory contents actively in use by the kernel at
the time of crash are dumped to and recovered from disk, an improvement
over the traditional scheme where the complete contents of memory are
dumped to disk.

The lockstat and tprof commands have been added to the system. lockstat
provides a high-resolution description of lock activity in a running system.

tprof uses sample-based profiling in conjunction with the available
performance counters in order to better profile system activity.

13. Power management framework
------------------------------

A new power management framework has been introduced that improves
handling of device power state transitions. As power management support
is now integrated with the auto-configuration subsystem, the kernel can
ensure that a parent device is powered on before attempting to access
any device attached beneath it.
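
The parent-before-child ordering can be sketched as a walk up the
device tree.  This is only an illustration: struct toy_dev and
toy_power_on are hypothetical names, and the kernel's real device
structures and power hooks carry far more state than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device node; the kernel's own structure is richer. */
struct toy_dev {
	const char *name;
	struct toy_dev *parent;	/* NULL for the root of the tree */
	bool powered;
};

/*
 * Power on `dev`, first powering on every unpowered ancestor.
 * Returns the number of devices switched on by this call.
 */
int
toy_power_on(struct toy_dev *dev)
{
	int n = 0;

	assert(dev != NULL);
	if (dev->powered)
		return 0;
	if (dev->parent != NULL)
		n = toy_power_on(dev->parent);	/* parent comes up first */
	dev->powered = true;
	return n + 1;
}
```

Tying this walk to the auto-configuration tree is what lets the
ordering be enforced in one place rather than in every driver.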

With these changes comes an updated release of the Intel ACPI
Component Architecture and an x86 emulator which assists in restoring
uninitialized display adapters.

Leveraging this work, the i386 and amd64 kernels now support suspend
to RAM in uni- and multi-processor configurations on ACPI-capable
machines. This support has been successfully tested on a wide variety of
laptops, including (but not limited to) recent systems from Dell, IBM/Lenovo,
Fujitsu, Toshiba, and Sony.

Responsible: jmcneill, joerg

14. 64-bit time_t support
-------------------------

A signed 32-bit Unix time_t value will overflow in January 2038.  Any
calculation that reaches past that date, such as a long-term mortgage
schedule, is in danger of overflowing at the present time.  To address
this, 64-bit time_t values will be used to contain the number of seconds
since 1970.
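
The arithmetic behind this can be checked directly.  A signed 32-bit
counter holds at most 2^31 - 1 = 2147483647 seconds, which is reached
on 19 January 2038 at 03:14:07 UTC.  The sketch below does the sums in
64 bits and tests whether the result would still fit in 32; the helper
names and the 365.25-day year are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Approximate length of a year (365.25 days), for illustration only. */
#define TOY_SECS_PER_YEAR 31557600LL

/* True if `secs` since 1970 can be stored in a signed 32-bit time_t. */
bool
fits_time32(int64_t secs)
{
	return secs >= INT32_MIN && secs <= INT32_MAX;
}

/* Timestamp `years` years after `start`, computed in 64 bits. */
int64_t
years_after(int64_t start, int years)
{
	assert(years >= 0);
	return start + (int64_t)years * TOY_SECS_PER_YEAR;
}
```

With a start of 1232946184 (late January 2009), a 20-year term still
fits in 32 bits, while a 30-year term does not.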

Responsible: christos

15. Full kernel preemption for real-time threads
------------------------------------------------

With the revamp of the kernel concurrency model, much of the kernel is fully
multi-threaded and can therefore be preempted at any time. In support of
lower context switch and dispatch times for real-time threads, full kernel
preemption is being implemented.

16. POSIX shared memory
-----------------------

Implement POSIX shared memory facilities, which can be used to create the
shared memory objects and add the memory locations to the address space of
a process.
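
For reference, the POSIX calls involved look roughly as follows from
an application's point of view.  This is a sketch against the POSIX
specification, not against the planned NetBSD implementation; shm_map
is a made-up helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create (or open) the shared memory object `name`, size it to `len`
 * bytes and map it into the caller's address space.  Returns the
 * mapping, or NULL on failure.
 */
void *
shm_map(const char *name, size_t len)
{
	void *p;
	int fd;

	assert(name != NULL && name[0] == '/');
	fd = shm_open(name, O_CREAT | O_RDWR, 0600);
	if (fd == -1)
		return NULL;
	if (ftruncate(fd, (off_t)len) == -1) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close */
	return p == MAP_FAILED ? NULL : p;
}
```

Two processes that call shm_map() with the same name see the same
memory; munmap() drops a mapping and shm_unlink() removes the name.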

Responsible: rmind

17. Incremental namei improvements, Phase 1
-------------------------------------------

Implement the rest of the changes to namei outlined in Message-ID:
<20080319053709.GB3951@netbsd.org>.  Simplify the locking and behavior
of namei() calls within the kernel to resolve path names within file
systems. This phase simplifies the majority of calls to namei().

Responsible: dholland

18. Better resource controls
----------------------------

A resource provisioning and control framework that extends beyond the
traditional Unix process limits.

19. Improved observability: online crashdumps, remote debugging
---------------------------------------------------------------

XXX crashdumps while the system is running
XXX firewire support in libkvm

20. Processor and cache topology aware scheduler
------------------------------------------------

Implement the detection of the topology of the processors and caches.
Improve the scheduler to make decisions about thread migration
according to the topology, to get better thread affinity and less
cache thrashing, and thus improve overall performance in modern SMP
systems.
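
One way such a decision could look is sketched below: among the idle
CPUs, prefer the one that is topologically closest to where the thread
last ran.  The structure, the distance values and the function names
are all hypothetical; the actual scheduler design is not settled by
this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU topology record, filled in at boot. */
struct toy_cpu {
	int package;	/* physical socket */
	int cache;	/* id of the shared last-level cache */
	bool idle;
};

/* Migration cost: 0 same CPU, 1 shared cache, 2 same package, 3 remote. */
static int
toy_distance(const struct toy_cpu *c, size_t from, size_t to)
{
	if (from == to)
		return 0;
	if (c[from].package != c[to].package)
		return 3;
	return c[from].cache == c[to].cache ? 1 : 2;
}

/*
 * Pick the idle CPU closest to `last`, the CPU the thread ran on most
 * recently.  Returns `last` itself if no CPU is idle.
 */
size_t
toy_pick_cpu(const struct toy_cpu *c, size_t ncpu, size_t last)
{
	size_t best = last;
	int best_d = 4;	/* larger than any real distance */

	assert(c != NULL && last < ncpu);
	for (size_t i = 0; i < ncpu; i++) {
		int d = toy_distance(c, last, i);

		if (c[i].idle && d < best_d) {
			best = i;
			best_d = d;
		}
	}
	return best;
}
```

Keeping a thread near its previous CPU means more of its working set
is likely to still be in a shared cache after the migration.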

Responsible: rmind

29. Incremental namei improvements, Phase 2
-------------------------------------------

Implement the rest of the changes to namei outlined in Message-ID:
<20080319053709.GB3951@netbsd.org>.  Simplify the locking and behavior
of namei() calls within the kernel to resolve path names within file
systems.

Responsible: dholland


Andrew Doran
Alistair Crooks
Sun 25 Jan 2009 21:03:04 PST