Previous implementation based on the actual load of each core and share
each thread has in that load turned up to be very problematic when
balancing load on very heavily loaded systems (i.e. more threads
consuming all available CPU time than there is logical CPUs).
The new approach is to estimate how much load would a thread produce
if it had all CPU time only for itself. Summing such load estimations
of each thread assigned to a given core we get a rank that contains
much more information than just simple actual core load.