Linux Scheduler
The PCB on Linux is combined with the kernel stack into a contiguous
block of 8K (two 4K pages), with the kernel stack growing downwards
from the top, and the PCB a fixed structure aligned at the low memory
boundary.
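For concreteness, here is a small standalone sketch of that layout (the fields are stand-ins, not the full task_struct). The kernel's trick, as I understand it, is that because the 8K block is 8K-aligned, masking the low 13 bits off the kernel stack pointer recovers the PCB of the currently running process.

    /* Hedged sketch of the 2.4-style layout; not the literal kernel source. */
    #include <stdio.h>

    struct task_struct {            /* stand-in for the real PCB fields   */
        long state;
        long counter;               /* ticks left in the current quantum  */
        long nice;
        /* ... many more fields in the real kernel ... */
    };

    union task_union {              /* PCB and kernel stack share 8K      */
        struct task_struct task;    /* PCB at the low end of the block    */
        unsigned long stack[8192 / sizeof(unsigned long)];
    };

    /* The real kernel gets an 8K-aligned block from the page allocator;
     * here we force the alignment so the masking demo below works.       */
    static union task_union demo_task __attribute__((aligned(8192)));

    /* Recover the PCB from any address inside the kernel stack. */
    static struct task_struct *current_from_sp(unsigned long sp)
    {
        return (struct task_struct *)(sp & ~8191UL);
    }

    int main(void)
    {
        unsigned long sp = (unsigned long)&demo_task.stack[1000];
        printf("block at %p, recovered PCB at %p\n",
               (void *)&demo_task, (void *)current_from_sp(sp));
        return 0;
    }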
The Linux scheduler recognizes three scheduling classes: real-time
round-robin and FIFO classes, and a time-shared other class.
The policy field of the task_struct holds one of the integer constants
defined in include/linux/sched.h, line 108 (Linux 2.4.22).
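The constants look roughly like this (the values match what I believe the 2.4 headers define; check your own tree). The tiny program is just to make the mapping from policy value to class name concrete.

    #include <stdio.h>

    #define SCHED_OTHER 0   /* time-shared class                      */
    #define SCHED_FIFO  1   /* real-time, runs until it blocks/yields */
    #define SCHED_RR    2   /* real-time, round-robin with a quantum  */

    static const char *policy_name(int policy)
    {
        switch (policy) {
        case SCHED_OTHER: return "SCHED_OTHER";
        case SCHED_FIFO:  return "SCHED_FIFO";
        case SCHED_RR:    return "SCHED_RR";
        default:          return "unknown";
        }
    }

    int main(void)
    {
        for (int p = 0; p <= 2; p++)
            printf("%d -> %s\n", p, policy_name(p));
        return 0;
    }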
Real-time tasks run first, based on the rt_priority value in the
PCB. Otherwise, selection is based on the result of the goodness
function in kernel/sched.c (line 144):
the task_struct.counter value is combined with the nice value, with bonus points
for CPU and memory affinity. The counter is the number of ticks left
in the quantum. Once every runnable task has eaten its quantum, an epoch completes
and new quanta are distributed.
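Here is a hedged, simplified userland sketch of what goodness computes in the two cases. Field names mirror task_struct, but the struct itself and the affinity bonus value are illustrative (the real bonus, PROC_CHANGE_PENALTY, varies by architecture).

    #include <stdio.h>

    struct task {
        int policy;        /* SCHED_OTHER, SCHED_FIFO, SCHED_RR          */
        int counter;       /* ticks left in the quantum                  */
        int nice;          /* -20 .. 19                                  */
        int rt_priority;   /* real-time priority                         */
        int same_mm;       /* 1 if it shares the current mm (memory affinity) */
        int same_cpu;      /* 1 if it last ran on this CPU (CPU affinity)     */
    };

    #define SCHED_OTHER 0
    #define PROC_CHANGE_PENALTY 15     /* illustrative SMP affinity bonus */

    static int goodness(const struct task *p)
    {
        if (p->policy != SCHED_OTHER)
            return 1000 + p->rt_priority;   /* real-time always wins        */

        int weight = p->counter;            /* ticks left this epoch        */
        if (weight == 0)
            return 0;                       /* quantum gone: wait for epoch */
        if (p->same_cpu)
            weight += PROC_CHANGE_PENALTY;  /* CPU affinity bonus (SMP)     */
        if (p->same_mm)
            weight += 1;                    /* memory (mm) affinity bonus   */
        weight += 20 - p->nice;             /* nice enters goodness here    */
        return weight;
    }

    int main(void)
    {
        struct task a = { SCHED_OTHER, 6,   0, 0, 1, 1 };
        struct task b = { SCHED_OTHER, 3, -10, 0, 0, 0 };
        printf("goodness(a)=%d goodness(b)=%d\n", goodness(&a), goodness(&b));
        return 0;
    }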
The scheduler was given a major overhaul from 2.4 to 2.5. The file to see
is kernel/sched.c. At first glance, the changes seem larger than
they were.
It seems to me now that just some fairly standard data structure
techniques were used, and in hindsight it is odd that the original
scheduler didn't use these techniques.
But there are some other changes whose impact might be more subtle.
Google "O(1) Scheduler Linux" for more information.
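For a feel of the standard data structure technique involved, here is a minimal sketch (not kernel code) of the priority-array-plus-bitmap idea: keep one queue per priority and a bitmap of non-empty queues, so picking the next task is a find-first-set-bit operation instead of a goodness() walk over every runnable task.

    #include <stdio.h>
    #include <string.h>

    #define NPRIO 140                     /* 0..139, lower = more urgent   */

    struct prio_array {
        unsigned long bitmap[(NPRIO + 63) / 64];  /* bit set => non-empty  */
        int count[NPRIO];                         /* stand-in for run lists */
    };

    static void enqueue(struct prio_array *a, int prio)
    {
        a->count[prio]++;
        a->bitmap[prio / 64] |= 1UL << (prio % 64);
    }

    static int pick_next(const struct prio_array *a)
    {
        /* Scan a handful of words, then find the first set bit: constant
         * time in the number of runnable tasks.                           */
        for (int w = 0; w < (NPRIO + 63) / 64; w++)
            if (a->bitmap[w])
                return w * 64 + __builtin_ctzl(a->bitmap[w]);
        return -1;                                /* nothing runnable */
    }

    int main(void)
    {
        struct prio_array a;
        memset(&a, 0, sizeof a);
        enqueue(&a, 120);
        enqueue(&a, 75);
        printf("next priority: %d\n", pick_next(&a));   /* prints 75 */
        return 0;
    }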
Exhibits, from Linux 2.4.22
The loop on line 606 of schedule() is the heart of scheduling the
SCHED_OTHER tasks. On line 578 is the scheduling of SCHED_RR tasks.
Note the placement at the end of the run queue and the reloading of the counter.
On line 622 is the end-of-epoch calculation. Note that nice is
added twice. It makes the quantum longer (counter), and counter is
part of goodness, but goodness also adds the nice value again.
Is that odd?
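To make that concrete, here is a small sketch of the end-of-epoch recalculation around line 622. The nice-to-ticks scaling below assumes HZ=100 and is only approximate; the shape of the formula is the point.

    #include <stdio.h>

    /* Roughly what NICE_TO_TICKS works out to at HZ=100 (illustrative). */
    static int nice_to_ticks(int nice)
    {
        return (20 - nice) / 4 + 1;
    }

    struct task { int counter; int nice; };

    /* When every runnable task has exhausted its counter, each task gets
     * a fresh quantum from its nice value plus half of whatever it had
     * left over, which rewards tasks that spent the epoch sleeping.      */
    static void end_of_epoch(struct task *tasks, int n)
    {
        for (int i = 0; i < n; i++)
            tasks[i].counter = (tasks[i].counter >> 1)
                             + nice_to_ticks(tasks[i].nice);
    }

    int main(void)
    {
        struct task t[2] = { { 0, 0 }, { 4, 0 } };  /* second one slept */
        end_of_epoch(t, 2);
        printf("new counters: %d %d\n", t[0].counter, t[1].counter);
        return 0;
    }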
Goals of Scheduling
Scheduling can be seen as any of the following:
- The OS objective of efficient use of resources. In particular, the
CPU burst/I/O burst model of process execution means that a process will
be waiting much of the time. Interleaving several processes will keep
the CPU busy. This is a gain in efficiency for free.
- The OS objective of supporting process abstractions. First, the process
abstraction hides CPU resource allocation. If a process expects to
execute, it is the scheduler that must provide CPU cycles to fulfill this
expectation.
- Second, the process abstraction might include a seemingly simultaneous
advancement of processes. This is the essence of time-sharing, or of
interactive processes, depending on how things are defined. The scheduler
must allocate the CPU in such a way that the interleaving of cycle allocations
among processes is transparent to the user.
- Or the OS objective of isolation and protection. A process should
not be able to hog the CPU.
Preemption
Preemption is the removal of the CPU resource from a process
at points other than certain natural pauses in the process's progress.
In order to provide transparent support for the simultaneous-progression
model of processes, preemption is necessary. Let's say that the
natural pauses of a process are an I/O wait, a deliberate wait
of some sort (death of a child, waiting for a signal), or process exit.
The additional preemption points used by most kernels are:
- Return from a system call.
- Return from a hardware interrupt (unrelated to the process).
- A dedicated preemption interrupt.
The dedicated preemption interrupt is a clock which forces a preemption
decision on a regular basis regardless of the process' behavior.
From another viewpoint, this third case can be merged with the
second, since the preemption interrupt will be a hardware interrupt
driven by a clock device, but we have kept it separate because it
seems to represent the Big Step in the decision to implement a fully
robust preemption.
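A small sketch of how that dedicated preemption interrupt feeds back into scheduling in 2.4 (the names are stand-ins, not the kernel's entry code): the timer tick charges the running task one counter tick and, when the quantum is gone, raises a need_resched flag; the actual call to schedule() happens on the return path from the interrupt or system call.

    #include <stdio.h>

    struct task { int counter; int need_resched; };

    /* Called from the clock interrupt handler, once per tick. */
    static void timer_tick(struct task *current)
    {
        if (--current->counter <= 0) {
            current->counter = 0;
            current->need_resched = 1;   /* preemption requested, not yet done */
        }
    }

    /* Checked on return from any interrupt or system call. */
    static void return_to_user(struct task *current)
    {
        if (current->need_resched)
            printf("schedule(): pick another task\n");  /* stand-in for schedule() */
    }

    int main(void)
    {
        struct task t = { 2, 0 };
        for (int tick = 0; tick < 3; tick++) {
            timer_tick(&t);
            return_to_user(&t);
        }
        return 0;
    }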
Preemption can further be considered as to whether it can occur
while the process is in kernel mode, or only in user mode. Traditional
Unix kernels did not preempt a process running in kernel mode.
The kernel shared data structures across processes, hence kernel
preemption would introduce issues of data structure consistency and
synchronization.
Instead, in Unix, a preemption event occurring in kernel mode
was handled on return from the system call which placed the
process in kernel mode.
Scheduling Qualities
I am going to go out on a limb and say some original things.
There are a few big aspects to scheduling:
- CPU Utilization. Scheduling should keep the CPU busy.
- Overall utilization. CPU utilization alone is trivial, but the
proper overlap, so that jobs are not kept waiting, is important.
In the book this is categorized as total wait time.
- Latency. Scheduling might want to shorten the time between
an event and the time the process handling the event runs.
- Progress. Scheduling might want to make sure that every
process makes some amount of progress.
There are some other measures that can be employed. For batch
processes, one can measure the number of jobs finished over
a time period (through-put), or the time from start to finish of
a job (turn-around). These have real-life analogs. The express
lane in a supermarket has higher through-put. Why is that good?
It should also have better turn-around. The first-class line
for airplanes does not have high utilization, but it has the best
turn-around. Why is that good?
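A made-up worked example of the two measures: over the whole run the through-put is the same either way (the same jobs get done), but running the short jobs first gives a much better average turn-around, which is the express-lane effect.

    #include <stdio.h>

    static void report(const char *label, const int *burst, int n)
    {
        int t = 0, total_turnaround = 0;
        for (int i = 0; i < n; i++) {
            t += burst[i];              /* job i finishes at time t           */
            total_turnaround += t;      /* all jobs arrive at time 0          */
        }
        printf("%s: through-put = %d jobs / %d units, avg turn-around = %.1f\n",
               label, n, t, (double)total_turnaround / n);
    }

    int main(void)
    {
        int short_first[] = { 1, 1, 1, 10 };   /* express lane: short jobs first */
        int long_first[]  = { 10, 1, 1, 1 };   /* long job holds everyone up     */
        report("short jobs first", short_first, 4);
        report("long job first",   long_first,  4);
        return 0;
    }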
Processes have various needs and the scheduler should match those needs.
Interactive processes need low latency. Hopefully they are easily
satisfied in terms of progress (quick but small responses to events).
Finally, satisfaction of those needs can be measured either by
average, best or worst case results, or by a consistency value.