

7.3.1 Issues In Software Monitor Design

This section covers some of the issues that arise in the design of software monitors.

1.  Activation Mechanism: The first decision that a software monitor designer has to make is how to trigger the data collection routine. Three mechanisms have been used in the past:
(a)  Trap instruction
(b)  Trace mode
(c)  Timer interrupt

The first mechanism is to instrument the system software with trap instructions at appropriate points in the code. The trap instruction is a software interrupt mechanism that transfers control to a data collection routine. Its effect is very similar to that of a subroutine call; in fact, if a processor lacks a trap instruction, a call instruction is generally used instead. The event-trace monitor described by Tolopka (1982) is an example of this approach. The monitor measures the elapsed time of various operating system services using trap instructions at the beginning and at the end of the service code. Thus, to measure I/O service time, a trap instruction placed at the beginning of the I/O service-call-handling routine enables the monitor to record the clock value. After the I/O finishes, another trap instruction causes the monitor to read the clock again and subtract the beginning value, thus obtaining the time spent in the service routine.
The second mechanism is that of switching the processor to trace mode. In this mode, which is available on many processors, instruction execution is interrupted after every instruction, and control is passed to a data collection routine. This method has a very high overhead and is used only for monitoring applications in which the time between events need not be measured. For example, this approach can be used to develop an instruction-trace monitor that produces a program counter histogram (a histogram of instruction addresses). The monitor records the address (program counter) and returns control to the user program.
The final mechanism is that of a timer interrupt. A timer-interrupt service provided by the operating system is used to transfer control to a data collection routine at fixed intervals. This mechanism, called sampling, is especially suitable for frequent events since its overhead is independent of the event rate. The desired sampling rate is determined by the overhead per activation, the input width, and the rate of variation of the sampled quantity. If a counter is being sampled, sampling should be frequent enough that the probability of the counter overflowing between samples is minimized.
2.  Buffer Size: Most software monitors record data in buffers, which are later written to disk or magnetic tape for storage. The buffers should be large so that writes to secondary storage are infrequent. On the other hand, they should be small so that the time lost per write operation is not too large and the reduction in memory available to the system is not perceptible. Thus, the optimal buffer size is a function of the input rate, input width, and emptying rate.
3.  Number of Buffers: Buffers are usually organized in a ring so that the recording (buffer-emptying) process follows the monitoring (buffer-filling) process as closely as possible. If there is only one buffer, the two processes cannot proceed simultaneously, and monitoring may have to be stopped while recording is in progress. Thus, a minimum of two buffers is required for continuous, simultaneous operation. Generally, many more buffers are used to allow for variations in the filling and emptying rates.
4.  Buffer Overflow: Even with multiple buffers per ring, there is always a finite probability that all buffers become full. The monitoring process must then either overwrite a buffer that has not yet been recorded or stop monitoring until a buffer becomes available. In either case, some information is lost: if a buffer is overwritten, relatively old information is lost, whereas if monitoring is blocked, new information is lost. Thus, the choice between the two alternatives depends upon the value of old versus new information. Whichever is chosen, the fact that a buffer overflow occurred should be recorded.
A similar problem occurs when a counter used to count events overflows. The choice is either to keep the counter stuck at its highest value or to reinitialize it. The fact that a counter overflow occurred should also be recorded.
5.  Data Compression or Analysis: It is possible for the monitor to process the data as it is observed. This reduces the storage space required, since the detailed data need not be stored. However, it adds to the monitor overhead.
6.  On/Off Switch: Most hardware monitors have an on/off switch that enables/disables the monitoring operation. A software monitor should similarly have conditional (IF ... THEN ...) statements so that monitoring can be enabled or disabled easily. Since monitoring adds to the system overhead, it should be possible to disable the monitor when it is not in use. Also, a software monitor usually requires modification of the system code, which may introduce bugs. The on/off switch helps during monitor development and debugging.
7.  Language: Most monitors are written in a low-level system programming language, such as assembly, Bliss, or C, to keep the overhead at a minimum. Since a software monitor is usually a part of the system being monitored, it is better to write both in the same programming language.
8.  Priority: If the monitor runs asynchronously, its priority should be low so that key system operations are least affected. However, if timely observation and recording of events is important, the priority should be high so that the delay in its execution does not cause a significant skew in the time values recorded.
9.  Abnormal-Events Monitoring: A monitor should be able to observe normal as well as abnormal events on the system. Examples of abnormal events include system initialization, device failures, and program failures. In fact, if both cannot be accommodated, users often prefer to give abnormal events a higher priority than normal events, because abnormal events occur at a lower rate and therefore impose less monitoring overhead. Observing abnormal events also helps the user take preventive action long before the system becomes unavailable.
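Trace mode (item 1b) has a rough software analog in Python's `sys.settrace`, which makes the interpreter call a trace function on every executed source line. The sketch below builds a histogram keyed by (function, line), standing in for a program counter histogram, since instruction addresses are not visible at this level. It is an analogy under that assumption, not an instruction-level trace; like hardware trace mode, it slows the traced code dramatically. The workload `work` and the helper `traced` are hypothetical.

```python
import sys
from collections import Counter

# (function name, line number) -> number of times that line was executed
pc_histogram = Counter()

def _tracer(frame, event, arg):
    """Called by the interpreter on traced events, much as trace mode
    interrupts after every instruction; records where execution is."""
    if event == "line":
        pc_histogram[(frame.f_code.co_name, frame.f_lineno)] += 1
    return _tracer               # keep receiving line events for this frame

def traced(fn, *args):
    """Run fn with tracing switched on, then switch tracing off again."""
    sys.settrace(_tracer)
    try:
        return fn(*args)
    finally:
        sys.settrace(None)

def work(n):                     # hypothetical workload to be traced
    total = 0
    for i in range(n):
        total += i
    return total

print(traced(work, 5))
for (func, line), count in sorted(pc_histogram.items()):
    print(f"{func} line {line}: {count}")
```

Only frames entered after `sys.settrace` is called are traced, so the histogram reflects `work` rather than the monitoring helper itself.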
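Counter overflow (item 4) and the sampling of counters (item 1c) are related. The sketch below models a fixed-width event counter that, on overflow, either sticks at its highest value or is reinitialized to zero, and in both cases records that an overflow occurred. For a counter that is reinitialized, a sampler can recover the number of events between two readings with modular subtraction, provided fewer than 2**bits events occurred in between; that condition bounds the sampling interval for an assumed peak event rate. The class and function names and the policy strings are hypothetical.

```python
class EventCounter:
    """Fixed-width event counter with an overflow policy:
    'saturate' keeps the value stuck at its maximum, 'reset' reinitializes it
    to zero. In both cases the overflow is recorded in `overflowed`."""

    def __init__(self, bits, policy="reset"):
        if policy not in ("saturate", "reset"):
            raise ValueError("policy must be 'saturate' or 'reset'")
        self.max_value = (1 << bits) - 1
        self.policy = policy
        self.value = 0
        self.overflowed = False

    def increment(self):
        if self.value < self.max_value:
            self.value += 1
            return
        self.overflowed = True
        if self.policy == "reset":
            self.value = 0
        # 'saturate': leave the value stuck at max_value


def events_between(prev, curr, bits):
    """Events between two samples of a counter that resets on overflow.
    Correct only if fewer than 2**bits events occurred in the interval."""
    return (curr - prev) % (1 << bits)


def max_sampling_interval(bits, max_event_rate):
    """Longest sampling interval (seconds) keeping the events between samples
    below 2**bits, given an assumed peak event rate (events per second)."""
    return ((1 << bits) - 1) / max_event_rate


print(events_between(65530, 4, 16))             # counter wrapped between samples
print(max_sampling_interval(16, 1_000_000))      # seconds, at 10**6 events/s
```

For example, a 16-bit counter read as 65530 and then as 4 has wrapped once, and `events_between` reports the 10 events in between.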
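The buffer ring of items 3 and 4 can be sketched as follows. The monitoring side fills the current buffer and hands it to a queue of full buffers; the recording side empties the oldest full buffer. When no empty buffer is left, the chosen policy either overwrites the oldest unrecorded buffer (old information lost) or blocks monitoring (new information lost), and the overflow is counted either way. This is a single-threaded sketch under that simplifying assumption; `BufferRing`, `record`, `drain_one`, and the policy strings are hypothetical names.

```python
from collections import deque

class BufferRing:
    """Ring of fixed-size buffers shared by a monitoring (filling) process and a
    recording (emptying) process. Single-threaded sketch: a real monitor would
    need locking or interrupt masking around these operations."""

    def __init__(self, n_buffers, buffer_size, policy="block"):
        if n_buffers < 2:
            raise ValueError("at least two buffers are needed to fill and empty concurrently")
        if policy not in ("overwrite", "block"):
            raise ValueError("policy must be 'overwrite' or 'block'")
        self.buffer_size = buffer_size
        self.policy = policy
        self.current = []                                    # buffer being filled
        self.free = deque([] for _ in range(n_buffers - 1))  # empty buffers
        self.full = deque()                                  # awaiting recording
        self.overflows = 0                                   # times all buffers were full
        self.lost = 0                                        # records lost to overflow

    def record(self, item):
        """Monitoring side: append one record to the current buffer."""
        if self.current is None:        # blocked: new information is lost
            self.lost += 1
            return
        self.current.append(item)
        if len(self.current) == self.buffer_size:
            self.full.append(self.current)
            self.current = self._next_buffer()

    def _next_buffer(self):
        if self.free:
            return self.free.popleft()
        self.overflows += 1             # record that an overflow occurred
        if self.policy == "overwrite":
            oldest = self.full.popleft()    # old information is lost
            self.lost += len(oldest)
            oldest.clear()
            return oldest
        return None                     # 'block': stop until a buffer is freed

    def drain_one(self):
        """Recording side: remove the oldest full buffer and return its records."""
        if not self.full:
            return []
        buf = self.full.popleft()
        data = list(buf)                # a real monitor would write this to disk or tape
        buf.clear()
        if self.current is None:
            self.current = buf          # monitoring resumes
        else:
            self.free.append(buf)
        return data
```

With two buffers of two records each, recording five items before any draining overflows once: under `"block"` the fifth record is lost, while under `"overwrite"` the first two are.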
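Item 5 notes that a monitor can reduce data as it is observed instead of storing every record. One common reduction, chosen here for illustration and not prescribed by the text, is to keep only the count, mean, variance, minimum, and maximum, updated incrementally with Welford's method. The class name `RunningSummary` is hypothetical. The per-event arithmetic is exactly the added overhead the text mentions.

```python
import math

class RunningSummary:
    """Reduces a stream of observations to count, mean, variance, min, and max
    as they arrive (Welford's method), so individual records need not be kept."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0            # sum of squared deviations from the mean
        self.min = math.inf
        self.max = -math.inf

    def add(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self._m2 += d * (x - self.mean)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    def variance(self):
        """Sample variance of the observations seen so far."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


s = RunningSummary()
for t in [2, 4, 4, 4, 5, 5, 7, 9]:      # e.g. service times, in ms
    s.add(t)
print(s.n, s.mean, s.variance(), s.min, s.max)
```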

7.4 HARDWARE MONITORS

A hardware monitor consists of separate pieces of equipment that are attached to the system being monitored via probes. No system resources are consumed in monitoring. Thus, hardware monitors generally have lower overhead than software monitors. Their input rate is also higher. Further, the probability of their introducing bugs into the system operation is generally lower than that of software monitors.

A number of general-purpose hardware monitors are available on the market. They consist of the following elements:

1.  Probes: High-impedance probes are used to observe signals at desired points in the system hardware.
2.  Counters: These are incremented whenever a particular event occurs.
3.  Logic Elements: Signals from many probes can be combined using AND, OR, and other logic gates. The combinations are used to indicate events that may increment the counters.
4.  Comparators: These can be used to compare counters or signal values with preset values.
5.  Mapping Hardware: This allows histograms of observed quantities to be computed. It consists of multiple comparators and counters.
6.  Timer: Used for time stamping or for triggering a sampling operation.
7.  Tape/Disk: Most monitors have built-in tape/disk drives to store the data.
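How logic elements, counters, comparators, and mapping hardware fit together can be simulated in a few lines. In the sketch below, each clock tick supplies hypothetical probe readings (CPU busy, user mode, address); an AND of two probe signals defines the event that increments a counter, and the mapping hardware is modeled as preset comparator boundaries that route each observed address to one of several counters. The boundary comparisons are done in software with `bisect_right`; the probe data, `logic_and`, and `MappingHistogram` are all illustrative assumptions.

```python
from bisect import bisect_right

def logic_and(*signals):
    """Logic element: the combined signal is high only if every probe is high."""
    return all(signals)

class MappingHistogram:
    """Mapping hardware: comparators against preset boundaries select which
    counter an observed value increments."""

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)        # preset comparator values
        self.counters = [0] * (len(self.boundaries) + 1)

    def observe(self, value):
        self.counters[bisect_right(self.boundaries, value)] += 1

# Hypothetical probe readings, one per clock tick: (cpu_busy, user_mode, address)
ticks = [
    (True,  True,  0x1040),
    (True,  False, 0x8010),
    (True,  True,  0x1800),
    (False, False, 0x0000),
    (True,  True,  0x9000),
]

busy_user_count = 0                                 # counter driven by the AND gate
# Bins: below 0x2000, 0x2000 through 0x7FFF, 0x8000 and above
addr_hist = MappingHistogram([0x2000, 0x8000])
for busy, user, addr in ticks:
    if logic_and(busy, user):
        busy_user_count += 1
        addr_hist.observe(addr)

print(busy_user_count, addr_hist.counters)
```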

The monitor manufacturers also supply probe-point libraries for the various systems that can be observed with the monitor. Each library lists the points on the system where probes can be attached and explains the signal observed at each point.

Hardware monitors have gone through several generations of development. Originally, the monitors contained wired control logic. The next generation contained mapping hardware with memory and comparators. Today’s monitors are intelligent in that they are programmable and contain their own processor, memory, and I/O devices.



Copyright © John Wiley & Sons, Inc.