This system integrates advanced technologies for digital-sensor-based fault and event detection, context-based reasoning, hierarchical monitor performance tuning, intelligent monitor-to-action linking, and human-computer interfaces. Such functionality is required to enable other autonomy functions such as fault handling, goal-oriented self-commanding, automatic replanning, and resource management. This proposal includes enhancements to the specific technologies, and to the methods for applying them, in order to provide proven, reliable monitoring automation. The integrated system supports self-monitoring in distributed systems, accommodates a range of levels of human attention, and provides for migration of monitoring between segments of a scalable distributed system. Finally, the system has been designed to simplify the tuning and application of the self-monitoring system to a range of systems and missions.
Digital-signal-based behavioral and model-deviation event and fault detection is a data processing function that will be distributable between the ground and space segments in the proposed architecture. The most significant obstacle to such automated detection systems is the "tuning" required to achieve beneficial automation for any given monitoring application. This software layer is tunable in terms of decision boundaries, methods, and input-transformation fidelity, so that overall detection performance can be optimized against mission requirements (false-alarm rate and probability of missed faults). A direct interface between the fault detection layer and the command and control layer should provide the capability to link reactions to detections. This detection-reaction linking is fundamental to automation and to overall autonomy. Finally, detection must be context-sensitive, so that commands, sequences, and planned mode changes do not raise false alarms that would otherwise produce unnecessary system load and possibly detrimental incorrect reactions.
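The detection-reaction link and the context-sensitive suppression described above might be sketched as follows. This is a minimal illustration, not the proposed flight interface; the event fields, reaction registry, and the `expected_by_context` flag are all assumptions made for the example.

```python
from typing import Callable, Dict, List

class ReactionLinker:
    """Routes detection events from the fault-detection layer to
    reactions registered by the command-and-control layer."""

    def __init__(self) -> None:
        self._reactions: Dict[str, List[Callable[[dict], str]]] = {}

    def link(self, event_type: str, reaction: Callable[[dict], str]) -> None:
        self._reactions.setdefault(event_type, []).append(reaction)

    def dispatch(self, event: dict) -> List[str]:
        # Context-sensitive gate: suppress detections flagged as expected
        # consequences of commands or planned mode changes, so they do not
        # trigger unnecessary or incorrect reactions.
        if event.get("expected_by_context"):
            return []
        return [react(event) for react in self._reactions.get(event["type"], [])]

linker = ReactionLinker()
linker.link("overcurrent", lambda e: f"safe_mode({e['channel']})")

print(linker.dispatch({"type": "overcurrent", "channel": "bus_A"}))
print(linker.dispatch({"type": "overcurrent", "channel": "bus_A",
                       "expected_by_context": True}))
```

The same dispatch point is where more elaborate context reasoning (mode tables, rule-based constraints) could be substituted for the simple flag used here.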
In general, fault and event detection methods include limit sensing, difference-based reasoning, causal reasoning (changes in relationships), reference-model deviation, trend analysis, frequency-domain entropy-based reasoning, configuration and sequence discrepancy analysis, and many other logical and statistical modeling methods. Difference-based reasoning methods such as distance measures will be incorporated, covering both causal distances (changes in relationships) and simple distances (changes in behavior) [5,7]. Traditional statistical trending methods such as ARMA (Autoregressive Moving Average) models, and modal modeling methods such as linear mixture models, will be incorporated and used as benchmarks. Finally, the applicability of ANNs (Artificial Neural Networks) as non-linear mixture deviation models and as classifiers of transformed time-series input is being considered for incorporation.
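Two of the simplest methods in this list, limit sensing and reference-model deviation, can be illustrated as follows. The thresholds and the moving-average reference model are toy assumptions for the example, not the benchmark models proposed above.

```python
from collections import deque

def limit_sense(value: float, low: float, high: float) -> bool:
    """Limit sensing: flag a sample that leaves its allowed band."""
    return not (low <= value <= high)

class DeviationMonitor:
    """Reference-model deviation: flag samples whose residual against a
    moving-average reference model exceeds a fixed threshold."""

    def __init__(self, window: int, threshold: float) -> None:
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def step(self, value: float) -> bool:
        if len(self.history) == self.history.maxlen:
            reference = sum(self.history) / len(self.history)
            anomalous = abs(value - reference) > self.threshold
        else:
            anomalous = False  # not enough samples to form the reference yet
        self.history.append(value)
        return anomalous

mon = DeviationMonitor(window=3, threshold=1.0)
print([mon.step(v) for v in [10.0, 10.0, 10.0, 10.2, 14.0]])
```

An ARMA benchmark would replace the moving average with a fitted autoregressive model and test the same kind of residual.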
One of the problems associated with incorporating a large number of detection methods is determining when to apply a particular method, or a combination of methods, to provide the best detection performance. As previously noted, the available detection methods include a variety of approaches that can be applied to sensor output in the time and/or frequency domain. A support tool for analyzing detection method applicability to specific failure modes will be incorporated in the system. This tool will evaluate the best single method or combination of methods for a specific monitoring application, based upon testbed simulation and the injection of software- and hardware-simulated faults into nominal simulations.
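The core of such an evaluation tool can be sketched as a fault-injection harness that scores each candidate detector by hit rate and false-alarm rate. The signals, the injected bias fault, and the two candidate detectors below are toy assumptions, not the proposed testbed.

```python
import random

def evaluate(detector, samples, labels):
    """Return (hit_rate, false_alarm_rate) for a detector over labeled samples."""
    hits = sum(1 for s, l in zip(samples, labels) if l and detector(s))
    false_alarms = sum(1 for s, l in zip(samples, labels) if not l and detector(s))
    n_faulty = sum(labels) or 1
    n_nominal = (len(labels) - sum(labels)) or 1
    return hits / n_faulty, false_alarms / n_nominal

random.seed(0)
nominal = [random.gauss(10.0, 0.1) for _ in range(100)]  # simulated nominal data
faulty = [v + 2.0 for v in nominal[:20]]                 # injected bias fault
samples = nominal + faulty
labels = [False] * len(nominal) + [True] * len(faulty)

tight = lambda s: abs(s - 10.0) > 0.5   # candidate detector A
loose = lambda s: abs(s - 10.0) > 3.0   # candidate detector B

print(evaluate(tight, samples, labels))  # catches the bias fault
print(evaluate(loose, samples, labels))  # too insensitive for this failure mode
```

Run over a library of simulated failure modes, scores like these would identify the best method, or combination of methods, per monitoring application.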
Monitors can be related in a hierarchy such that individual sensor monitors are grouped by component, subsystem, and system. This approach supports monitor tuning at all levels, as well as diagnosis and tracing. The structure and its associated relationships would also provide valuable information for an embedded expert system performing autonomous tracing, or for a ground-based object-oriented database management schema and/or expert system supporting diagnosis and tracing that cannot be handled completely autonomously. Finally, this hierarchy is also key to highly abstracted monitoring, since a hierarchical structure of fault information can be associated directly with the monitor structure.
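A minimal sketch of such a hierarchy follows, assuming a simple tree in which each node reports the worst status among its descendants and can trace the path to a faulty leaf. The grouping, node names, and status values are illustrative assumptions.

```python
class MonitorNode:
    """A node in the component/subsystem/system monitor hierarchy."""

    SEVERITY = {"nominal": 0, "degraded": 1, "faulty": 2}

    def __init__(self, name, children=None, status="nominal"):
        self.name = name
        self.children = children or []
        self._status = status

    def status(self):
        """Roll up: a node is as unhealthy as its worst descendant."""
        states = [self._status] + [c.status() for c in self.children]
        return max(states, key=self.SEVERITY.__getitem__)

    def trace(self):
        """Return the path(s) to off-nominal leaf monitors for diagnosis."""
        if not self.children:
            return [[self.name]] if self._status != "nominal" else []
        return [[self.name] + path for c in self.children for path in c.trace()]

temp = MonitorNode("temp_sensor_3", status="faulty")
power = MonitorNode("power_subsystem", children=[temp, MonitorNode("bus_voltage")])
system = MonitorNode("spacecraft", children=[power, MonitorNode("comm_subsystem")])

print(system.status())  # system inherits the worst leaf status
print(system.trace())   # path from system down to the faulty sensor
```

The same tree supports tuning: decision boundaries can be adjusted per leaf, while performance is evaluated at the subsystem and system levels.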
The occurrence of false alarms and false opportunities will be minimized with context-based reasoning. For example, simple mode changes, with their associated nominal behavioral and relational changes, can cause false alarms unless the context of the mode change is incorporated into the monitor. Multiple levels of context-based reasoning can be supported with the proposed system: simple reasoning local to the monitor based upon related sensor values; global context such as subsystem or system mode information supplied externally; and high-level rule-based constraints based upon mission and systems engineering parameters. Where this context reasoning is performed depends upon the time criticality of the monitor and the complexity of the reasoning required to provide appropriate monitor performance with respect to detection and false alarms.
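The lowest of these levels, reasoning local to the monitor using externally supplied mode context, might look like the following sketch. The mode names and the widened band for a commanded slew are illustrative assumptions.

```python
# Nominal bands per subsystem mode; a commanded slew is expected to
# produce larger excursions, so its band is wider.
NOMINAL_BANDS = {
    "steady_state": (9.5, 10.5),
    "slewing": (8.0, 12.0),
}

def context_aware_limit_check(value: float, context: dict) -> bool:
    """Return True if the value is anomalous *for the current mode*."""
    low, high = NOMINAL_BANDS.get(context.get("mode"),
                                  NOMINAL_BANDS["steady_state"])
    return not (low <= value <= high)

# The same reading raises an alarm in steady state but not during a slew.
print(context_aware_limit_check(11.4, {"mode": "steady_state"}))  # True
print(context_aware_limit_check(11.4, {"mode": "slewing"}))       # False
```

Global mode context and rule-based mission constraints would feed into the same decision, at higher levels of the hierarchy or on the ground.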
Key performance measures of automated monitoring are the false alarm rate and the missed fault rate. While monitor performance is tied to hardware design parameters such as sampling frequency and sensor placement, given a set of samples, the monitoring method's performance can be analyzed and tuned. Automated fault monitoring is in fact a classification problem, even when faults themselves are not classified, since the fundamental classes are nominal and anomalous. According to the theory of ROC (Receiver Operating Characteristic) curves, any given classification scheme results in a tradeoff between the "hit rate" and the "false alarm rate" for a given input sample set. In the case of fault detection, the hit rate is the fraction of anomalous samples correctly classified, and the false alarm rate is the fraction of nominal samples incorrectly classified. The ROC tradeoff arises from imperfections in the monitoring method (a perfect monitor would correctly classify all samples, raising no false alarms while achieving a perfect hit rate). The placement of monitor decision boundaries determines where a monitor sits on its ROC curve, so the current tradeoff between hit rate and false alarms can be tuned for a specific detection method.
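The decision-boundary tradeoff can be made concrete by sweeping a threshold over labeled anomaly scores and recording the resulting ROC operating points. The scores and thresholds below are toy assumptions chosen to show the tradeoff.

```python
def roc_points(scores, labels, thresholds):
    """For each threshold t, classify score > t as anomalous and return
    (threshold, false_alarm_rate, hit_rate) operating points."""
    n_anom = sum(labels)
    n_nom = len(labels) - n_anom
    points = []
    for t in thresholds:
        hits = sum(1 for s, l in zip(scores, labels) if l and s > t)
        fas = sum(1 for s, l in zip(scores, labels) if not l and s > t)
        points.append((t, fas / n_nom, hits / n_anom))
    return points

# Overlapping nominal (False) and anomalous (True) anomaly scores.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.5, 0.7, 0.9]
labels = [False, False, False, False, False, True, True, True]

for t, fa, hit in roc_points(scores, labels, [0.25, 0.45, 0.65]):
    print(f"t={t}: false-alarm rate {fa:.2f}, hit rate {hit:.2f}")
```

Raising the threshold lowers the false alarm rate at the cost of missed faults; choosing the operating point against mission requirements is exactly the tuning described above.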
This method of decision-boundary tuning can be used to optimize monitor performance with an individual detection method, or a set of combined methods, providing the best overall hit rate at the lowest false alarm rate for a given monitor. For example, the optimization tool can determine decision-boundary synergism among detection methods for sets of failure modes. The ability to examine individual monitor performance and translate it into system-level performance is implemented using the monitor hierarchy, so that individual-monitor, subsystem, and system-level monitoring performance can all be evaluated and tuned. The proposed monitor performance tuning methods and tools include:
The self-monitoring system does not preclude human monitoring, but it minimizes it by elevating monitoring to a high level of abstraction. That level can be specified according to mission objectives and the degree to which monitoring can be automated and detection linked with intelligent reactions. At the highest level, the health of the overall system would simply be conveyed to a human operator by a green, yellow, or red indication of mission status. The automated monitoring focuses operator attention only on problems that cannot be handled autonomously, or those where operator concurrence is desired (e.g., a red condition). This approach will be integrated with the hierarchy of monitors proposed for the self-monitoring system.
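The highest abstraction level can be sketched as a simple mapping from rolled-up monitor states to the three-color operator indication; the state names and the escalation rule below are illustrative assumptions, not the proposed operator interface.

```python
def operator_indication(monitor_states):
    """Map detailed monitor states to a green/yellow/red indication;
    only red conditions demand operator attention or concurrence."""
    if any(s == "unhandled_fault" for s in monitor_states):
        return "red"     # cannot be handled autonomously
    if any(s in ("degraded", "handled_fault") for s in monitor_states):
        return "yellow"  # handled autonomously, logged for later review
    return "green"       # nothing requires human attention

print(operator_indication(["nominal", "nominal"]))           # green
print(operator_indication(["nominal", "handled_fault"]))     # yellow
print(operator_indication(["degraded", "unhandled_fault"]))  # red
```

In the integrated system, the input states would come from the roll-up of the monitor hierarchy rather than a flat list.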