Downtime Prediction
Time to Failure - Models
Time to failure models are powerful tools for estimating the probability of an event occurring and for identifying the largest drivers of that event. The goal of a time to failure model is to learn the past trends in the data that lead up to an event. In this use case, our models are trained to predict the probability of a downtime. The inputs to the model are the sensor readings and detected anomaly events; the output is the predicted probability of a downtime occurring within one hour. The model requires labeled examples of the output on which to train, and here the output variable is downtimes on the stamping press machine.
Users can set context window and prediction window parameters for this model. The context window defines how much historical data is used to characterize critical conditions. The prediction window defines how far ahead a downtime is predicted: a prediction window of one hour means we predict the probability of a downtime occurring within the next hour. Larger prediction windows decrease the accuracy of the model but widen the window for a preventative fix. In our experience, most plants focus on winning the hour, which is why our models default to a prediction window of one hour.
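As a rough illustration of how these two parameters shape the training data, the sketch below builds labeled examples from raw sensor readings and a list of downtime timestamps. The function name, the `"4h"` context default, and the data layout are illustrative assumptions, not the Praxis API.

```python
import pandas as pd

def build_examples(sensors: pd.DataFrame, downtimes: pd.Series,
                   context_window: str = "4h", prediction_window: str = "1h"):
    """Illustrative sketch: turn sensor readings plus downtime timestamps
    into (context, label) pairs for a time-to-failure classifier.

    sensors   -- DataFrame indexed by timestamp, one column per sensor
    downtimes -- Series of downtime start timestamps
    (Window defaults are illustrative, not production values.)
    """
    examples = []
    for ts in sensors.index:
        # Context window: the slice of history the model sees at time ts.
        context = sensors.loc[ts - pd.Timedelta(context_window): ts]
        # Prediction window: label is 1 if any downtime starts within it.
        horizon_end = ts + pd.Timedelta(prediction_window)
        label = int(((downtimes > ts) & (downtimes <= horizon_end)).any())
        examples.append((context, label))
    return examples
```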
The features provided to the model are the raw value, moving average, rolling standard deviation, and incremental variance of each sensor. Creating these features for every sensor can lead to large feature vectors that add little useful context for the model, so we prune the feature set by systematically identifying the least influential features and retraining the model.
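A minimal pandas/scikit-learn sketch of the per-sensor features and the pruning loop described above is shown below. The window length, the minimum feature count, and the use of permutation importance on a random forest are assumptions; the production criterion may differ.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

def sensor_features(raw: pd.DataFrame, window: int = 60) -> pd.DataFrame:
    """Derive the four per-sensor features from raw readings.

    raw    -- DataFrame indexed by timestamp, one column per sensor
    window -- rolling window length in samples (illustrative default)
    """
    feats = {}
    for col in raw.columns:
        s = raw[col]
        feats[f"{col}_value"] = s                      # raw value
        feats[f"{col}_ma"] = s.rolling(window).mean()  # moving average
        feats[f"{col}_std"] = s.rolling(window).std()  # rolling standard deviation
        feats[f"{col}_incvar"] = s.expanding().var()   # incremental variance
    return pd.DataFrame(feats).dropna()

def prune_features(X: pd.DataFrame, y, min_features: int = 20) -> pd.Index:
    """Repeatedly drop the least influential feature and retrain.

    Importance here is permutation importance on a random forest;
    the criterion used in the actual pipeline may differ.
    """
    X = X.copy()
    while X.shape[1] > min_features:
        model = RandomForestClassifier(n_estimators=100).fit(X, y)
        result = permutation_importance(model, X, y, n_repeats=5)
        weakest = X.columns[result.importances_mean.argmin()]
        X = X.drop(columns=[weakest])
    return X.columns
```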
We train models with several architectures and choose the best model based on the highest accuracy and F1 score; a simplified selection sketch follows the list below.
We have many options, including:
- Praxis PredictiveTransformer: A transformer model built on the PatchTST architecture, which breaks the input time series into patches
- Praxis PredictiveNet: An LSTM-based model
- Praxis PredictiveGPT: Foundation models such as MOMENT, TimeGPT, and TimesFM, as well as Praxis’ proprietary foundation model, which is in active development
- Praxis PredictiveForest: Random-forest models adapted for time series data
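The selection step can be sketched as below: fit each candidate, score it on a validation set, and keep the one with the best F1 score, using accuracy to break ties. The candidate estimators here are generic scikit-learn stand-ins, not the Praxis architectures listed above.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

def select_best_model(candidates, X_train, y_train, X_val, y_val):
    """Fit each candidate and keep the one with the highest F1 score,
    breaking ties on accuracy."""
    scored = []
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        preds = model.predict(X_val)
        scored.append((f1_score(y_val, preds),
                       accuracy_score(y_val, preds), name, model))
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    _, _, best_name, best_model = scored[0]
    return best_name, best_model

# Stand-in candidates; the production comparison would cover the
# Praxis Predictive* architectures listed above instead.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=200),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
```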
The models are trained to predict two classes. Class A captures times when no downtime occurs within the prediction window (one hour); Class B captures times when a downtime does occur within that window. Users can set a threshold parameter that determines the probability at which we flag a downtime. This parameter defaults to 0.5 in training and 0.25 in simulation, since signals are weaker on unseen data. The model was trained with a synthetic data generation factor of 10, which produced significant improvements.
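The thresholding step amounts to a simple cut on the predicted probability, as in the sketch below; the function name is illustrative, while the default values mirror the ones stated above.

```python
import numpy as np

def classify(probabilities: np.ndarray, threshold: float) -> np.ndarray:
    """Map predicted downtime probabilities to classes.

    Class A (0): no downtime expected within the prediction window.
    Class B (1): downtime expected within the prediction window.
    """
    return (probabilities >= threshold).astype(int)

# Defaults described above: 0.5 during training, 0.25 in simulation
# to compensate for weaker signals on unseen data.
train_classes = classify(np.array([0.1, 0.6, 0.45]), threshold=0.5)   # [0, 1, 0]
sim_classes   = classify(np.array([0.1, 0.6, 0.45]), threshold=0.25)  # [0, 1, 1]
```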
Time to Failure - Results
In the graphs below, the blue line represents downtimes and the green line represents the predicted probability of a downtime occurring within an hour. In training, the model is nearly perfect at predicting the correct class.
In testing, results differ across synthetic data generation factors. A factor of 10 increases the magnitude of positive signals and decreases the number of false positives.
Time to Failure - Analysis
To test the model’s behavior in real-world scenarios, we deployed the downtime prediction model with a synthetic data generation factor of 10 and a probability threshold of 0.4. The model was simulated on unseen data from the month of June. While simulating our alerts service over June, we generated a total of 8 alerts: 6 correctly predicted downtimes and 2 were false positives. There were 21 downtimes in June in total, so the correctly predicted downtimes represent a potential reduction of 28.5%. We also verified our feature analysis by examining downtimes and cross-checking sensor values against the warning ranges. Below is an example output in the Praxis platform, correctly showing a warning for the problematic sensor, whose value was in the warning range before the three downtimes.
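The June figures reduce to a pair of simple ratios; the snippet below just spells out that arithmetic using the numbers reported in this section.

```python
alerts = 8            # alerts raised during the June simulation
true_positives = 6    # alerts that preceded an actual downtime
false_positives = 2   # alerts with no downtime in the prediction window
downtimes = 21        # total downtimes recorded in June

precision = true_positives / alerts      # 0.75 -> 75% of alerts were actionable
caught = true_positives / downtimes      # ~0.286 -> the ~28.5% reduction cited above
```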