9 Monitoring
Why do we need monitoring?
- Alerting to know when something goes wrong (failures, or problems such as an application running slowly)
- Debugging to have system information to reason about (inspecting code might not be enough)
- Trending to know how your system is being used (for capacity planning and design decisions)
Logging vs Monitoring
- Logging is typically time based and provides qualitative information about the current application state, events, errors, etc.
- Monitoring is time-series based (we look at something that happens consistently over time) and provides quantitative information about system health, performance, resource usage, etc.
Model monitoring happens at two different levels:
- Resource level: ensuring the model is running correctly in the production environment (e.g., CPU, memory, disk usage, network latency)
- Performance level: ensuring the model is performing as expected
- Model staleness can occur for many reasons. We need to understand the different types of issues that can cause a model's performance to decay:
- Data drift: the distribution of the production data is different from the training data.
- Target drift: the distribution of the target variable changes over time.
- Concept drift: the relationship between input data and the target variable changes. The patterns the model learned to map between the original inputs and outputs are no longer relevant.
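The difference between the drift types above can be illustrated with a small simulation. This is only a toy sketch under stated assumptions: the one-dimensional feature, the threshold labeling rule, and the names `make_data`, `model`, and `accuracy` are all hypothetical and chosen for illustration.

```python
import random


def make_data(n, x_mean, threshold, seed):
    """Draw n inputs from N(x_mean, 1) and label them 1 when x > threshold."""
    rng = random.Random(seed)
    xs = [rng.gauss(x_mean, 1.0) for _ in range(n)]
    ys = [1 if x > threshold else 0 for x in xs]
    return xs, ys


def model(x):
    """Hypothetical model that learned the training rule: predict 1 when x > 0."""
    return 1 if x > 0.0 else 0


def accuracy(xs, ys):
    """Fraction of samples where the fixed model agrees with the label."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)


def mean(values):
    return sum(values) / len(values)


# Training data: inputs centered at 0, labeling rule "x > 0".
train_x, train_y = make_data(2000, x_mean=0.0, threshold=0.0, seed=0)
# Data drift (and, as a side effect, target drift): inputs move, rule unchanged.
drift_x, drift_y = make_data(2000, x_mean=2.0, threshold=0.0, seed=1)
# Concept drift: inputs unchanged, but the rule linking x to y has moved.
concept_x, concept_y = make_data(2000, x_mean=0.0, threshold=1.0, seed=2)

scenarios = {
    "training": (train_x, train_y),
    "data drift": (drift_x, drift_y),
    "concept drift": (concept_x, concept_y),
}
for name, (xs, ys) in scenarios.items():
    print(f"{name:14s} mean(x)={mean(xs):+.2f} "
          f"mean(y)={mean(ys):.2f} accuracy={accuracy(xs, ys):.2f}")
```

In this toy setup the model stays fully accurate under data drift because the labeling rule has not changed, even though the means of both the input and the target shift. Under concept drift the inputs look the same as in training, yet accuracy drops because the learned mapping no longer matches the rule.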
9.1 Drift for univariate features
- Continuous Data: use statistical tests such as the Kolmogorov-Smirnov test to compare the distribution of a feature in the training data vs production data.
- The Kolmogorov-Smirnov test determines the maximum distance between the two distributions' (empirical) cumulative distribution functions
- Categorical Data: use the Chi-Squared test to compare the distribution of a categorical feature in the training data vs production data.
- The Chi-Squared test compares the observed frequencies of categories in production data to the expected frequencies derived from the training data.
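The two test statistics described above can be sketched in plain Python. This is a minimal illustration, not a full hypothesis test: it computes only the statistics, and the function names `ks_statistic` and `chi_squared_statistic` and the sample data are hypothetical. In practice, `scipy.stats.ks_2samp` and `scipy.stats.chisquare` also return p-values.

```python
from bisect import bisect_right
from collections import Counter


def ks_statistic(train, prod):
    """Maximum distance between the empirical CDFs of two samples."""
    a, b = sorted(train), sorted(prod)
    distance = 0.0
    # The largest gap between two step functions occurs at a sample point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of training values <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of production values <= x
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


def chi_squared_statistic(train, prod):
    """Sum of (observed - expected)^2 / expected over the training categories.

    Expected counts are the training proportions scaled to the production
    sample size. Assumption: every production category was seen in training.
    """
    train_counts, prod_counts = Counter(train), Counter(prod)
    unseen = set(prod_counts) - set(train_counts)
    if unseen:
        raise ValueError(f"categories not present in training data: {unseen}")
    statistic = 0.0
    for category, count in train_counts.items():
        expected = len(prod) * count / len(train)
        observed = prod_counts.get(category, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Continuous feature: the production values are shifted upwards by 2.
train_feature = [1, 2, 3, 4]
prod_feature = [3, 4, 5, 6]
print("KS statistic:", ks_statistic(train_feature, prod_feature))

# Categorical feature: category "b" is more frequent in production.
train_category = ["a"] * 50 + ["b"] * 50
prod_category = ["a"] * 30 + ["b"] * 70
print("Chi-squared statistic:", chi_squared_statistic(train_category, prod_category))
```

A statistic of 0 means the compared distributions are identical in the samples; larger values indicate stronger evidence of drift. Turning a statistic into a drift alert requires a significance threshold or p-value, which this sketch leaves out.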
