9 Monitoring

Why do we need monitoring?

Alerting: to know when something goes wrong (failures, or problems such as an application running slowly)
Debugging: to have system information to reason about (inspecting the code might not be enough)
Trending: to know how your system is being used (for capacity planning and design decisions)

Logging vs Monitoring

Logging is typically time based and provides qualitative information about the current application state, events, errors, etc.
Monitoring is time-series based (we look at something that is measured consistently over time) and provides quantitative information about system health, performance, resource usage, etc.
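
To make the contrast concrete, here is a minimal sketch using only the Python standard library. The logger name, the `predict` function, and the in-memory `latency_series` list are hypothetical stand-ins; a real system would typically send metrics to a dedicated monitoring backend instead.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("inference")  # hypothetical logger name

# Hypothetical in-memory metric store: (timestamp, latency in seconds) samples.
latency_series = []

def predict(x):
    """Stand-in for a model call; it simply doubles the input."""
    start = time.perf_counter()
    if x < 0:
        # Logging: a qualitative record of one specific event.
        logger.warning("negative input received: %s", x)
    result = 2 * x
    # Monitoring: a quantitative sample appended to a time series.
    latency_series.append((time.time(), time.perf_counter() - start))
    return result

for value in [1.0, -3.0, 2.5]:
    predict(value)

print(len(latency_series), "latency samples recorded")
```

The log line only appears when something noteworthy happens, while a latency sample is recorded on every call, which is what makes it usable for alerting and trending.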

Model monitoring happens at two different levels:

Resource level: ensuring the model is running correctly in the production environment (e.g., CPU, memory, disk usage, network latency)
Performance level: ensuring the model is performing as expected
- Model staleness can occur for many reasons. We need to understand the different types of issues that can cause a model's performance to decay:
  - Data drift: the distribution of the production data differs from that of the training data.
  - Target drift: the distribution of the target variable changes over time.
  - Concept drift: the relationship between input data and the target variable changes. The patterns the model learned to map between the original inputs and outputs are no longer relevant.
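
The drift types above can be illustrated with a small synthetic sketch. Everything here is an assumption made for illustration: a single Gaussian feature, a threshold rule as the "true" labelling process, and a fixed `model` function standing in for a trained model.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def sample_inputs(mean, n=5000):
    """Draw n values of a single feature from a unit-variance Gaussian."""
    return [rng.gauss(mean, 1.0) for _ in range(n)]

def true_label(x, threshold):
    """Hypothetical labelling process: positive when x exceeds a threshold."""
    return int(x > threshold)

def model(x):
    """Stand-in for a trained model: the rule that held at training time."""
    return int(x > 0.0)

def accuracy(xs, ys):
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)

# Training-time data: inputs centred at 0, labels follow the x > 0 rule.
train_x = sample_inputs(mean=0.0)
train_y = [true_label(x, 0.0) for x in train_x]

# Data drift: the input distribution shifts, the labelling rule is unchanged.
# The share of positive labels rises as well, which is also target drift.
shifted_x = sample_inputs(mean=1.5)
shifted_y = [true_label(x, 0.0) for x in shifted_x]

# Concept drift: the inputs look the same, but the input-to-label rule changed.
concept_x = sample_inputs(mean=0.0)
concept_y = [true_label(x, 1.0) for x in concept_x]

for name, xs, ys in [
    ("train", train_x, train_y),
    ("data drift", shifted_x, shifted_y),
    ("concept drift", concept_x, concept_y),
]:
    print(
        f"{name:14s} input mean={statistics.mean(xs):5.2f} "
        f"positive rate={statistics.mean(ys):4.2f} "
        f"accuracy={accuracy(xs, ys):4.2f}"
    )
```

In this toy setup accuracy only drops under concept drift, because the stand-in model happens to encode the exact labelling rule. A real model fitted on a limited input range would often degrade under data drift too. Note also that concept drift is invisible if only the input distribution is monitored.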

9.1 Drift for univariate features

Continuous Data: use statistical tests such as the Kolmogorov-Smirnov test to compare the distribution of a feature in the training data vs production data.
- The Kolmogorov-Smirnov test determines the maximum distance between the two distributions' cumulative distribution functions (CDFs).
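
As a rough sketch of what the two-sample Kolmogorov-Smirnov statistic measures, the function below computes the largest gap between the empirical CDFs of two samples using only the standard library. The function name and the synthetic samples are illustrative assumptions. In practice a library routine such as `scipy.stats.ks_2samp` would normally be used, since it also returns a p-value.

```python
import bisect
import random

def ks_statistic(sample_a, sample_b):
    """Maximum absolute gap between the empirical CDFs of two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so it is enough to evaluate the gap at every data point.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap

rng = random.Random(42)  # fixed seed so the sketch is repeatable
train_feature = [rng.gauss(0.0, 1.0) for _ in range(2000)]
prod_same = [rng.gauss(0.0, 1.0) for _ in range(2000)]     # no drift
prod_shifted = [rng.gauss(0.5, 1.0) for _ in range(2000)]  # mean has moved

print("no drift:", round(ks_statistic(train_feature, prod_same), 3))
print("drifted: ", round(ks_statistic(train_feature, prod_shifted), 3))
```

The statistic lies between 0 (identical empirical distributions) and 1 (no overlap). Whether a given value counts as drift depends on the sample sizes and the chosen significance level.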
Categorical Data: use the Chi-Squared test to compare the distribution of categorical features in training vs production data.
- The Chi-Squared test compares the observed frequencies of categories in production data to the expected frequencies from training data.
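
The comparison of observed and expected frequencies can be sketched as follows, assuming the training proportions are treated as the expected distribution. The function name and the example categories are hypothetical. A library routine such as `scipy.stats.chisquare` would normally be used to obtain the p-value as well.

```python
from collections import Counter

def chi_squared_statistic(train_values, prod_values):
    """Sum of (observed - expected)^2 / expected over the categories.

    Expected counts are the training proportions scaled to the size of
    the production sample.
    """
    train_counts = Counter(train_values)
    prod_counts = Counter(prod_values)
    unseen = set(prod_counts) - set(train_counts)
    if unseen:
        # A category never seen in training has an expected count of 0,
        # so the statistic is undefined; handle such cases separately.
        raise ValueError(f"categories not seen in training: {sorted(unseen)}")
    n_train = len(train_values)
    n_prod = len(prod_values)
    statistic = 0.0
    for category, train_count in train_counts.items():
        expected = n_prod * train_count / n_train
        observed = prod_counts.get(category, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical categorical feature with three levels.
train = ["A"] * 500 + ["B"] * 300 + ["C"] * 200
prod_same = ["A"] * 50 + ["B"] * 30 + ["C"] * 20     # same proportions
prod_shifted = ["A"] * 20 + ["B"] * 30 + ["C"] * 50  # A and C swapped

print("no drift:", chi_squared_statistic(train, prod_same))
print("drifted: ", chi_squared_statistic(train, prod_shifted))
```

With three categories there are 2 degrees of freedom, for which the critical value at the 5% significance level is about 5.99. A statistic well above that suggests the category frequencies have shifted.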