1 Introduction

An AI-enabled system is a software system that uses AI to provide value for users.

In traditional software systems, programmers define the rules and logic that process inputs and produce outputs. The knowledge is in the program, and a failure occurs when the actual output does not match the expected output. In contrast, AI-enabled systems learn from data: given example inputs and outputs, training produces the program (the model). The knowledge is in the data. A failure occurs when the system produces outputs that are not aligned with user expectations, which traditional correctness metrics may not always capture.
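The contrast can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the spam-by-link-count task, the function names, and the six training examples are hypothetical, and the "learning" step is a deliberately simple threshold search rather than a real ML algorithm.

```python
# Traditional software: the knowledge lives in the program.
def is_spam_rule_based(num_links: int) -> bool:
    # Threshold chosen by a programmer (hypothetical rule).
    return num_links > 3


# AI-enabled system: the knowledge lives in the data.
def learn_threshold(examples: list[tuple[int, bool]]) -> int:
    """Return the candidate threshold that misclassifies the fewest examples."""
    candidates = sorted({x for x, _ in examples})

    def errors(t: int) -> int:
        # Count examples where the prediction (x > t) disagrees with the label.
        return sum((x > t) != label for x, label in examples)

    return min(candidates, key=errors)


# Hypothetical labelled data: (number of links, is spam).
training_data = [(0, False), (1, False), (2, False), (5, True), (7, True), (9, True)]
threshold = learn_threshold(training_data)


def is_spam_learned(num_links: int) -> bool:
    # Same shape of program as the rule above, but the threshold came from data.
    return num_links > threshold
```

Changing the rule-based behaviour means editing the code, while changing the learned behaviour means changing the training data and learning again.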

We distinguish two kinds of development processes:

Model-first process: the model is developed first and then a system is built around the model
- Benefit: project risk reduction
- Drawback: model quality may not match system requirements
System-first process: the system is developed first and then a model is integrated into the system. The data scientists receive concrete goals derived from the system requirements
- Benefit: models built to be compatible with production environments
- Drawback: the system requirements may set unrealistic expectations for the model

The big challenge is how to take an idea and a model developed by a data scientist and deploy it as part of a scalable and maintainable system.

Machine-learning models and pipelines must evolve continuously to remain effective. Common reasons to update them include:

Fixing recurring errors or failure modes discovered in production
Incorporating new feature-engineering ideas or tuning hyperparameters
Adopting improved machine-learning techniques and tools
Retraining with fresh data to address data drift, concept drift, or label changes
Meeting changing quality requirements, such as lower latency, better explainability, or improved fairness
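As an illustration of the data-drift reason in the list above, the following is a minimal sketch of a check that could trigger retraining. The function names, the 2.0 threshold, and the sample values are assumptions made for this example. The score is a crude heuristic (shift of the mean measured in reference standard deviations), not a standard drift test.

```python
from statistics import mean, stdev


def drift_score(reference: list[float], current: list[float]) -> float:
    """Shift of the current mean from the reference mean,
    expressed in reference standard deviations."""
    return abs(mean(current) - mean(reference)) / stdev(reference)


def needs_retraining(reference: list[float], current: list[float],
                     threshold: float = 2.0) -> bool:
    # The threshold is an arbitrary assumption; tune it per feature.
    return drift_score(reference, current) > threshold


# Hypothetical values of one numeric feature.
training_values = [10.0, 11.0, 9.0, 10.0, 10.0]
production_stable = [10.0, 10.5, 9.5]
production_shifted = [15.0, 16.0, 14.0]

print(needs_retraining(training_values, production_stable))
print(needs_retraining(training_values, production_shifted))
```

A check like this only looks at the input distribution. Concept drift, where the relationship between inputs and labels changes, generally requires monitoring prediction quality against fresh labels as well.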

We can decompose the big challenge into four sub-challenges:

Reproducible and auditable process
- Repeatability: same team, same experimental setup
- Reproducibility: different team, same experimental setup
- Replicability: different team, different experimental setup
Unexpected complexity: only a small fraction of the code in a real-world ML system is the ML code itself
Cross-functional teams: lack of ML literacy leads to unrealistic requirements, and product requirements are often not translated into clear model requirements
Testing and quality
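For the first sub-challenge, repeatability (same team, same setup) usually starts with controlling randomness and recording the exact configuration of each run. The sketch below is one possible way to do that under stated assumptions: `run_experiment`, the configuration keys, and the averaged "metric" are hypothetical stand-ins for a real training run.

```python
import hashlib
import json
import random


def run_experiment(config: dict) -> dict:
    """Run a toy 'experiment' that is fully determined by its config."""
    # Use a local, seeded generator so no hidden global state affects the run.
    rng = random.Random(config["seed"])
    data = [rng.gauss(0.0, 1.0) for _ in range(config["n_samples"])]
    metric = sum(data) / len(data)

    # Fingerprint the configuration so every result can be traced back to it.
    canonical = json.dumps(config, sort_keys=True).encode("utf-8")
    config_hash = hashlib.sha256(canonical).hexdigest()
    return {"config_hash": config_hash, "metric": metric}


config = {"seed": 42, "n_samples": 100}
first = run_experiment(config)
second = run_experiment(config)
other = run_experiment({"seed": 7, "n_samples": 100})
```

Running the same configuration twice yields identical results, and a different configuration gets a different fingerprint. Reproducibility and replicability by other teams additionally need the data, code version, and environment to be shared, which this sketch does not cover.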