AI Project Run: Managing the Life Cycle of an ML Model
While the dependency between the Build and the Run phases has largely been tamed in "classic" IT projects, it remains very pronounced in Machine Learning (ML) projects. In this type of project, constant monitoring during the Run phase is required to maintain Build performance and deliver a successful model with each iteration.
- Maintaining the model quality with each iteration
Those involved in ML projects (developers, decision-makers, data engineers, data scientists, etc.) are well aware of the issues that come with them, particularly in terms of maintenance: Only one out of every ten ML projects will make it to production (VentureBeat Transform, 2019), and maintenance will take up more than half the time of an ML team’s developers (MLOps Virtual Event, 2020).
Therefore, these parties have been developing tools to help deploy and maintain ML projects for a decade. “MLOps” follows the same concept as DevOps, with one crucial exception: it applies to Machine Learning, which includes code and training data (a large amount of data), each with its own set of problems and life cycle. One of the challenges is being able to combine MLOps and DevOps seamlessly.
- Responsible AI design from the start
The need to make all parties involved in creating an ML system aware of their obligations and consider their social, economic, and/or environmental impact has also grown in recent years. In 2018, Gartner predicted that by 2022, 85% of artificial intelligence (AI) projects would produce incorrect results due to biases in the data, algorithms, or the teams managing them.
Governments have also taken to regulating the development of these AI systems.
On the government side, UNESCO published its recommendations on AI ethics in November 2021. The Council of Europe published a report titled “Towards Regulation of AI Systems” in December 2020, while the OECD outlined its AI principles, which were adopted in May 2019. In April 2021, the European Commission proposed a draft regulation for trustworthy AI.
On the private side, companies have launched several regulatory initiatives, sometimes in collaboration with universities or other organizations: work with standardization bodies such as ISO/IEC, NIST (Kicking off NIST AI Risk Management Framework, Oct. 2021), and IEEE (Ethics in Action in Autonomous and Intelligent Systems); the creation of the Global Partnership on AI (GPAI); etc.
Another challenge today is to reap the benefits of an ML system (automate and streamline time-consuming and unrewarding tasks to refocus the professional on their core business and relationship with their patient or client, improve performance on specific tasks, reduce errors in strategic tasks, etc.) while ensuring that the model designed does not pose risks to society, the economy, or the environment (confidentiality and protection of data used to train the model, risks of algorithmic bias caused by an imbalanced training data set, etc.).
AI Governance, Machine Learning Operations (MLOps), and Responsible AI… The requirements for developing AI systems have become clearer in recent years, as have the guidelines for designing responsible AI systems:
- Equity and non-discrimination: Develop tools and procedures to understand, document, and monitor development and production bias. AI systems should treat all people fairly.
- Human augmentation: Promote human values and ensure human oversight, which would allow domain experts to assess the model’s relevance at the end of a cycle and understand the impact of a wrong decision, especially if the automated ML system could have a significant effect on human life (health, energy, finance, banking, justice, transport, etc.).
- Confidentiality and data governance: Trust through confidentiality requires the development of tools and methods for processing and protecting data that are tailored to the data type (personal, sensitive, etc.). Awareness of the risks associated with data is achieved by developing the procedures and infrastructure required to ensure the security of data and models.
- Accountability: All parties involved bear some accountability for the AI system developed.
- Transparency and explainability: Technologies and methods are required to increase ML system transparency and explainability continuously. Relevant information about the models must be identified and documented.
- Security and reliability: An AI system’s life cycle must be reproducible and robust, with metrics for assessing model accuracy and performance.
- Ethical purpose and social benefit: AI systems must be designed with humans in mind, and a project impact assessment must be completed upstream to identify potential risks and repercussions for society, the economy, and the environment (precautionary principle).
The challenge is to consider these issues and apply these principles from the start of a project, incorporate safeguards throughout the Build phase, maintain them during the Run phase, and ensure the system’s robustness and proper scaling throughout its life cycle.
Disclaimer: the purpose of this post is not to:
- explain the MLOps steps, which are well documented online. You can find MLOps-related articles and videos on the Cellenza blog.
- provide an exhaustive list of tools now available on the market, but give you examples of tools that exist: the various MLOps steps and related tools will be explored in more depth in videos and posts to be released over the next few months!
Preparation and Data Protection
Privacy and data protection regulation has increased in recent years (General Data Protection Regulation – GDPR, California Consumer Privacy Act – CCPA, etc.). These requirements also form part of a responsible AI framework because AI systems are based on vast quantities of data. Initiatives and regulations converge on privacy principles (OECD AI Privacy Principles at a global level, draft European regulation for trustworthy AI at European level):
- ensuring collection limitation
- ensuring data quality
- ensuring purpose specification
- ensuring use limitation
- ensuring accountability and individual participation
Some of these principles are therefore already incorporated into various individual rights and privacy statutes, for example, the EU GDPR for the fairness principle (Article 5), human oversight (Article 22), robustness and security of processing (Article 5), the right to explanation (Articles 12 to 15, 22) or the right to be forgotten (Article 17).
Classification and Data Governance
To fulfill these requirements, and to avoid the often substantial fines this legislation carries (GDPR: up to 4% of annual worldwide turnover), you need an overview of the available data and its classification (personal, sensitive, strategic, business data, etc.), so that you can assign responsibility for it and control access to it.
Several examples of data classification and governance tools are listed below:
- Data classification – Azure Purview
This Microsoft unified data governance solution can track data and manage access and accountability.
- Data classification – Azure SQL Information Protection
This tool introduces advanced features for discovering, classifying, tagging, and protecting sensitive database data integrated into Azure SQL Database.
- Compliance with legal obligations – example of the GDPR right to be forgotten with Delta. Read our “Right to Be Forgotten: What Impact Does it Have on Data? How to Implement It?” post to learn more.
- Data quality
To learn more about this topic, we recommend reading the following post: “Data observability”
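To make the classification step concrete, here is a minimal, purely illustrative sketch of pattern-based column tagging. The two regexes and the `classify_column` helper are assumptions for this example; real governance tools such as Azure Purview use far richer detection logic (dictionaries, ML classifiers, confidence scores).

```python
import re

# Illustrative patterns only -- real classification tools use far richer
# detection logic than these two regexes.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b(?:\+?\d{1,3}[ .-]?)?(?:\d[ .-]?){9,12}\d\b"),
}

def classify_column(values):
    """Tag a column as 'personal' if any value matches a PII pattern."""
    for value in values:
        for label, pattern in PII_PATTERNS.items():
            if pattern.search(str(value)):
                return ("personal", label)
    return ("non-personal", None)

# Hypothetical dataset: one column holds contact details, one does not.
columns = {
    "contact": ["alice@example.com", "bob@example.org"],
    "country": ["France", "Spain"],
}
tags = {name: classify_column(vals) for name, vals in columns.items()}
```

Once columns are tagged this way, access policies and retention rules can be attached to the "personal" label rather than to individual columns.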
Imbalanced Datasets and Algorithmic Biases
Algorithmic bias is a flaw in an AI system’s output. This can sometimes be due to the data used to train the model: it could, for example, be poor human selection, resulting in the underrepresentation of a population, or a historical bias (for example, a credit-granting model trained on data from the last few decades without considering the evolution of women’s economic empowerment: Le Point article by Aurélie Jean – Quand les algorithmes discriminent les femmes, [When Algorithms Discriminate Against Women], 2019).
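A first, very simple defense against this kind of underrepresentation is to measure each subgroup's share of the training data before training. The sketch below is an assumption-laden illustration: the `representation_report` helper and the 20% threshold are ours, not a recognized fairness standard.

```python
from collections import Counter

def representation_report(samples, attribute, threshold=0.2):
    """Flag subgroups whose share of the training data falls below a
    threshold. The 20% default is an illustrative cut-off, not a standard."""
    counts = Counter(sample[attribute] for sample in samples)
    total = sum(counts.values())
    return {
        group: {"share": count / total,
                "underrepresented": count / total < threshold}
        for group, count in counts.items()
    }

# Toy credit dataset: one subgroup is clearly underrepresented.
applications = [{"gender": "F"}] * 2 + [{"gender": "M"}] * 18
report = representation_report(applications, "gender")
```

A report like this does not fix the bias, but it surfaces the imbalance early enough to rebalance, reweight, or collect more data before the model amplifies it.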
Here are some useful tools:
- Microsoft Responsible AI Dashboard
You can use this visual representation to search for specific information, such as the error rate and data representation.
- Preparing data using the TensorFlow Responsible AI guide
- Know Your Data: interactively analyze the dataset to improve data quality (reducing bias and fairness problems)
- TF Data Validation: analyze and transform data to detect bias issues
- Limit the risk of exposing sensitive training data with differentially-private stochastic gradient descent (DP-SGD)
- Data cards
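The DP-SGD idea mentioned above boils down to two ingredients: clip each example's gradient to a maximum L2 norm, then add calibrated Gaussian noise to the averaged gradient. The sketch below shows only those mechanics in plain Python; it deliberately omits the privacy accounting a real implementation (e.g. TensorFlow Privacy) performs, and the gradient values are made up for illustration.

```python
import math
import random

def dp_sgd_step(weights, per_example_grads, l2_norm_clip=1.0,
                noise_multiplier=1.1, learning_rate=0.1, seed=0):
    """One DP-SGD update: clip each per-example gradient, average the
    clipped gradients, add Gaussian noise, then apply the step."""
    rng = random.Random(seed)
    n = len(per_example_grads)
    clipped = []
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, l2_norm_clip / max(norm, 1e-12))
        clipped.append([g * scale for g in grad])
    sigma = noise_multiplier * l2_norm_clip / n
    noisy_mean = [
        sum(c[j] for c in clipped) / n + rng.gauss(0.0, sigma)
        for j in range(len(weights))
    ]
    return [w - learning_rate * g for w, g in zip(weights, noisy_mean)]

# Two made-up per-example gradients; the first is clipped hard (norm 10 -> 1).
w_new = dp_sgd_step([0.0, 0.0, 0.0],
                    [[10.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
```

Clipping bounds any single example's influence on the update, and the noise masks what remains, which is what limits how much the trained model can memorize about any individual training record.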
Off-the-Shelf AI: a Black Box?
Out-of-the-box AI (Azure Cognitive Services – Face API, Speech To Text, etc.) saves time but is sometimes limited in terms of access to training data and models used, leaving you with less scope for customization and control (black box effect).
However, after a few high-profile cases, publishers are trying to give control back to users and regain their trust. After some gender and skin color biases were revealed in 2018 by MIT Media Lab-based computer scientist Joy Buolamwini, Microsoft and IBM had to upgrade their facial recognition technology and give the service user more options to evaluate the model.
How to Design and Train an ML Model
One of the most difficult aspects of creating an AI system is making it more responsible and reliable. This is true even during the project’s conception phase and to a considerable extent throughout the Build phase as well. The ML model will interact with a wide range of people (data scientists, decision-makers, users, and people directly or indirectly impacted by the AI system).
These parties must have confidence in the developed ML model for the project to succeed. To achieve this, you need to define your model’s potential impact (social, economic, environmental), ethics, and explainability upstream to establish this confidence in your model and its supporting automated process.
Developing and Training the Model: Fairness and Interpretation
It’s essential to use or develop tools to ensure the model’s fairness and explainability when training the model.
Fairness and Non-Discrimination
In a report released in 2020, the French Defender of Rights (Défenseur des droits), in collaboration with the French data protection authority (CNIL), highlighted the "considerable risks of discrimination that the exponential use of algorithms in all spheres of our lives can bring to bear on each and every one of us," insisting that this subject had "for a long time remained a blind spot in the public debate" and that it "should no longer be so" (Algorithmes: prévenir l'automatisation des discriminations [Algorithms: Preventing the Automation of Discrimination], 2020).
In addition to data-related biases (e.g., lack of representativeness of the data used), the model can amplify these biases (e.g., by focusing only on the model’s overall performance and ignoring subgroups/populations).
Here are a few examples of useful tools:
- AzureML: Azure Machine Learning Studio is a GUI-based development environment for developing and deploying machine learning workflows in Azure. It has numerous features to measure the model's accuracy. The Fairlearn integration assesses the fairness ("impartiality") of a model's predictions.
- Dataiku – Model fairness reports tell us more about the level of bias in a model by implementing certain fairness measures.
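One metric such tools commonly report is the demographic parity difference: the gap in positive-prediction rates between groups. Here is a minimal pure-Python sketch of that metric (Fairlearn exposes a richer version); the credit-decision data is invented for illustration.

```python
def demographic_parity_difference(y_pred, sensitive):
    """Largest gap in positive-prediction rate between any two groups.
    A value near 0 suggests all groups are selected at similar rates."""
    by_group = {}
    for pred, group in zip(y_pred, sensitive):
        by_group.setdefault(group, []).append(pred)
    selection_rates = {g: sum(p) / len(p) for g, p in by_group.items()}
    return max(selection_rates.values()) - min(selection_rates.values())

# Toy credit decisions: group "A" is approved far more often than group "B".
preds = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
gap = demographic_parity_difference(preds, groups)
```

A gap this large (0.8) would warrant investigating whether the difference reflects a legitimate signal or a bias inherited from the training data.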
Model Interpretation and Explainability
Finding the correct balance between the model’s performance and interpretation is often difficult. Making a model more complex typically means making it less explainable, sometimes to the point of it becoming a “black box.”
Thus, tools and methods should be included in ML project development whenever possible to continuously improve the model’s transparency and explainability.
Interpretability also allows domain experts to retain control over their knowledge and to question inconsistent predictions. Regulations add pressure of their own: Article 22 of the GDPR, for example, states that no person shall be subject to a decision based solely on automated processing.
Examples of tools:
- LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanation) algorithms: two popular Python libraries for model interpretability.
- Microsoft Responsible AI Dashboard – InterpretML provides inherently interpretable "glass-box" models (such as the Explainable Boosting Machine) alongside techniques for explaining a model's behavior.
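A related model-agnostic idea, simpler than LIME or SHAP, is permutation importance: shuffle one feature's values and measure how much the model's score drops. The sketch below is a toy illustration; the one-rule "model", the scorer, and the data are all invented for the example.

```python
import random

def permutation_importance(model, X, y, score, n_repeats=5, seed=0):
    """Score drop when one feature column is shuffled: a crude,
    model-agnostic signal of how much the model relies on that feature."""
    rng = random.Random(seed)
    baseline = score(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [column[i]] + row[j + 1:]
                      for i, row in enumerate(X)]
            drops.append(baseline - score(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical "model": predicts 1 when feature 0 is positive; feature 1 is noise.
model = lambda row: 1 if row[0] > 0 else 0
accuracy = lambda m, X, y: sum(m(r) == t for r, t in zip(X, y)) / len(y)
X = [[1, 5], [2, 1], [-1, 4], [-2, 2]]
y = [1, 1, 0, 0]
imp = permutation_importance(model, X, y, accuracy)
```

Because the model ignores feature 1, its importance comes out at exactly zero, while the feature actually driving predictions shows a non-negative score drop.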
Evaluation and Increased Reliability: Model Accuracy Measurement
Designing methods to ensure model accuracy and minimize model errors through tools and metrics appropriate to the model type and domain application is vital.
Some examples of tools:
- Microsoft Responsible AI Toolbox – Debugging:
- Error Analysis: analyzes and diagnoses model errors
- DiCE: provides debugging of individual predictions using counterfactual analysis
- EconML: uses causal inference to help decision-makers deliberate on the implications of actions in the world
- HAX Toolkit: guides teams in delivering seamless and responsible human-AI collaborative experiences
- Dataiku – Model Error Analysis: a plugin with tools for automatically breaking down model errors into meaningful groups for better analysis, highlighting the most common types of errors, etc.
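In the spirit of the error-analysis tools above, a useful first step is simply to break the error rate down by cohort instead of looking only at the global figure. This is a minimal sketch with an invented `age_band` attribute and made-up predictions.

```python
def error_rate_by_cohort(records, cohort_key):
    """Error rate per cohort: helps spot subgroups where the model
    fails most, which a single global accuracy number would hide."""
    cohorts = {}
    for rec in records:
        cohorts.setdefault(rec[cohort_key], []).append(
            rec["label"] != rec["pred"])
    return {group: sum(errors) / len(errors)
            for group, errors in cohorts.items()}

# Hypothetical predictions: every error falls in the "senior" cohort.
records = [
    {"age_band": "senior", "label": 1, "pred": 0},
    {"age_band": "senior", "label": 0, "pred": 1},
    {"age_band": "adult", "label": 1, "pred": 1},
    {"age_band": "adult", "label": 0, "pred": 0},
]
rates = error_rate_by_cohort(records, "age_band")
```

Here the global error rate is 50%, but the breakdown shows the model fails only on one cohort, which points debugging effort in the right direction.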
Deployment and Monitoring of a Machine Learning (ML) Model
ML models are dynamic: this type of project requires multiple iterations because the model is constantly evolving in response to the input data. As a result, it’s critical to ensure that this model, which works well on delivery, continues to work effectively over time.
To avoid the dangers of concept drift, data drift, and other issues, it's important to combine continuous deployment (CI/CD pipelines), monitoring, and alerting tools (see our "Simplify Your Data Science Projects Using MLflow" post for more information on these concepts).
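One common way monitoring tools quantify data drift is the Population Stability Index (PSI), which compares the distribution of a feature at training time with what the model sees in production. Below is a minimal sketch; the binning scheme, the score values, and the "PSI above ~0.2 means significant drift" reading are conventions and illustrations, not hard standards.

```python
import math

def population_stability_index(expected, actual, bins=4):
    """PSI between a reference (training) distribution and live data.
    Values above ~0.2 are often read as significant drift (a rule of
    thumb, not a formal standard)."""
    lo, hi = min(expected + actual), max(expected + actual)
    width = (hi - lo) / bins or 1.0

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        # Floor at a tiny share to avoid log(0) on empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Made-up model scores: the live distribution has clearly shifted upward.
train_scores = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5]
live_scores = [0.6, 0.7, 0.8, 0.8, 0.9, 0.9]
psi = population_stability_index(train_scores, live_scores)
```

Wired into an alerting pipeline, a PSI check like this can trigger investigation or retraining before degraded predictions reach users.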
Model Deployment and Reproducibility
An infrastructure must be developed to ensure a reasonable level of continuous deployment and reproducibility across ML system operations.
Below are some tools for model deployment and reproducibility:
- MLflow is an open-source platform for managing the life cycle of ML models, from experimentation to deployment, ensuring reproducibility, and establishing a central model registry.
- Other robust open-source platforms for streamlining the life cycle of an ML project (experiment tracking, data versioning, etc.): KubeFlow, Pachyderm.
- For corporate solutions: Azure ML or AWS SageMaker
- To learn more about Kubernetes, we recommend reading Kubernetes: Building a Platform With Run in Mind
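These platforms handle reproducibility at scale, but the core idea can be sketched in a few lines: fix the random seeds and fingerprint the exact run configuration, so a result can always be traced back to the parameters that produced it. The `start_reproducible_run` helper and its config fields are hypothetical names for this illustration.

```python
import hashlib
import json
import random

def start_reproducible_run(config):
    """Seed the RNG and fingerprint the exact configuration: the same
    config always yields the same fingerprint, so runs are traceable."""
    random.seed(config["seed"])
    payload = json.dumps(config, sort_keys=True).encode()
    return {"config": config,
            "fingerprint": hashlib.sha256(payload).hexdigest()}

config = {"seed": 42, "learning_rate": 0.01, "n_estimators": 100}
run_a = start_reproducible_run(config)
run_b = start_reproducible_run(config)
```

Tools like MLflow go further by also versioning code, data, and artifacts, but the principle is the same: a run is only reproducible if everything that defined it is recorded.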
Retaining Human Control
It’s easy to overlook the consequences of bad AI predictions in an automated AI system. Features that allow workers and domain experts to constantly refine or correct the model should be implemented as soon as possible, especially if this automation could significantly impact society and human life (health, banking, justice, etc.).
This includes identifying, documenting, and/or popularizing relevant information from the model.
Examples of tools:
- Model Card is a tool that allows you to create a kind of identity card for your developed ML models (TensorFlow Model Card Toolkit)
- Dataiku Model Document Generator generates documentation for any trained model, producing a Word file that includes information such as the model’s purpose, how it was developed (algorithm, features, processing, etc.), any changes made, and the model’s performance.
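At its simplest, a model card is just structured metadata rendered into a readable document. The sketch below shows the idea with a hypothetical `render_model_card` helper and invented field values; real toolkits (TensorFlow Model Card Toolkit, Dataiku's generator) produce much richer documents.

```python
def render_model_card(card):
    """Render a minimal model 'identity card' as markdown text."""
    lines = [f"# Model card: {card['name']}", ""]
    for field in ("purpose", "algorithm", "training_data",
                  "known_limitations"):
        lines.append(f"- **{field.replace('_', ' ')}**: {card[field]}")
    return "\n".join(lines)

# All field values below are invented for illustration.
card = render_model_card({
    "name": "credit-scoring-v2",
    "purpose": "rank loan applications for manual review",
    "algorithm": "gradient-boosted trees",
    "training_data": "2015-2021 applications, rebalanced by gender",
    "known_limitations": "not validated for applicants under 21",
})
```

Keeping such a card under version control alongside the model makes its purpose, provenance, and known limitations visible to every party the system affects.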
To Learn More About the Run Phase
There is a strong connection between the Build and the Run in an AI project. Maintaining consistency between the data and model life cycles, while adhering to responsible AI design standards, is precisely what makes an ML project demanding to maintain in the Run phase.
However, due to this complexity, publishers and developers have implemented many MLOps and DataOps tools and functions in recent years. This has helped to make the Build & Run phases easier to manage.
MLOps, in particular, helps to make ML models more scalable, reliable, ethical, and explainable to meet legal requirements that are expected to grow over the coming years.