Model Drift: The Achilles Heel of AI Explained

A machine learning model is a mathematical representation of a set of rules that are learned from data. It is the output of the process of training a machine learning algorithm. The model is then used to make predictions or decisions based on new, unseen data.

There Are Many Different Types of Machine Learning Models

You’ll want to become familiar with the many different types of machine learning models, including decision trees, random forests, support vector machines, and neural networks. Each type has its own strengths and weaknesses and suits different kinds of tasks.

To create a machine learning model, you provide a learning algorithm with a set of training data. The algorithm searches for parameters that capture the relationships and patterns in that data. The resulting model is a set of fitted mathematical functions that can be used to make predictions or decisions based on new, unseen data.
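To make the training step concrete, here is a minimal sketch in plain Python. It assumes the simplest possible model, a straight line fitted by ordinary least squares, and the helper names (fit_line, predict) and the housing numbers are invented for illustration only.

```python
# Minimal sketch: "training" fits parameters to data, and the fitted
# parameters are the model. All names and numbers are illustrative.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted parameters to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: floor area (square meters) -> price (thousands).
train_x = [50, 60, 80, 100, 120]
train_y = [150, 180, 240, 300, 360]

model = fit_line(train_x, train_y)
print(predict(model, 90))  # prediction for an area not in the training data
```

The model here is just two numbers, a slope and an intercept. Real models have far more parameters, but the split between fitting on training data and predicting on unseen data is the same.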

What Is Model Drift?

Model drift is when a machine learning model’s performance declines over time because the real world has changed since the model was trained. There are two main types of model drift:

Concept drift occurs when the relationship between the inputs and the outcome being predicted changes over time. For example, consider a machine learning model that has been trained to predict credit card fraud. The model learns which transaction patterns indicated fraud in its training data. If fraudsters change their tactics, so that transactions that once looked legitimate are now fraudulent, the model’s performance may decline because the patterns it learned no longer map to the correct outcome.
Data drift occurs when the distribution of the input data changes over time, even if the underlying relationships stay the same. For example, consider a machine learning model that has been trained to classify images of animals. If the model is trained on a data set that includes images of dogs, cats, and birds, it might perform well on new images of these animals. However, if the model is then presented with a type of animal it has not seen before, such as a dolphin, it is likely to perform poorly because its training data does not include any examples of dolphins.
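Data drift in a single numeric feature can be checked by comparing its distribution at training time with its distribution in production. The sketch below uses the Population Stability Index (PSI), one common choice among several and not something the original text prescribes. The function name psi, the simulated transaction amounts, and the bin count are assumptions made for illustration. Note that this kind of check looks only at inputs, so it will not reveal concept drift.

```python
import math
import random


def psi(reference, current, bins=10):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature

    def proportions(sample):
        counts = [0] * bins
        for value in sample:
            # Values outside the reference range are clamped into the edge bins.
            index = int((value - lo) / width)
            index = max(0, min(bins - 1, index))
            counts[index] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(sample), 1e-6) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Simulated (hypothetical) transaction amounts.
rng = random.Random(0)
training_amounts = [rng.gauss(50, 10) for _ in range(2000)]
same_period = [rng.gauss(50, 10) for _ in range(2000)]
later_amounts = [rng.gauss(65, 10) for _ in range(2000)]  # inputs have shifted

stable_score = psi(training_amounts, same_period)
shifted_score = psi(training_amounts, later_amounts)
print(stable_score, shifted_score)
```

A widely quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the right threshold depends on the feature and the application, so treat that number as a starting point rather than a standard.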

One way to mitigate the impact of drift is to retrain the model regularly on new data so that it remains accurate and up to date. Learn more in this technical deep dive on ML model drift (Aporia, on the concept of drift).

How Does Model Drift Impact Production AI Systems?

Model drift can have a significant impact on production AI systems, as it can cause them to make inaccurate predictions or classifications. This can lead to poor performance and potentially harmful decisions. In some cases, it could lead to the system malfunctioning, causing financial losses or even physical harm.

In production AI systems, model drift can occur due to changes in the distribution of the input data over time, such as changes in customer behavior or market conditions. It can also occur due to changes in the system itself, such as updates to the hardware or software.

To mitigate the impact of model drift, it’s important to monitor the performance of AI systems regularly and retrain the models as needed. Techniques such as active learning and online learning can also be used to adapt the models continuously as the input data changes. Additionally, ensemble methods that combine multiple models can help, because a change that degrades one model may not degrade the others to the same degree.

It’s also important to have a good understanding of the underlying data and the system to detect any signs of drift and take the necessary actions, such as retraining the model, fine-tuning the parameters, or collecting more data.
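One way to turn this monitoring advice into practice is to track accuracy over a sliding window of recent predictions whose true outcomes are known, and flag the model when accuracy falls too far below its level at deployment. The sketch below is a minimal illustration. The class name AccuracyMonitor, the baseline, the tolerance, and the window size are all assumed values, not recommendations.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent labeled predictions."""

    def __init__(self, baseline, tolerance=0.05, window=200):
        self.baseline = baseline      # accuracy measured at deployment time
        self.tolerance = tolerance    # allowed drop before flagging
        self.outcomes = deque(maxlen=window)

    def record(self, predicted, actual):
        """Store whether one prediction matched its true outcome."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        # Wait for a full window so a few early mistakes do not raise a flag.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.baseline - self.tolerance


monitor = AccuracyMonitor(baseline=0.90, tolerance=0.05, window=100)

# First 100 predictions: 90 of 100 correct, in line with the baseline.
for i in range(100):
    monitor.record(predicted=1, actual=1 if i % 10 else 0)
flag_before = monitor.needs_retraining()

# Next 100 predictions: only 75 of 100 correct, as if drift had set in.
for i in range(100):
    monitor.record(predicted=1, actual=1 if i % 4 else 0)
flag_after = monitor.needs_retraining()

print(flag_before, flag_after)
```

This approach depends on true outcomes arriving reasonably quickly. When labels are delayed, as with loan defaults or house sales, checks on the input distribution can give an earlier warning.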

Can We Trust AI Given the Problem of Model Drift?

It is important to be aware of the potential for model drift when using artificial intelligence (AI) systems, as it can affect the accuracy and reliability of the predictions or decisions made by the model. However, this does not necessarily mean that AI systems cannot be trusted.

The key is to accept and manage the risk inherent in machine learning models. This is known as “model risk” – the risk that a machine learning model may make incorrect predictions or decisions, which can have negative consequences for its owners or users.

For example, take the case of Zillow, a real estate and rental marketplace. In 2021, it accrued losses of over $500 million after its property valuation algorithm overestimated real estate values, leading the company to overpay when purchasing houses. As a result, the company had to reduce its workforce.

Zillow probably implemented rigorous testing before rolling out the machine learning model. The rollout in production was gradual, allowing the company to evaluate its performance in the real world. However, the company then expanded its purchasing program in a short period while market conditions began to change (concept drift). Thus, the model no longer reflected the real estate market.

This shows why it is important for companies to be proactive in managing model risk so that their machine learning systems keep making accurate predictions or decisions. The impact of the model drift might have been reduced had Zillow monitored the model more closely as market conditions changed.

What AI Developers Can Do About Drift

There are several things that AI developers can do to mitigate the impact of model drift:

Regularly retrain the model on new data: Retraining on recent data keeps the model aligned with current conditions and helps to reduce the impact of both concept drift and data drift.
Use techniques such as online learning: Online learning is a machine learning approach in which the model updates itself continuously as new data becomes available, so it can adapt to drift without waiting for a full retraining cycle.
Monitor the model’s performance: Once the model has been deployed in a production environment, continuously monitor its performance to confirm that it is still making accurate predictions or decisions. Ongoing monitoring helps to identify changes in the data distribution or other factors that may be causing model drift.
Use multiple models: Using multiple models can help to reduce the risk of relying on a single model that may be subject to model drift. By combining the predictions or decisions of multiple models, the overall performance of the system can be improved.
Add human oversight: In some cases, it may be appropriate to use human oversight to review or validate the predictions or decisions made by the model. This can help to ensure that the system is being used appropriately and that any potential issues are addressed.
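The online learning item in the list above can be sketched in a few lines. The example assumes a one-feature linear model updated by stochastic gradient descent on squared error. The class name OnlineLinearModel, the learning rate, and the simulated change from y = 2x to y = 5x are all hypothetical choices made to keep the example small.

```python
class OnlineLinearModel:
    """One-feature linear model updated one observation at a time."""

    def __init__(self, learning_rate=0.1):
        self.weight = 0.0
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x + self.bias

    def update(self, x, y):
        # Single gradient step on the squared error of this observation.
        error = self.predict(x) - y
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error


inputs = [0.0, 0.25, 0.5, 0.75, 1.0]
model = OnlineLinearModel(learning_rate=0.1)

# Phase 1: the true relationship is y = 2x.
for step in range(5000):
    x = inputs[step % len(inputs)]
    model.update(x, 2.0 * x)
before_drift = model.predict(1.0)

# Phase 2: the relationship drifts to y = 5x, and the model keeps updating.
for step in range(5000):
    x = inputs[step % len(inputs)]
    model.update(x, 5.0 * x)
after_drift = model.predict(1.0)

print(round(before_drift, 2), round(after_drift, 2))
```

Because the model never stops learning, its prediction follows the new relationship after the change. The trade-off is that an online model will also follow bad or noisy data, which is one reason to pair it with the monitoring and human oversight described above.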

Conclusion

Model drift is a phenomenon that can significantly impact the performance of artificial intelligence (AI) systems over time. It occurs when the data distribution, or the relationships in the data that the model learned during training, change, resulting in a decline in the model’s accuracy and reliability.

Both concept drift and data drift can be challenging to manage because they are difficult to anticipate and detect. However, by taking steps such as regularly retraining the model on new data, using online learning techniques, and using multiple models, AI developers can mitigate the impact of model drift and improve the trustworthiness of their systems.
