It’s cool (maybe) to train a machine learning model on a dataset over a few training loops and make it perform well. You put the model into production and it runs smoothly, but is deployment really the end of an ML project?
A machine learning model is just like any other machine: its performance will degrade over time and requires monitoring. We call this model drift, and there are two types of it:
Concept drift
Concept drift means that the relationship between the features and the target has changed over time. This hurts model performance because the model was trained on a dataset that no longer reflects the current relationship. For example, customer behavior has changed over the past years. If a model was trained on data collected 10 years ago and is still being used today without any performance monitoring, it will give terrible results for today's customers.
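Here is a minimal sketch of concept drift on synthetic data. The "income threshold" scenario, the numbers, and the variable names are all hypothetical and only meant to show how accuracy drops when the feature-to-target relationship shifts while the inputs themselves stay the same.

```python
# A minimal sketch of concept drift on synthetic data
# (the scenario and thresholds are hypothetical, for illustration only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

# "Old" data: customers with income above 50 tended to buy (target = 1).
X_old = rng.uniform(0, 100, size=(1000, 1))
y_old = (X_old[:, 0] > 50).astype(int)

# "New" data: behaviour has shifted, the threshold is now 80.
X_new = rng.uniform(0, 100, size=(1000, 1))
y_new = (X_new[:, 0] > 80).astype(int)

model = LogisticRegression().fit(X_old, y_old)

print("Accuracy on old data:", accuracy_score(y_old, model.predict(X_old)))
print("Accuracy on new data:", accuracy_score(y_new, model.predict(X_new)))
# The second score is noticeably lower: the input distribution is unchanged,
# but the feature-to-target relationship has moved (concept drift).
```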
Data drift
Data drift refers to a change in the input data itself, such as a shift in its distribution. If the input data is statistically different from the training data, the model's performance will degrade because its training never covered that kind of data. Input data commonly changes over time in the real world; for example, the readings of a sensor will shift as its sensitivity degrades.
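One common way to catch this kind of drift is to compare the live input distribution against the training distribution with a statistical test. The sketch below uses a two-sample Kolmogorov-Smirnov test on a simulated degrading sensor; the synthetic distributions and the 0.05 significance threshold are illustrative assumptions, not a prescribed setup.

```python
# A minimal sketch of data drift detection with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Sensor readings collected at training time.
training_data = rng.normal(loc=0.0, scale=1.0, size=5000)

# Live readings after the sensor's sensitivity has degraded:
# the mean and spread of the inputs have shifted.
live_data = rng.normal(loc=0.5, scale=1.3, size=5000)

statistic, p_value = ks_2samp(training_data, live_data)
print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.3g}")

if p_value < 0.05:  # illustrative threshold
    print("Input distribution differs from training data -> possible data drift")
else:
    print("No significant drift detected")
```

In practice you would run a check like this on a schedule (per feature, per batch of incoming data) and raise an alert or trigger retraining when drift is flagged.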