In machine learning, after training and testing a model we often need to save it to a file and restore it later, whether to compare it against other models or to deploy it elsewhere for new data. Saving an object to a file is called serialization, while restoring it is called deserialization.
Training time also matters. Some datasets train quickly, but large datasets (say, more than 1 GB) can take a very long time to train on a local machine, even with a GPU. If we need the same trained model later, in another project or at a future date, storing it avoids repeating that expensive training step.
We will cover the following two approaches to saving and reloading an ML model:
- Pickle Approach
- Joblib Approach
For the purpose of demonstration, let's create a simple KNN model using scikit-learn with the Iris dataset, which ships preloaded with the library.
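A minimal sketch of that setup: the hyperparameters (`n_neighbors=3`, the 70/30 split, `random_state=42`) are illustrative choices, not requirements.

```python
# Train a simple KNN classifier on the Iris dataset bundled with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

print("Test accuracy:", knn.score(X_test, y_test))
```

This fitted `knn` object is what we will serialize and deserialize in the following sections.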
First, let's save and load our KNN model using the pickle approach. The pickle module implements serialization and deserialization of Python object structures. It provides the following functions:
– pickle.dump(): For the serialization of an object.
– pickle.load(): To deserialize data.
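The two functions above can be used like this; the filename `knn_model.pkl` is an illustrative choice, and the model is retrained here only to keep the snippet self-contained.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Serialize: pickle.dump() writes the fitted model to an open binary file.
with open("knn_model.pkl", "wb") as f:
    pickle.dump(knn, f)

# Deserialize: pickle.load() reconstructs the model from the file.
with open("knn_model.pkl", "rb") as f:
    restored = pickle.load(f)

# The restored model predicts exactly like the original.
print(restored.predict(X[:3]))
```

Note that pickle works with open file objects, so the file must be opened in binary mode (`"wb"` for writing, `"rb"` for reading).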
joblib is often more efficient than pickle for objects that carry large NumPy arrays internally, which is the case for many fitted scikit-learn estimators. Unlike pickle's functions, which require open file objects, joblib's accept a filename directly. The following functions are provided in the joblib module:
– joblib.dump(): To serialize an object.
– joblib.load(): To deserialize an object.
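The joblib version is even shorter, since `dump` and `load` take a filename directly; `knn_model.joblib` is an illustrative name, and the model is again retrained inline to keep the snippet self-contained.

```python
import joblib

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Serialize: joblib.dump() writes the model straight to the given path.
joblib.dump(knn, "knn_model.joblib")

# Deserialize: joblib.load() reads it back from the path.
restored = joblib.load("knn_model.joblib")

print(restored.predict(X[:3]))
```

Because there is no explicit file handling, this is usually the more convenient option for scikit-learn models.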
In this article, you have seen how a machine learning model can be saved and loaded with the pickle and joblib packages. You learned two techniques:
1. The pickle API for basic Python serialization.
2. The joblib API for powerful Python object serialization with NumPy arrays.