

Saving Machine Learning Models
Platforms, Tools & Languages · posted by Damian Mingle, May 7, 2018

Let’s take a look at two conventional ways to save a trained model with scikit-learn:
- as a pickle string
- as a pickled model written to a file
Preliminaries
import pickle
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.externals import joblib  # in scikit-learn >= 0.23, use `import joblib` instead
Load Data
# Load example data (iris dataset)
iris = datasets.load_iris()
# Create a matrix of features and a vector target
features, target = iris.data, iris.target
Train Model
# Train an example model (logistic regression)
clf = LogisticRegression(random_state=0)
clf.fit(features, target)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
penalty='l2', random_state=0, solver='liblinear', tol=0.0001,
verbose=0, warm_start=False)
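Before saving, it can be worth recording a baseline metric so the reloaded model can be verified against it later. A minimal sketch using the training accuracy (repeating the setup above):

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
clf = LogisticRegression(random_state=0).fit(iris.data, iris.target)

# Mean accuracy on the training data; a quick reference point
# to compare against after the model is reloaded from disk
train_acc = clf.score(iris.data, iris.target)
```
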
A Pickle String
# Save the trained model as a pickle string.
saved_model = pickle.dumps(clf)
# Display the pickled model
saved_model
b'\x80\x03csklearn.linear_model.logistic\nLogisticRegression\nq\x00)\x81q\x01}q\x02(X\x06\x00\x00\x00solverq\x03X\t\x00\x00\x00liblinearq\x04X\n\x00\x00\x00intercept_q\x05cnumpy.core.multiarray\n_reconstruct\nq\x06cnumpy\nndarray\nq\x07K\x00\x85q\x08C\x01bq\t\x87q\nRq\x0b(K\x01K\x03\x85q\x0ccnumpy\ndtype\nq\rX\x02\x00\x00\x00f8q\x0eK\x00K\x01\x87q\x0fRq\x10(K\x03X\x01\x00\x00\x00<q\x11NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tq\x12b\x89C\x18\xce\x86D\x03\xb1\xff\xd0?\xcd\xcc=I\xe5]\xf1?\xa9\'\xad\x8dxo\xf3\xbfq\x13tq\x14bX\x0c\x00\x00\x00random_stateq\x15K\x00X\x07\x00\x00\x00penaltyq\x16X\x02\x00\x00\x00l2q\x17X\n\x00\x00\x00warm_startq\x18\x89X\x0c\x00\x00\x00class_weightq\x19NX\x11\x00\x00\x00intercept_scalingq\x1aK\x01X\x01\x00\x00\x00Cq\x1bG?\xf0\x00\x00\x00\x00\x00\x00X\x08\x00\x00\x00max_iterq\x1cKdX\r\x00\x00\x00fit_interceptq\x1d\x88X\x0b\x00\x00\x00multi_classq\x1eX\x03\x00\x00\x00ovrq\x1fX\x07\x00\x00\x00n_iter_q h\x06h\x07K\x00\x85q!h\t\x87q"Rq#(K\x01K\x01\x85q$h\rX\x02\x00\x00\x00i4q%K\x00K\x01\x87q&Rq\'(K\x03h\x11NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tq(b\x89C\x04\x07\x00\x00\x00q)tq*bX\x07\x00\x00\x00verboseq+K\x00X\x05\x00\x00\x00coef_q,h\x06h\x07K\x00\x85q-h\t\x87q.Rq/(K\x01K\x03K\x04\x86q0h\x10\x88C`\xa6\x1c\x904+\x8f\xda?\x8b7\xf6\x7f9\xaa\xda?.VL\xe5\x05R\xfb\xbf\xf3\xad\xd9^ya\xf7?\x95\x86\x10B\x03\x9d\xf9\xbf\x92\xa7x\xf5\\\x8c\xf8\xbf\x8b$8y\xdd\x18\x02\xc0\x8f\x8f\xee\xd9+|\xe2?X\x10\xf2\xcc\x8c\xc4\x03@\xda\xb0;l,w\xf0\xbf\xbb^\xe7W*+\xf6\xbf\xe2T`-lq\x04@q1tq2bX\x10\x00\x00\x00_sklearn_versionq3X\x06\x00\x00\x000.19.0q4X\x06\x00\x00\x00n_jobsq5K\x01X\x08\x00\x00\x00classes_q6h\x06h\x07K\x00\x85q7h\t\x87q8Rq9(K\x01K\x03\x85q:h\rX\x02\x00\x00\x00i4q;K\x00K\x01\x87q<Rq=(K\x03h\x11NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tq>b\x89C\x0c\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00q?tq@bX\x04\x00\x00\x00dualqA\x89X\x03\x00\x00\x00tolqBG?\x1a6\xe2\xeb\x1cC-ub.'
# Load the pickled model
clf_from_pickle = pickle.loads(saved_model)
# Use the unpickled model to make predictions
clf_from_pickle.predict(features)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
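As a sanity check on the round trip, a minimal sketch (repeating the setup above) that confirms the unpickled model reproduces the original model's predictions exactly:

```python
import pickle

import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
features, target = iris.data, iris.target
clf = LogisticRegression(random_state=0).fit(features, target)

# Round-trip the fitted model through a pickle string
restored = pickle.loads(pickle.dumps(clf))

# The restored model is a new object but should predict identically
assert np.array_equal(clf.predict(features), restored.predict(features))
```

One caveat worth remembering: `pickle.loads` can execute arbitrary code, so only unpickle models from sources you trust.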
A Pickled Model as a File
# Save the model as a pickle in a file
joblib.dump(clf, 'example_file.pkl')
['example_file.pkl']
# Load the model from a pickled file
clf_from_joblib = joblib.load('example_file.pkl')
# Use the loaded model to make predictions
clf_from_joblib.predict(features)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
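`joblib.dump` also accepts a `compress` argument (0–9) that shrinks the file on disk at a small CPU cost, which helps for models holding large NumPy arrays. A minimal sketch, using the hypothetical filename `example_compressed.pkl` and the standalone `joblib` package (in scikit-learn < 0.23 it lived at `sklearn.externals.joblib`):

```python
import joblib
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
clf = LogisticRegression(random_state=0).fit(iris.data, iris.target)

# compress=3 is a reasonable middle ground between size and speed
joblib.dump(clf, 'example_compressed.pkl', compress=3)

# The reloaded model should predict identically to the original
clf_loaded = joblib.load('example_compressed.pkl')
```
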

Damian Mingle
Damian Mingle is an American businessman, investor, and data scientist. He is the Founder and Chief Data Scientist of LoveToThink.org, a way for skilled professionals to contribute their expertise and empower the world’s social changemakers. Formerly, Damian was the Chief Data Scientist at Intermedix (an R1 company) where he was responsible for leading a team of international data scientists to drive business value. As a leading authority on data science, Damian speaks nationally and internationally on patient safety, global health, and applied data science.