After identifying the best parameters using a Pipeline and GridSearchCV, how do I pickle/joblib this process to re-use later? I see how to do this when it's a single classifier...
import joblib
joblib.dump(clf, 'filename.pkl')
But how do I save the overall pipeline with the best parameters after a grid search has completed?
I tried:
joblib.dump(grid, 'output.pkl') - but that dumped every grid search attempt (many files).
joblib.dump(pipeline, 'output.pkl') - but I don't think that contains the best parameters.
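from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# df is assumed to be a pandas DataFrame loaded earlier
X_train = df['Keyword']
y_train = df['Ad Group']

# Text features feed straight into a linear classifier
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('sgd', SGDClassifier())
])

# Only the vectorizer's settings are searched
parameters = {'tfidf__ngram_range': [(1, 1), (1, 2)],
              'tfidf__use_idf': (True, False),
              'tfidf__max_df': [0.25, 0.5, 0.75, 1.0],
              'tfidf__max_features': [10, 50, 100, 250, 500, 1000, None],
              'tfidf__stop_words': ('english', None),
              'tfidf__smooth_idf': (True, False),
              'tfidf__norm': ('l1', 'l2', None),
              }

grid = GridSearchCV(pipeline, parameters, cv=2, verbose=1)
grid.fit(X_train, y_train)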
# This was the best combination of tuning parameters discovered
## best_params = {'tfidf__max_features': None, 'tfidf__use_idf': False,
##                'tfidf__smooth_idf': False, 'tfidf__ngram_range': (1, 2),
##                'tfidf__max_df': 1.0, 'tfidf__stop_words': 'english',
##                'tfidf__norm': 'l2'}
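
For reference, here is a minimal sketch of what I'm trying to end up with, not a confirmed solution. It assumes GridSearchCV is left at its default refit=True, in which case grid.best_estimator_ should be the whole pipeline refit on the full training data with the winning parameters, so dumping that object alone may be enough. The toy data, the reduced parameter grid, the file name best_pipeline.pkl and the compress=1 argument are all placeholders of mine, not part of my real setup.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for df['Keyword'] and df['Ad Group'].
X_train = [
    "red running shoes", "blue running shoes",
    "trail running shoes", "cheap running shoes",
    "leather office chair", "ergonomic office chair",
    "mesh office chair", "cheap office chair",
]
y_train = ["shoes"] * 4 + ["chairs"] * 4

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("sgd", SGDClassifier(random_state=0)),
])

# Reduced grid so the sketch runs quickly; the real grid is larger.
parameters = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__use_idf": (True, False),
}

# refit=True is the default, so best_estimator_ is available after fit().
grid = GridSearchCV(pipeline, parameters, cv=2)
grid.fit(X_train, y_train)

# Persist only the refit best pipeline, not the whole search object.
# The path is a placeholder; compress=1 is an assumption on my part.
model_path = os.path.join(tempfile.mkdtemp(), "best_pipeline.pkl")
joblib.dump(grid.best_estimator_, model_path, compress=1)

# Later (or in another process): load the pipeline and predict on raw text.
loaded = joblib.load(model_path)
predictions = loaded.predict(["running shoes", "office chair"])
```

The loaded object is a plain Pipeline, so it takes raw keyword strings directly and applies the fitted TfidfVectorizer before the classifier. If the search results themselves are needed later, grid.best_params_ and grid.cv_results_ would have to be saved separately.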