I got the warning "UserWarning: One or more of the test scores are non-finite" when revising a toy scikit-learn GridSearchCV example

I have the following code, which works normally, but I got

UserWarning: One or more of the test scores are non-finite: [nan nan]
  category=UserWarning

when I revised it into a more concise version (shown in the second code block further down). Does the output of the one-hot encoding make the difference here?

import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import RidgeClassifier
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import GridSearchCV

train = pd.read_csv('/train.csv')
test = pd.read_csv('/test.csv')
sparse_features = [col for col in train.columns if col.startswith('cat')]
dense_features = [col for col in train.columns if col not in sparse_features + ['target']]
X = train.drop(['target'], axis=1)
y = train['target'].values
skf = StratifiedKFold(n_splits=5)
clf = RidgeClassifier()

full_pipeline = ColumnTransformer(transformers=[
    ('num', StandardScaler(), dense_features),
    ('cat', OneHotEncoder(), sparse_features)
])
X_prepared = full_pipeline.fit_transform(X)
param_grid = {
    'alpha': [0.1],
    'fit_intercept': [False]
}
gs = GridSearchCV(
    estimator=clf,
    param_grid=param_grid,
    scoring='roc_auc',
    n_jobs=-1,
    cv=skf
)
gs.fit(X_prepared, y)

The revised version is below.

clf2 = RidgeClassifier()
preprocess_pipeline2 = ColumnTransformer([
    ('num', StandardScaler(), dense_features),
    ('cat', OneHotEncoder(), sparse_features)
])
from sklearn.pipeline import Pipeline
final_pipeline = Pipeline(steps=[
    ('p', preprocess_pipeline2),
    ('c', clf2)
])
param_grid2 = {
    'c__alpha': [0.4, 0.1],
    'c__fit_intercept': [False]
}
gs2 = GridSearchCV(
    estimator=final_pipeline,
    param_grid=param_grid2,
    scoring='roc_auc',
    n_jobs=-1,
    cv=skf
)
gs2.fit(X, y)

Can anyone point out which part is wrong?

EDIT: After setting error_score to 'raise', I can get more informative feedback on the issue. It seems to me that I need to fit the one-hot encoder on a combined dataset that merges the training set and the test set. Am I correct? But if so, why didn't the first version complain about the same issue? BTW, is it a sensible idea to set that argument? The resulting traceback is below.
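For reference, the revised search with error_score set looks roughly like this; only the error_score argument differs from the code above (a sketch, reusing the objects defined earlier):

gs2 = GridSearchCV(
    estimator=final_pipeline,
    param_grid=param_grid2,
    scoring='roc_auc',
    n_jobs=-1,
    cv=skf,
    error_score='raise'  # default is np.nan, which turns fold failures into the non-finite scores from the warning
)
gs2.fit(X, y)  # now re-raises the underlying ValueError instead of warning about NaN scores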

ValueError
---------------------------------------------------------------------------
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 431, in _process_worker
    r = call_item()
  File "/opt/conda/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/opt/conda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/fixes.py", line 222, in __call__
    return self.function(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 620, in _fit_and_score
    test_scores = _score(estimator, X_test, y_test, scorer, error_score)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 200, in __call__
    sample_weight=sample_weight)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 334, in _score
    y_pred = method_caller(clf, "decision_function", X)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 493, in decision_function
    Xt = transform.transform(Xt)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 565, in transform
    Xs = self._fit_transform(X, None, _transform_one, fitted=True)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 444, in _fit_transform
    self._iter(fitted=fitted, replace_strings=True), 1))
  File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 1044, in __call__
    while self.dispatch_one_batch(iterator):
  File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/opt/conda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/opt/conda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/fixes.py", line 222, in __call__
    return self.function(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 733, in _transform_one
    res = transformer.transform(X)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/preprocessing/_encoders.py", line 462, in transform
    force_all_finite='allow-nan')
  File "/opt/conda/lib/python3.7/site-packages/sklearn/preprocessing/_encoders.py", line 136, in _transform
    raise ValueError(msg)
ValueError: Found unknown categories ['MR', 'MW', 'DA'] in column 10 during transform
"""

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-48-b81f3b7b0724> in <module>
     21     cv=skf
     22 )
---> 23 gs2.fit(X, y)

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
    839                 return results
    840 
--> 841             self._run_search(evaluate_candidates)
    842 
    843             # multimetric is determined here because in the case of a callable

/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_search.py in _run_search(self, evaluate_candidates)
   1286     def _run_search(self, evaluate_candidates):
   1287         """Search all candidates in param_grid"""
-> 1288         evaluate_candidates(ParameterGrid(self.param_grid))
   1289 
   1290 

/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_search.py in evaluate_candidates(candidate_params, cv, more_results)
    807                                    (split_idx, (train, test)) in product(
    808                                    enumerate(candidate_params),
--> 809                                    enumerate(cv.split(X, y, groups))))
    810 
    811                 if len(out) < 1:

/opt/conda/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
   1052 
   1053             with self._backend.retrieval_context():
-> 1054                 self.retrieve()
   1055             # Make sure that we get a last message telling us we are done
   1056             elapsed_time = time.time() - self._start_time

/opt/conda/lib/python3.7/site-packages/joblib/parallel.py in retrieve(self)
    931             try:
    932                 if getattr(self._backend, 'supports_timeout', False):
--> 933                     self._output.extend(job.get(timeout=self.timeout))
    934                 else:
    935                     self._output.extend(job.get())

/opt/conda/lib/python3.7/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
    540         AsyncResults.get from multiprocessing."""
    541         try:
--> 542             return future.result(timeout=timeout)
    543         except CfTimeoutError as e:
    544             raise TimeoutError from e

/opt/conda/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
    433                 raise CancelledError()
    434             elif self._state == FINISHED:
--> 435                 return self.__get_result()
    436             else:
    437                 raise TimeoutError()

/opt/conda/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

ValueError: Found unknown categories ['MR', 'MW', 'DA'] in column 10 during transform
Best Answer

First of all, let me say that I had a similar problem, and thanks for bringing this issue up.

Setting error_score = 'raise' so the underlying error is surfaced really helped me work through my own problem. I was using a custom transformer, and I had some code that created variables from the training fold which were not created in the validation fold, because those categories were simply not present there. I think you are facing the same problem.

It looks like your OneHotEncoder learns a set of categories during training and then encounters new categories during validation, because those categories never appeared in the training folds.

ValueError: Found unknown categories ['MR', 'MW', 'DA'] in column 10 during transform
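To make that concrete, here is a minimal standalone sketch (with made-up toy values, not the actual columns from the question) of how an encoder fitted on one fold fails on a category that only shows up in another fold:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy data: the category 'MR' appears only in the validation rows.
train_fold = pd.DataFrame({'cat10': ['AB', 'CD', 'CD', 'AB']})
valid_fold = pd.DataFrame({'cat10': ['AB', 'MR']})

enc = OneHotEncoder()       # default handle_unknown='error'
enc.fit(train_fold)         # learns only the categories seen in the training fold
enc.transform(valid_fold)   # ValueError: Found unknown categories ['MR'] in column 0 during transform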

To deal with this, my suggestion would be to look into using a custom transformer, given that your data is more complex.
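The suggestion above is a custom transformer; purely as an alternative sketch (a built-in scikit-learn option, not what the answer itself proposes), OneHotEncoder can also be told to ignore categories it did not see during fitting, encoding them as all-zero rows instead of raising:

from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import RidgeClassifier

# Same layout as the question's second version, assuming dense_features and
# sparse_features as defined there, but with handle_unknown='ignore' so a
# category seen only in a validation fold no longer breaks scoring.
preprocess = ColumnTransformer([
    ('num', StandardScaler(), dense_features),
    ('cat', OneHotEncoder(handle_unknown='ignore'), sparse_features)
])
final_pipeline = Pipeline(steps=[
    ('p', preprocess),
    ('c', RidgeClassifier())
])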

Other Answers

If it is a multi-class problem, drop scoring='roc_auc'; the two do not play well together. Use the default scoring or pick something else.

I ended up here after getting the same warning message while trying to do multi-class classification. My problem was that I was using scoring='roc_auc' with GridSearchCV, which does not work with multi-class targets. I used scoring='f1_micro' instead, which works fine for multi-class.

F1 scoring is discussed, for example, here: How to do GridSearchCV for F1-score in classification problem with scikit-learn? A list of the different scoring options can be found here (section 3.3.1): https://scikit-learn.org/stable/modules/model_evaluation.html
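A minimal sketch of that swap (the classifier and parameter grid are placeholders; X and y are assumed to be your multi-class data):

from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import RidgeClassifier

clf = RidgeClassifier()
param_grid = {'alpha': [0.1, 1.0]}

gs = GridSearchCV(
    estimator=clf,
    param_grid=param_grid,
    scoring='f1_micro',   # works for multi-class; 'roc_auc_ovr' is another built-in option
    cv=5
)
# gs.fit(X, y)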

I also ran into a similar problem, and my suggestion may help you. Rather than putting transformers that need no parameter tuning into the pipeline, it is better to pass GridSearchCV only the object whose parameters you actually want to optimize.

In your case, it seems unlikely that the OneHotEncoder needs tuning. So my suggestion is to first apply each step of your pipeline to the dataset separately and obtain the preprocessed dataset.

After that, you can safely use GridSearchCV to tune the model, listing only the relevant parameters to optimize. You can retrieve the best parameters via the best_params_ attribute, plug them back into your pipeline, and carry on with whatever you planned to do, as in the sketch below.
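A rough sketch of that workflow, reusing the names from the question (full_pipeline, X, y, skf) and an illustrative parameter grid:

from sklearn.pipeline import Pipeline

# Preprocess once, outside the search, then tune only the classifier.
X_prepared = full_pipeline.fit_transform(X)

gs = GridSearchCV(
    estimator=RidgeClassifier(),
    param_grid={'alpha': [0.4, 0.1], 'fit_intercept': [False]},
    scoring='roc_auc',
    cv=skf
)
gs.fit(X_prepared, y)

best_clf = RidgeClassifier(**gs.best_params_)   # plug the best parameters back in
final_pipeline = Pipeline(steps=[('p', full_pipeline), ('c', best_clf)])
# fit final_pipeline on the full training data before predicting on the test set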




