Tree models dealing with a dataset that has a large number of informative features

The dataset, and how the models are fitted to it, are shown below.

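from time import process_time

import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=1000, n_features=5000, n_redundant=2, n_informative=200, random_state=1
)

# The original post does not show the train/test split; a 70/30 hold-out is assumed here.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)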


names = [
    "lightGBM",
    "Decision Tree",
    "Random Forest",
    "Nearest Neighbors",
    "Neural Net",
    "AdaBoost",
    "Naive Bayes",
]


classifiers = [
    LGBMClassifier(),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    KNeighborsClassifier(3),
    MLPClassifier(alpha=1, max_iter=1000),
    AdaBoostClassifier(),
    GaussianNB(),
]


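scores, times = [], []
for name, clf in zip(names, classifiers):
    # Time only the fit step (CPU time, not wall-clock time).
    start = process_time()
    clf.fit(X_train, y_train)
    end = process_time()

    # Mean accuracy on the held-out test set.
    score = clf.score(X_test, y_test)
    run_time = end - start

    times.append(run_time)
    scores.append(score)

# One column per model, with rows "runtime" (seconds) and "score" (accuracy).
df = pd.DataFrame({"runtime": times, "score": scores}).T
df.columns = names
print(df)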

The results for the models I tried are as follows (all parameters are left at their defaults).

             lightGBM  Decision Tree  Random Forest  Nearest Neighbors  
runtime  123.859375       3.359375           3.75           0.015625   
score      0.560000       0.470000           0.54           0.726667   

         Neural Net   AdaBoost  Naive Bayes  
runtime   87.703125  27.734375     0.046875  
score      0.813333   0.533333     0.546667  

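As can be seen, the tree-based classifiers perform poorly on this dataset. However, when I set the n_informative parameter to 20, the predictive performance of the tree models improved greatly.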
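The n_informative comparison described above could be reproduced with a sketch along these lines. It is not the code from the post: the helper name forest_accuracy, the smaller n_features=500 (chosen only so it runs quickly), the 70/30 split, and the use of a random forest as the representative tree model are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def forest_accuracy(n_informative, n_features=500, n_samples=1000, seed=1):
    """Fit a default random forest on synthetic data and return test accuracy.

    Hypothetical helper: n_features is reduced from the post's 5000 to keep
    the run short, and the 70/30 split is an assumption.
    """
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=n_informative,
        n_redundant=2,
        random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    clf = RandomForestClassifier(random_state=seed)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)


# Compare the two settings mentioned in the post, everything else held fixed.
results = {k: forest_accuracy(k) for k in (20, 200)}
for k, acc in results.items():
    print(f"n_informative={k}: accuracy={acc:.3f}")
```

Holding the seed, sample count, and model fixed while varying only n_informative makes it easier to see how much of the gap comes from that one setting.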

Is this a problem with the structure of the tree model itself, or with the parameters? I would like to know why tree models perform poorly on this dataset and how I can improve them without changing the dataset. I have tried adjusting some LightGBM parameters, such as reg_lambda, max_depth, and num_leaves, but that has not improved the performance. Any help is much appreciated.

Answers

No answers yet



