Random Forest Classifier Removing Features using Top-N Features Method

I am a newcomer to data science and machine learning techniques and processes. I'm working on a personal project that predicts the winner of an NBA game using a random forest classifier. I have been removing and modifying features in my list to increase accuracy and decrease noise.

I implemented the solution found here: https://datascience.stackexchange.com/questions/57697/decision-trees-should-we-discard-low-importance-features, looping through the top N most important features and plotting the resulting accuracy for each N. After all my features have gone through that loop, I'm left with a plot that looks like this: [image: accuracy plotted against the number of top-N features]

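As you can see, the resulting plot is all over the place. Do I remove the features that produce a negative slope? Or what is the threshold for eliminating a feature? Is there a better way to measure the noise? Given that so many of my features affect the model's accuracy on the training data, how do I get the most accurate model?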
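For reference, the top-N loop described above can be sketched roughly as follows. This is a minimal sketch, not the code from the linked answer: it assumes synthetic data from `make_classification` in place of the NBA features, and it uses cross-validated accuracy rather than a single train/test split, which tends to smooth the curve somewhat.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the game data (assumption: 12 features, 4 informative).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=4, random_state=0
)

# Fit once on all features to rank them by impurity-based importance.
base = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
ranking = np.argsort(base.feature_importances_)[::-1]  # most important first

# For each N, score a fresh model that only sees the top-N features.
scores = []
for n in range(1, X.shape[1] + 1):
    top_n = ranking[:n]
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    cv_acc = cross_val_score(model, X[:, top_n], y, cv=3, scoring="accuracy").mean()
    scores.append(cv_acc)

# Pick the N with the highest mean cross-validated accuracy.
best_n = int(np.argmax(scores)) + 1
print(f"best N = {best_n}, CV accuracy = {scores[best_n - 1]:.3f}")
```

Choosing `best_n` from the averaged scores is one possible way to read the curve; differences between neighbouring values of N that are smaller than the fold-to-fold variation are probably not meaningful.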

Answers

In ML/DL, some features improve model accuracy and performance, while others hurt it.
Features are also related to one another, through correlation or in other ways, so the effect of removing one feature depends on which others remain.

sklearn's Random Forest provides many parameters, such as max_depth, max_features, or max_leaf_nodes.

So you can use GridSearchCV in sklearn, a class that tunes the hyperparameters of the random forest. If you search for the best hyperparameters for your model, it will likely perform better than before.
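A minimal sketch of such a search is shown below. The data is synthetic (`make_classification`), and the grid values are illustrative assumptions rather than recommended settings; only the three parameter names come from the answer above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real feature matrix and labels (assumption).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Illustrative grid: 2 x 2 x 2 = 8 candidate settings.
param_grid = {
    "max_depth": [3, None],
    "max_features": ["sqrt", 0.5],
    "max_leaf_nodes": [None, 20],
}

# 3-fold cross-validation on the training split for every candidate.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print(search.best_params_)
print(f"CV accuracy: {search.best_score_:.3f}")
# The refit best model is evaluated on data the search never saw.
print(f"held-out accuracy: {search.score(X_test, y_test):.3f}")
```

Keeping a held-out test split outside the search gives an accuracy estimate that is not biased by the tuning itself.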




