Question

I am a newcomer to data science and machine learning techniques and processes. I'm working on a personal project that predicts the winner of an NBA game using a random forest classifier. I have been removing and modifying features so that I can increase accuracy and reduce noise.

I implemented the solution found here: https://datascience.stackexchange.com/questions/57697/decision-trees-should-we-discard-low-importance-features, where I loop through the top N most important features and plot the resulting accuracy. After all my features have gone through that loop, I'm left with a plot that looks like this:
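A minimal sketch of that top-N loop, assuming scikit-learn. The NBA data is replaced by synthetic data from make_classification, and the 12 features, n_estimators=100, and 5-fold CV are all assumptions. Features are ranked by feature_importances_, each top-N prefix is scored with cross-validation on the training split, and a separate test split is kept for one final check. Plotting `scores` against N (for example with matplotlib) gives the kind of curve described above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Hypothetical stand-in for the NBA data: 12 features, 4 of them informative.
X, y = make_classification(n_samples=400, n_features=12, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

# Rank features by impurity-based importance, using the training split only.
ranker = RandomForestClassifier(n_estimators=100, random_state=0)
ranker.fit(X_train, y_train)
order = np.argsort(ranker.feature_importances_)[::-1]  # most important first

# For each N, score a forest on the top-N features with 5-fold CV on the training split.
scores = []
for n in range(1, X.shape[1] + 1):
    cols = order[:n]
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    scores.append(cross_val_score(clf, X_train[:, cols], y_train, cv=5).mean())

# Pick the N with the best CV score, then check it once on the untouched test split.
best_n = int(np.argmax(scores)) + 1
final = RandomForestClassifier(n_estimators=100, random_state=0)
final.fit(X_train[:, order[:best_n]], y_train)
test_acc = final.score(X_test[:, order[:best_n]], y_test)
print(f"best N = {best_n}, CV = {scores[best_n - 1]:.3f}, test = {test_acc:.3f}")
```

Because the ranking and the CV scores come from the same training rows, the CV curve is likely somewhat optimistic; the held-out test accuracy is the more honest number for the chosen N.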

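As you can see, the resulting plot is all over the place. Should I remove the features where the curve has a negative slope? Or is there a threshold value for eliminating features? Is there a better way to measure noise? Given how much the accuracy on my training data varies with the features I choose, how do I arrive at the most accurate model?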

Answer 1

In ML/DL, some features improve model accuracy and performance, while others make it worse.
Features are also related to one another, through correlation or other dependencies.
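A minimal sketch for spotting strongly correlated feature pairs, assuming NumPy. The data, the feature names (pts_avg, reb_avg, pts_avg_last10), and the 0.9 cutoff are all hypothetical. With a random forest, correlated features can split importance between them, which can make a top-N importance ranking look noisy.

```python
import numpy as np

# Hypothetical feature matrix: the third column nearly duplicates the first.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + rng.normal(scale=0.05, size=200)
X = np.column_stack([a, b, c])
names = ["pts_avg", "reb_avg", "pts_avg_last10"]  # hypothetical feature names

# Pairwise Pearson correlation between columns (rowvar=False: columns are variables).
corr = np.corrcoef(X, rowvar=False)

threshold = 0.9  # assumed cutoff; tune for your own data
pairs = [(names[i], names[j], corr[i, j])
         for i in range(len(names)) for j in range(i + 1, len(names))
         if abs(corr[i, j]) > threshold]
for i_name, j_name, r in pairs:
    print(f"{i_name} ~ {j_name}: r = {r:.2f}")
```

Keeping one feature from each flagged pair and rerunning the ranking is one simple way to see whether the importance curve settles down.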

sklearn's Random Forest provides many parameters, such as max_depth, max_features, or max_leaf_nodes.

So you can use GridSearchCV in sklearn to tune the hyperparameters of RandomForestClassifier. Once you find the best hyperparameters for your model, it may perform better than before.
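A minimal GridSearchCV sketch over the three parameters named above, assuming scikit-learn. The synthetic data and the specific grid values are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical stand-in data; replace with your own feature matrix and labels.
X, y = make_classification(n_samples=400, n_features=12, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

# Assumed grid: 3 x 2 x 2 = 12 combinations, each scored with 5-fold CV.
param_grid = {
    "max_depth": [None, 5, 10],
    "max_features": ["sqrt", 0.5],
    "max_leaf_nodes": [None, 20],
}
search = GridSearchCV(RandomForestClassifier(n_estimators=100, random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)  # refits the best combination on all of X_train

print(search.best_params_)
print(f"CV accuracy: {search.best_score_:.3f}")
print(f"held-out accuracy: {search.score(X_test, y_test):.3f}")
```

The tuned model is not guaranteed to beat the defaults; comparing the held-out accuracy against an untuned forest on the same split shows whether tuning actually helped.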
