I am a newcomer to data science and machine learning. I'm working on a personal project that predicts the winner of an NBA game using a random forest classifier. I have been trying to remove and modify features in my feature list so that I can increase accuracy and reduce noise.
I implemented the solution found here: https://datascience.stackexchange.com/questions/57697/decision-trees-should-we-discard-low-importance-features, where I loop through the top N most important features and plot the resulting accuracy. After all my features have gone through that loop, I'm left with a plot that looks like this:
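As you can see, the resulting plot is all over the place. Do I remove the features that have a negative slope? Or what is the threshold for eliminating a feature? Is there a better way to measure noise? Given how much the model's accuracy on the training data varies, how do I get the most accurate model?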
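For reference, the top-N loop described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the exact code from the linked answer: `make_classification` stands in for the NBA game data, `accuracy_by_top_n` is a hypothetical helper name, and the split size and `n_estimators` values are arbitrary choices. Features are ranked once by `feature_importances_` from a forest trained on all columns, then a new forest is trained on the top 1, 2, ..., N columns and scored on a held-out test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def accuracy_by_top_n(X, y, random_state=0):
    """Hypothetical helper: return (n, test accuracy) pairs for the top-n features."""
    # Assumption: a single stratified 75/25 train/test split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )
    # Rank features once, using only the training split (no test-set leakage).
    full = RandomForestClassifier(n_estimators=100, random_state=random_state)
    full.fit(X_train, y_train)
    ranking = np.argsort(full.feature_importances_)[::-1]  # most important first

    results = []
    for n in range(1, X.shape[1] + 1):
        cols = ranking[:n]  # indices of the n most important features
        model = RandomForestClassifier(n_estimators=100, random_state=random_state)
        model.fit(X_train[:, cols], y_train)
        results.append((n, model.score(X_test[:, cols], y_test)))  # accuracy
    return results


if __name__ == "__main__":
    # Synthetic stand-in for game data: 10 features, 4 of them informative.
    X, y = make_classification(
        n_samples=1000, n_features=10, n_informative=4, random_state=0
    )
    for n, acc in accuracy_by_top_n(X, y):
        print(f"top {n:2d} features: accuracy = {acc:.3f}")
```

The `(n, accuracy)` pairs can then be passed to a plotting library such as matplotlib to draw an accuracy-versus-N curve. Because each point comes from one train/test split, the curve reflects both the effect of the features and the randomness of that particular split.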