Question

I have a very basic question about calculating RMSE in an NB classification scenario. My training data X has some 1000-odd reviews with ratings in [1,5] which are the class labels Y. So what I am doing is something like this:

model = nb_classifier_train(trainingX,Y)
Yhat = nb_classifier_test(model,testingX)

My testing data has some 400-odd reviews with missing ratings (whose labels/ratings I need to predict). Now, to calculate RMSE:

RMSE = sqrt(mean((Y - Yhat).^2))

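In this case, what is Y? I understand that RMSE is calculated from the difference between the predicted values and the actual values. What are the actual values here? Or am I missing something?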

Answer 1

Y in this case is the labels for your training data, so the RMSE you're calculating does not make much sense: you are making predictions on the test examples and comparing them against the training labels. In fact, there is no reason the Y and Yhat vectors would even be the same length (here, roughly 1000 versus 400 entries). Instead, you should replace Y with your test labels, and if you don't have test labels then you simply have no way of calculating your test error.
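
To make the point concrete, here is a minimal Python sketch (standard library only). It is not the asker's code: the `rmse` helper, the toy ratings, and the stand-in predictions are all made up for illustration. It assumes you set aside a few labeled training reviews as a validation set, which is one common way to get an error estimate when the real test set has no labels. The helper refuses vectors of different lengths, which is exactly the mismatch described above.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between true labels and predictions.

    Both sequences must describe the same examples, in the same order.
    """
    if len(y_true) != len(y_pred):
        raise ValueError(
            f"length mismatch: {len(y_true)} true labels vs {len(y_pred)} predictions"
        )
    if len(y_true) == 0:
        raise ValueError("cannot compute RMSE of empty sequences")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical labeled ratings in [1, 5] (toy data, not from the question).
ratings = [5, 3, 4, 1, 2, 5, 4, 3, 2, 1]

# Hold out the last 3 labeled reviews as a validation set.
train_y = ratings[:7]
val_y = ratings[7:]

# Stand-in for the classifier's predictions on the 3 held-out reviews.
val_pred = [3, 1, 1]

# Valid: predictions and true labels refer to the same 3 reviews.
val_rmse = rmse(val_y, val_pred)

# Invalid: comparing predictions against the training labels.
try:
    rmse(train_y, val_pred)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

The RMSE on the held-out reviews is only an estimate of how the classifier would do on the unlabeled test reviews; the true test error still cannot be computed without their ratings.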
