Question About Using Weka, the machine learning tool

I'm using the Explorer feature of Weka for classification.

I have an .arff file with two NUMERIC features, and my class is binary, 0 or 1 (e.g. {0,1}).

Sample:

@RELATION summary
@ATTRIBUTE feature1 NUMERIC
@ATTRIBUTE feature2 NUMERIC
@ATTRIBUTE class {1,0}

@DATA
23,11,0
20,100,1
2,36,0
98,8,1
.....

I load this .arff file, select 10-fold cross-validation (no separate test file), choose NaiveBayes, and classify the data. Weka reports 100 correctly classified instances and 5 incorrectly classified. So far so good.

Then I change my .arff file significantly (replacing the feature values with completely random ones) and repeat the steps above. I get the EXACT same statistics.

I tried further changes to the .arff file and different classification algorithms. For a given algorithm, the statistics stay EXACTLY the same no matter what values I put in the file.

Am I doing something wrong here?

Best answer

It's hard to tell without more information, but I have two suggestions:

  1. What are the relative proportions of the two classes? Is it 5 to 100? Many algorithms don't work well with highly skewed class label distributions.

  2. Just a hunch, but try changing your class labels from numbers to strings (e.g. class1 and class2). Weka calls these nominal attributes, so maybe using numbers is not allowed.
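The first suggestion can be checked outside Weka by counting the class labels in the file. The following is a minimal sketch, assuming a dense ARFF file with the class as the last column and no quoted commas; the helper names `class_counts` and `majority_baseline` are made up for illustration and are not part of Weka.

```python
from collections import Counter


def class_counts(arff_text):
    """Count the values in the last column of the @DATA section."""
    counts = Counter()
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and ARFF comments
            continue
        if line.lower().startswith("@data"):
            in_data = True
            continue
        if in_data:
            counts[line.split(",")[-1].strip()] += 1
    return counts


def majority_baseline(counts):
    """Accuracy obtained by always predicting the most frequent class."""
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


# The four rows shown in the question (the real file has more).
sample = """@RELATION summary
@ATTRIBUTE feature1 NUMERIC
@ATTRIBUTE feature2 NUMERIC
@ATTRIBUTE class {1,0}

@DATA
23,11,0
20,100,1
2,36,0
98,8,1
"""

counts = class_counts(sample)
print(counts, majority_baseline(counts))
```

If the full file turned out to hold 100 instances of one class and 5 of the other, a classifier that ignores the features and always predicts the majority class would score 100 correct and 5 incorrect, whatever the feature values are.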

Other answers

Also, keep in mind that cross-validation is pretty horrid in the UI: it only shows you the original tree (the one built before the other data is folded in), not the trees generated for the folds. If you want those final trees, you need the programmatic API. I suggest using a separate training/test split instead.
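A training/test split can be prepared by hand before loading anything into Weka. Here is a rough sketch, assuming a dense ARFF file with one instance per line; `split_arff`, `test_fraction`, and `seed` are illustrative names, not Weka options.

```python
import random


def split_arff(arff_text, test_fraction=0.3, seed=42):
    """Return (train_text, test_text), both sharing the original header."""
    lines = arff_text.splitlines()
    # Everything up to and including the @DATA line is the header.
    data_idx = next(
        i for i, line in enumerate(lines)
        if line.strip().lower().startswith("@data")
    )
    header = lines[: data_idx + 1]
    rows = [
        line for line in lines[data_idx + 1:]
        if line.strip() and not line.strip().startswith("%")
    ]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    test_rows, train_rows = rows[:n_test], rows[n_test:]

    def build(selected):
        return "\n".join(header + selected) + "\n"

    return build(train_rows), build(test_rows)


sample = """@RELATION summary
@ATTRIBUTE feature1 NUMERIC
@ATTRIBUTE feature2 NUMERIC
@ATTRIBUTE class {1,0}

@DATA
23,11,0
20,100,1
2,36,0
98,8,1
"""

train_text, test_text = split_arff(sample, test_fraction=0.25, seed=0)
print(train_text)
print(test_text)
```

The two texts can be saved as separate .arff files; the Explorer's Classify tab has a "Supplied test set" option for loading the second one. A plain shuffle like this does not preserve class proportions, so with a heavily skewed class it is worth checking that both files contain both labels.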

Have you tried replacing this declaration:

@ATTRIBUTE class {1,0}

with this one?

@ATTRIBUTE class {yes,no}
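Renaming the labels means editing the header and every data row, since the values in the rows have to match the declared set. Below is a small sketch of doing that in one pass, assuming the class attribute is literally named `class`, is the last column, and that 1 maps to yes and 0 to no (the mapping is a guess; pick whichever fits your data). `relabel_classes` is a hypothetical helper.

```python
def relabel_classes(arff_text, mapping):
    """Rewrite the nominal class labels in the header and in each data row."""
    out = []
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        lower = line.lower()
        if in_data and line and not line.startswith("%"):
            # Data row: replace the last value if it appears in the mapping.
            *features, label = [v.strip() for v in line.split(",")]
            out.append(",".join(features + [mapping.get(label, label)]))
        elif lower.startswith("@attribute class"):
            # Header: rewrite the declared label set between the braces.
            inside = line[line.index("{") + 1: line.index("}")]
            labels = [mapping.get(v.strip(), v.strip()) for v in inside.split(",")]
            out.append("@ATTRIBUTE class {" + ",".join(labels) + "}")
        else:
            out.append(raw)
            if lower.startswith("@data"):
                in_data = True
    return "\n".join(out) + "\n"


sample = """@RELATION summary
@ATTRIBUTE feature1 NUMERIC
@ATTRIBUTE feature2 NUMERIC
@ATTRIBUTE class {1,0}

@DATA
23,11,0
20,100,1
2,36,0
98,8,1
"""

relabeled = relabel_classes(sample, {"1": "yes", "0": "no"})
print(relabeled)
```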



