Different classification results in Weka: GUI vs Java library
I have a problem when comparing Weka GUI classification results with my Java program, which runs a decision tree (J48) on the iris dataset. I'd be grateful if someone could help me.
I'm working with the iris dataset, and I'm trying to develop a Java program that classifies new instances. For this, I used the Weka GUI and obtained a model ("iris_tree(cv).model"), trained and validated with 10-fold cross-validation. The results in the Weka GUI are as expected: 4 incorrectly classified instances. After that, I saved the model so I could use it later in the Java program.
When I load the model "iris_tree(cv).model" in the Java program and try to classify new instances (the testing dataset), the results are different: the Java program only ever predicts 'setosa' and 'virginica', never 'versicolour'. These are the results:
    Classification: setosa
    Classification: setosa
    Classification: virginica
    Classification: virginica
    Classification: virginica
    Classification: virginica
whereas I expected to obtain:
    Classification: setosa
    Classification: setosa
    Classification: versicolour
    Classification: versicolour
    Classification: virginica
    Classification: virginica
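To put a number on the gap, here is a small JDK-only sketch (the `PredictionCheck` class is hypothetical, not part of Weka) that scores the labels printed by the Java program against the expected ones. With the two lists above, 4 of the 6 labels agree (about 67%), and both misses are instances expected to be versicolour that came out as virginica.

```java
import java.util.List;

// Compares the labels printed by the Java program with the labels that were expected.
class PredictionCheck {

    // Fraction of positions where the predicted and expected labels agree.
    static double accuracy(List<String> predicted, List<String> expected) {
        if (predicted.isEmpty() || predicted.size() != expected.size()) {
            throw new IllegalArgumentException("lists must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < predicted.size(); i++) {
            if (predicted.get(i).equals(expected.get(i))) {
                correct++;
            }
        }
        return (double) correct / predicted.size();
    }

    public static void main(String[] args) {
        List<String> predicted = List.of("setosa", "setosa", "virginica", "virginica", "virginica", "virginica");
        List<String> expected = List.of("setosa", "setosa", "versicolour", "versicolour", "virginica", "virginica");
        System.out.printf("accuracy = %.3f%n", accuracy(predicted, expected));
    }
}
```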
I've read related posts, but I couldn't find a clear explanation for this strange behaviour when using Java instead of the Weka GUI.
I've attached the Java code (2 classes), followed by the training and testing sets. Thanks in advance.
The main class:
    public static void main(String[] args) {
        try {
            Hashtable<String, String> values = new Hashtable<String, String>();
            // Loading the model
            String pathModel = "";
            String pathTestSet = "";
            JFileChooser chooserModel = new JFileChooser();
            chooserModel.setCurrentDirectory(new java.io.File("."));
            chooserModel.setDialogTitle("HOLIDES: Choose model");
            chooserModel.setFileSelectionMode(JFileChooser.FILES_AND_DIRECTORIES);
            chooserModel.setAcceptAllFileFilterUsed(true);
            if (chooserModel.showOpenDialog(null) == JFileChooser.APPROVE_OPTION) {
                File filePathModel = chooserModel.getSelectedFile();
                pathModel = filePathModel.getPath();
                State irisModel = new State(pathModel); // Loading the model
                JFileChooser chooserTestSet = new JFileChooser();
                chooserTestSet.setDialogTitle("HOLIDES: Choose test set");
                chooserTestSet.setFileSelectionMode(JFileChooser.FILES_AND_DIRECTORIES);
                chooserTestSet.setAcceptAllFileFilterUsed(true);
                // Loading the testing dataset
                if (chooserTestSet.showOpenDialog(null) == JFileChooser.APPROVE_OPTION) {
                    File filePathTestSet = chooserTestSet.getSelectedFile();
                    pathTestSet = filePathTestSet.getPath();
                    // Transforming the data set into attribute-value pairs
                    ConverterUtils.DataSource unlabeledSource = new ConverterUtils.DataSource(pathTestSet);
                    Instances unlabeledData = unlabeledSource.getDataSet();
                    if (unlabeledData.classIndex() == -1) {
                        unlabeledData.setClassIndex(unlabeledData.numAttributes() - 1);
                    }
                    for (int i = 0; i < unlabeledData.numInstances(); i++) {
                        Instance ins = unlabeledData.instance(i);
                        for (int j = 0; j < ins.numAttributes(); j++) {
                            String attrib = ins.attribute(j).name();
                            double val = ins.value(ins.attribute(j));
                            values.put(attrib, String.valueOf(val));
                        }
                        System.out.println("Classification: " + irisModel.classifySpecies(values, pathModel));
                    }
                }
            }
        } catch (Exception ex) {
            Logger.getLogger(PilotPatternClassifier.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
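One detail worth checking in this hand-off: `java.util.Hashtable` makes no guarantee about the order in which `keys()` returns its keys, so `classifySpecies` may see the attribute names in a different order from the `put` calls (and from the training header). The JDK-only sketch below (class and method names are hypothetical) fills a `Hashtable` and a `LinkedHashMap` with the same row and prints both key orders; only the `LinkedHashMap` is guaranteed to follow insertion order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KeyOrderDemo {

    // Attribute names in the order they are declared in the iris-train ARFF header.
    static final String[] NAMES = {"sepallength", "sepalwidth", "petallength", "petalwidth"};

    // Puts one test row into the map, in header order (mirrors the inner loop of main).
    static void fill(Map<String, String> map, double[] row) {
        for (int j = 0; j < NAMES.length; j++) {
            map.put(NAMES[j], String.valueOf(row[j]));
        }
    }

    // Key order as seen through Hashtable.keys(), the enumeration classifySpecies iterates.
    static List<String> hashtableOrder(double[] row) {
        Hashtable<String, String> table = new Hashtable<>();
        fill(table, row);
        return Collections.list(table.keys());
    }

    // Key order of a LinkedHashMap, which follows the order of the put calls.
    static List<String> linkedOrder(double[] row) {
        Map<String, String> map = new LinkedHashMap<>();
        fill(map, row);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        double[] row = {6.6, 3.0, 4.4, 1.4}; // third row of iris-test
        System.out.println("Hashtable:     " + hashtableOrder(row));
        System.out.println("LinkedHashMap: " + linkedOrder(row));
    }
}
```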
And the State class:
    public class State {
        //private String classModelFile = "/iris_tree.model";
        private Classifier classModel;
        private Instances dataModel;

        /**
         * Class constructor.
         */
        public State(String pathModel) throws Exception {
            //InputStream classModelStream;
            // Create a stream object for the model file embedded within the JAR file.
            //classModelStream = getClass().getResourceAsStream(classModelFile);
            classModel = (Classifier) weka.core.SerializationHelper.read(pathModel);
        }

        /**
         * Close the instance by setting both the model file string and
         * the model object to null. When the garbage collector
         * runs, this should make cleanup simpler. However, the garbage
         * collector is not called synchronously since it should be
         * managed by the larger execution environment.
         */
        public void close() {
            classModel = null;
            //classModelFile = null;
        }

        /**
         * Evaluate the model on the data provided by @param measures.
         * Returns a string with the species name.
         *
         * @param measures object with petal and sepal measurements
         * @return string with the species name
         * @throws Exception
         */
        public String classifySpecies(Dictionary<String, String> measures, String pathTestSet) throws Exception {
            FastVector dataClasses = new FastVector();
            FastVector dataAttribs = new FastVector();
            Attribute species;
            double values[] = new double[measures.size() + 1];
            int i = 0, maxIndex = 0;

            // Assemble all potential species options.
            dataClasses.addElement("setosa");
            dataClasses.addElement("versicolour");
            dataClasses.addElement("virginica");
            species = new Attribute("species", dataClasses);

            // Create an object to classify on.
            for (Enumeration<String> keys = measures.keys(); keys.hasMoreElements(); ) {
                String key = keys.nextElement();
                double val = Double.parseDouble(measures.get(key));
                dataAttribs.addElement(new Attribute(key));
                values[i++] = val;
            }
            dataAttribs.addElement(species);
            dataModel = new Instances("iris-test", dataAttribs, 0); // "classify" is the name of the relation of the test file; arbitrary.
            dataModel.setClass(species);
            Instance ins = new DenseInstance(1, values);
            //dataModel.add(new Instance(1, values) {});
            dataModel.add(ins);
            dataModel.instance(0).setClassMissing();

            // Find the class with the highest estimated likelihood
            double cl[] = classModel.distributionForInstance(dataModel.instance(0));
            for (i = 0; i < cl.length; i++) {
                if (cl[i] > cl[maxIndex]) {
                    maxIndex = i;
                }
            }
            return dataModel.classAttribute().value(maxIndex);
        }
    }
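A sketch of an order-independent way to fill the `values` array in `classifySpecies`: look each measurement up by name, in the attribute order of the iris-train header, instead of following the dictionary's enumeration order. This assumes the model reads attribute values by position (Weka classifiers generally index attributes rather than match names), so the `Attribute` objects added to `dataAttribs` would need to follow the same fixed order. The helper names are hypothetical, and the Weka-specific steps (building the `Instances`, calling `distributionForInstance`) are left out so the sketch runs with the JDK alone.

```java
import java.util.Dictionary;
import java.util.Hashtable;

class OrderedValues {

    // Attribute order of the iris-train ARFF header; the class attribute "species" comes last.
    static final String[] TRAIN_ORDER = {"sepallength", "sepalwidth", "petallength", "petalwidth"};

    // Builds the value array in training-header order. The extra last slot is reserved for
    // the class value, which classifySpecies then marks as missing with setClassMissing().
    static double[] toValues(Dictionary<String, String> measures) {
        double[] values = new double[TRAIN_ORDER.length + 1];
        for (int i = 0; i < TRAIN_ORDER.length; i++) {
            String raw = measures.get(TRAIN_ORDER[i]);
            if (raw == null) {
                throw new IllegalArgumentException("missing attribute: " + TRAIN_ORDER[i]);
            }
            values[i] = Double.parseDouble(raw);
        }
        return values;
    }

    // Index of the largest entry: the same arg-max loop classifySpecies runs over the distribution.
    static int argMax(double[] dist) {
        int best = 0;
        for (int i = 1; i < dist.length; i++) {
            if (dist[i] > dist[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Deliberately inserted out of header order.
        Hashtable<String, String> measures = new Hashtable<>();
        measures.put("petalwidth", "1.4");
        measures.put("sepallength", "6.6");
        measures.put("petallength", "4.4");
        measures.put("sepalwidth", "3.0");
        System.out.println(java.util.Arrays.toString(toValues(measures))); // [6.6, 3.0, 4.4, 1.4, 0.0]
    }
}
```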
Here are the training and testing sets:
    @relation iris-train

    @attribute sepallength real
    @attribute sepalwidth real
    @attribute petallength real
    @attribute petalwidth real
    @attribute species {setosa,versicolour,virginica}

    @data
    5.1,3.5,1.4,0.2,setosa
    4.9,3.0,1.4,0.2,setosa
    4.7,3.2,1.3,0.2,setosa
    4.6,3.1,1.5,0.2,setosa
    5.0,3.6,1.4,0.2,setosa
    5.4,3.9,1.7,0.4,setosa
    4.6,3.4,1.4,0.3,setosa
    5.0,3.4,1.5,0.2,setosa
    4.4,2.9,1.4,0.2,setosa
    4.9,3.1,1.5,0.1,setosa
    5.4,3.7,1.5,0.2,setosa
    4.8,3.4,1.6,0.2,setosa
    4.8,3.0,1.4,0.1,setosa
    4.3,3.0,1.1,0.1,setosa
    5.8,4.0,1.2,0.2,setosa
    5.7,4.4,1.5,0.4,setosa
    5.4,3.9,1.3,0.4,setosa
    5.1,3.5,1.4,0.3,setosa
    5.7,3.8,1.7,0.3,setosa
    5.1,3.8,1.5,0.3,setosa
    5.4,3.4,1.7,0.2,setosa
    5.1,3.7,1.5,0.4,setosa
    4.6,3.6,1.0,0.2,setosa
    5.1,3.3,1.7,0.5,setosa
    4.8,3.4,1.9,0.2,setosa
    5.0,3.0,1.6,0.2,setosa
    5.0,3.4,1.6,0.4,setosa
    5.2,3.5,1.5,0.2,setosa
    5.2,3.4,1.4,0.2,setosa
    4.7,3.2,1.6,0.2,setosa
    4.8,3.1,1.6,0.2,setosa
    5.4,3.4,1.5,0.4,setosa
    5.2,4.1,1.5,0.1,setosa
    5.5,4.2,1.4,0.2,setosa
    4.9,3.1,1.5,0.1,setosa
    5.0,3.2,1.2,0.2,setosa
    5.5,3.5,1.3,0.2,setosa
    4.9,3.1,1.5,0.1,setosa
    4.4,3.0,1.3,0.2,setosa
    5.1,3.4,1.5,0.2,setosa
    5.0,3.5,1.3,0.3,setosa
    4.5,2.3,1.3,0.3,setosa
    4.4,3.2,1.3,0.2,setosa
    5.0,3.5,1.6,0.6,setosa
    5.1,3.8,1.9,0.4,setosa
    4.8,3.0,1.4,0.3,setosa
    5.1,3.8,1.6,0.2,setosa
    4.6,3.2,1.4,0.2,setosa
    5.3,3.7,1.5,0.2,setosa
    5.0,3.3,1.4,0.2,setosa
    7.0,3.2,4.7,1.4,versicolour
    6.4,3.2,4.5,1.5,versicolour
    6.9,3.1,4.9,1.5,versicolour
    5.5,2.3,4.0,1.3,versicolour
    6.5,2.8,4.6,1.5,versicolour
    5.7,2.8,4.5,1.3,versicolour
    6.3,3.3,4.7,1.6,versicolour
    4.9,2.4,3.3,1.0,versicolour
    6.6,2.9,4.6,1.3,versicolour
    5.2,2.7,3.9,1.4,versicolour
    5.0,2.0,3.5,1.0,versicolour
    5.9,3.0,4.2,1.5,versicolour
    6.0,2.2,4.0,1.0,versicolour
    6.1,2.9,4.7,1.4,versicolour
    5.6,2.9,3.6,1.3,versicolour
    6.7,3.1,4.4,1.4,versicolour
    5.6,3.0,4.5,1.5,versicolour
    5.8,2.7,4.1,1.0,versicolour
    6.2,2.2,4.5,1.5,versicolour
    5.6,2.5,3.9,1.1,versicolour
    5.9,3.2,4.8,1.8,versicolour
    6.1,2.8,4.0,1.3,versicolour
    6.3,2.5,4.9,1.5,versicolour
    6.1,2.8,4.7,1.2,versicolour
    6.4,2.9,4.3,1.3,versicolour
    6.6,3.0,4.4,1.4,versicolour
    6.8,2.8,4.8,1.4,versicolour
    6.7,3.0,5.0,1.7,versicolour
    6.0,2.9,4.5,1.5,versicolour
    5.7,2.6,3.5,1.0,versicolour
    5.5,2.4,3.8,1.1,versicolour
    5.5,2.4,3.7,1.0,versicolour
    5.8,2.7,3.9,1.2,versicolour
    6.0,2.7,5.1,1.6,versicolour
    5.4,3.0,4.5,1.5,versicolour
    6.0,3.4,4.5,1.6,versicolour
    6.7,3.1,4.7,1.5,versicolour
    6.3,2.3,4.4,1.3,versicolour
    5.6,3.0,4.1,1.3,versicolour
    5.5,2.5,4.0,1.3,versicolour
    5.5,2.6,4.4,1.2,versicolour
    6.1,3.0,4.6,1.4,versicolour
    5.8,2.6,4.0,1.2,versicolour
    5.0,2.3,3.3,1.0,versicolour
    5.6,2.7,4.2,1.3,versicolour
    5.7,3.0,4.2,1.2,versicolour
    5.7,2.9,4.2,1.3,versicolour
    6.2,2.9,4.3,1.3,versicolour
    5.1,2.5,3.0,1.1,versicolour
    5.7,2.8,4.1,1.3,versicolour
    6.3,3.3,6.0,2.5,virginica
    5.8,2.7,5.1,1.9,virginica
    7.1,3.0,5.9,2.1,virginica
    6.3,2.9,5.6,1.8,virginica
    6.5,3.0,5.8,2.2,virginica
    7.6,3.0,6.6,2.1,virginica
    4.9,2.5,4.5,1.7,virginica
    7.3,2.9,6.3,1.8,virginica
    6.7,2.5,5.8,1.8,virginica
    7.2,3.6,6.1,2.5,virginica
    6.5,3.2,5.1,2.0,virginica
    6.4,2.7,5.3,1.9,virginica
    6.8,3.0,5.5,2.1,virginica
    5.7,2.5,5.0,2.0,virginica
    5.8,2.8,5.1,2.4,virginica
    6.4,3.2,5.3,2.3,virginica
    6.5,3.0,5.5,1.8,virginica
    7.7,3.8,6.7,2.2,virginica
    7.7,2.6,6.9,2.3,virginica
    6.0,2.2,5.0,1.5,virginica
    6.9,3.2,5.7,2.3,virginica
    5.6,2.8,4.9,2.0,virginica
    7.7,2.8,6.7,2.0,virginica
    6.3,2.7,4.9,1.8,virginica
    6.7,3.3,5.7,2.1,virginica
    7.2,3.2,6.0,1.8,virginica
    6.2,2.8,4.8,1.8,virginica
    6.1,3.0,4.9,1.8,virginica
    6.4,2.8,5.6,2.1,virginica
    7.2,3.0,5.8,1.6,virginica
    7.4,2.8,6.1,1.9,virginica
    7.9,3.8,6.4,2.0,virginica
    6.4,2.8,5.6,2.2,virginica
    6.3,2.8,5.1,1.5,virginica
    6.1,2.6,5.6,1.4,virginica
    7.7,3.0,6.1,2.3,virginica
    6.3,3.4,5.6,2.4,virginica
    6.4,3.1,5.5,1.8,virginica
    6.0,3.0,4.8,1.8,virginica
    6.9,3.1,5.4,2.1,virginica
    6.7,3.1,5.6,2.4,virginica
    6.9,3.1,5.1,2.3,virginica
    5.8,2.7,5.1,1.9,virginica
    6.8,3.2,5.9,2.3,virginica
    6.7,3.3,5.7,2.5,virginica
    6.7,3.0,5.2,2.3,virginica
    6.3,2.5,5.0,1.9,virginica
    6.5,3.0,5.2,2.0,virginica
    6.2,3.4,5.4,2.3,virginica
    5.9,3.0,5.1,1.8,virginica
and the testing set:
    @relation iris-test

    @attribute sepallength real
    @attribute sepalwidth real
    @attribute petallength real
    @attribute petalwidth real

    @data
    5.1,3.5,1.4,0.2
    4.9,3.0,1.4,0.2
    6.6,3.0,4.4,1.4
    6.8,2.8,4.8,1.4
    6.4,3.1,5.5,1.8
    6.0,3.0,4.8,1.8
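Note that the iris-test header has no `species` attribute, so if the loaded data has no class index set, the fallback `setClassIndex(numAttributes() - 1)` in `main` picks `petalwidth` as the class of the test file. The inner loop still copies `petalwidth` into the map (it visits every attribute), but the test file's class attribute is not `species`. Below is a JDK-only sketch (names hypothetical) of checking the test header against the training header and choosing the class index by name; a result of -1 means the file has no class column, in which case the fallback would be better skipped.

```java
import java.util.Arrays;
import java.util.List;

class HeaderCheck {

    // True when the class attribute is last in the training header and the test header lists
    // exactly the training attributes in the same order, with or without that class attribute.
    static boolean compatible(List<String> train, List<String> test, String className) {
        boolean classIsLast = train.get(train.size() - 1).equals(className);
        List<String> trainNoClass = train.subList(0, train.size() - 1);
        return classIsLast && (test.equals(train) || test.equals(trainNoClass));
    }

    // Position of the class attribute in the test header, or -1 if it is absent.
    static int classIndexFor(List<String> test, String className) {
        return test.indexOf(className);
    }

    public static void main(String[] args) {
        List<String> train = Arrays.asList("sepallength", "sepalwidth", "petallength", "petalwidth", "species");
        List<String> test = Arrays.asList("sepallength", "sepalwidth", "petallength", "petalwidth");
        System.out.println(compatible(train, test, "species")); // true
        System.out.println(classIndexFor(test, "species"));     // -1
    }
}
```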
Thanks a lot for your help.

I think it's normal to get lower accuracy when applying the classifier model to a test set than when checking it against the training set's feature file. Try evaluating on the test set in the Weka GUI; you may obtain the same result. It's not a GUI vs Java problem.

I would have posted this as a comment, but I can't comment yet due to lack of reputation.