Hi all, I have a dataset with 10000 features (terms) with tf-idf values.
When I try to load the data (which is in C4.5 format), I get the following error:
data = orange.ExampleTable(file)
SystemError: C45ExampleGenerator: line 1 of file '../features/pubmed.data' too long
So I reduced the feature space to 5000 by using a term frequency threshold (>2).
But it doesn't work either.
Does Orange have a problem with 5000 features, and if so, where is its limit?
You can have many more than 10000 features. The problem is that the code for parsing files in C4.5 format is old and ugly and limits the length of lines to 10000 characters. With 5000 values per row, even "0," for every value already adds up to 10000 characters, which is why reducing the feature set did not help. This shouldn't be difficult to fix, so we'll do it some day soon. Until then, you can just convert your data to tab-delimited format.
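A minimal sketch of such a conversion in plain Python, assuming a simple C4.5 file pair. It assumes comments start with `|`, the first declaration lists the class values, every other declaration is either `continuous` or a list of discrete values (no `ignore` attributes, no quoting), and the class is the last column of each data row. The helper names and the `class` column name are hypothetical. The header layout (one row of names, one of types, one of flags) follows my understanding of Orange's tab-delimited format, so check it against the Orange documentation before relying on it.

```python
import csv
import io


def parse_c45_names(names_text):
    """Return [(name, kind)] for the attributes declared in a C4.5 .names file.

    Assumption: '|' starts a comment, the first declaration lists the class
    values, and every other one is 'name: continuous.' or 'name: v1, v2.'.
    """
    decls = []
    for raw in names_text.splitlines():
        line = raw.split("|", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if line.endswith("."):
            line = line[:-1]  # declarations end with a period
        decls.append(line)
    attributes = []
    for decl in decls[1:]:  # decls[0] is the list of class values
        name, spec = decl.split(":", 1)
        kind = "continuous" if spec.strip() == "continuous" else "discrete"
        attributes.append((name.strip(), kind))
    return attributes


def c45_to_tab(names_text, data_lines, out):
    """Write the data as a tab-delimited file with a three-row header."""
    attributes = parse_c45_names(names_text)
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Row 1: attribute names; the class column is assumed to be last.
    writer.writerow([name for name, _ in attributes] + ["class"])
    # Row 2: types ("c" = continuous, "d" = discrete; the class is discrete).
    writer.writerow(["c" if kind == "continuous" else "d"
                     for _, kind in attributes] + ["d"])
    # Row 3: flags; only the class column is marked.
    writer.writerow([""] * len(attributes) + ["class"])
    # Data rows: Python's csv reader has no fixed limit on line length
    # (only a large per-field limit), so 5000+ values per row are fine.
    for row in csv.reader(data_lines, skipinitialspace=True):
        row = [value.strip() for value in row]
        if not any(row):
            continue  # skip blank lines
        if row[-1].endswith("."):
            row[-1] = row[-1][:-1]  # optional trailing period on a case
        writer.writerow(row)


def c45_text_to_tab_text(names_text, data_text):
    """Convenience wrapper for small in-memory inputs."""
    out = io.StringIO()
    c45_to_tab(names_text, io.StringIO(data_text), out)
    return out.getvalue()
```

For the real files you would read the names file (assumed here to be `pubmed.names`, next to `pubmed.data`) into a string, open `pubmed.data` for reading and a new `pubmed.tab` for writing, call `c45_to_tab(names, src, dst)`, and then load the result with `orange.ExampleTable("../features/pubmed.tab")`. Unknown values written as `?` in C4.5 are passed through unchanged.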