


The most common components you might want to use are:

* Classifier/Clusterer - built on the processed data
* Evaluating - how good is the classifier/clusterer?
* Attribute selection - removing irrelevant attributes from your data

The following sections explain how to use them in your own code. A link to an example class can be found at the end of this page, under the Links section. A comprehensive source of information is the chapter Using the API of the Weka manual. The classifiers and filters always list their options in the Javadoc API (stable, developer version) specification.
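To give a feel for how these components fit together, here is a minimal sketch that loads a data set, builds a classifier, and cross-validates it. The J48 classifier, the file path, the 10-fold setup, and the use of the last attribute as the class are illustrative choices, not requirements:

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

...
// load the data (the file path is just an example)
Instances data = new DataSource("/some/where/data.arff").getDataSet();
// assume the last attribute is the class
data.setClassIndex(data.numAttributes() - 1);

// build and evaluate a classifier (J48 chosen for illustration)
J48 tree = new J48();
Evaluation eval = new Evaluation(data);
eval.crossValidateModel(tree, data, 10, new Random(1));
System.out.println(eval.toSummaryString());
```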

The DataSource class is not limited to ARFF files. It can also read CSV files and other formats (basically all file formats that Weka can import via its converters; it uses the file extension to determine the associated loader):

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
...
DataSource source = new DataSource("/some/where/data.arff");
Instances data = source.getDataSet();
// setting class attribute if the data format does not provide this information
// For example, the XRFF format saves the class attribute information as well
if (data.classIndex() == -1)
  data.setClassIndex(data.numAttributes() - 1);
```

Reading from databases is slightly more complicated, but still very easy. First, you'll have to modify your DatabaseUtils.props file to reflect your database connection. Suppose you want to connect to a MySQL server that is running on the local machine on the default port 3306. The MySQL JDBC driver is called Connector/J (the driver class is com.mysql.jdbc.Driver). The database where your target data resides is called some_database. Since you're only reading, you can use the default user nobody without a password. Your props file must contain the following lines:

```
jdbcDriver=com.mysql.jdbc.Driver
jdbcURL=jdbc:mysql://localhost:3306/some_database
```

Secondly, your Java code needs to look like this to load the data from the database:

```java
import weka.core.Instances;
import weka.experiment.InstanceQuery;
...
InstanceQuery query = new InstanceQuery();
query.setUsername("nobody");
query.setPassword("");
query.setQuery("SELECT * FROM some_table");  // the table name is just an example
// You can declare that your data set is sparse
// query.setSparseData(true);
Instances data = query.retrieveInstances();
```

* Don't forget to add the JDBC driver to your CLASSPATH.
* For MS Access, you must use the JDBC-ODBC bridge that is part of a JDK. The Windows databases article explains how to do this.
* InstanceQuery automatically converts VARCHAR database columns to NOMINAL attributes, and long TEXT database columns to STRING attributes. So if you use InstanceQuery to do text mining against text that appears in a VARCHAR column, Weka will regard such text as nominal values and will thus fail to tokenize and mine it. Use the NominalToString or StringToNominal filter (package weka.filters.unsupervised.attribute) to convert the attributes into the correct type, as sketched below.
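Here is a rough sketch of that conversion, assuming the VARCHAR column ended up as the last attribute of the retrieved data set (the attribute index and the variable name data are illustrative):

```java
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.NominalToString;
...
// "data" is the Instances object retrieved via InstanceQuery above
NominalToString nomToStr = new NominalToString();
nomToStr.setOptions(new String[]{"-C", "last"});  // column to convert; "last" is just an example
nomToStr.setInputFormat(data);
Instances converted = Filter.useFilter(data, nomToStr);
```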

Weka schemes that implement the weka.core.OptionHandler interface, such as classifiers, clusterers, and filters, offer the following methods for setting and retrieving options:

* void setOptions(String[] options)
* String[] getOptions()
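As a quick illustration (the Remove filter and the "-R 1" option string are arbitrary examples), options can be set from a command-line style string and read back as an array:

```java
import weka.core.Utils;
import weka.filters.unsupervised.attribute.Remove;
...
Remove remove = new Remove();                    // Remove implements OptionHandler
remove.setOptions(Utils.splitOptions("-R 1"));   // parse and set options ("-R 1" removes the first attribute)

String[] current = remove.getOptions();          // retrieve the scheme's current options
System.out.println(Utils.joinOptions(current));
```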
