java.lang.Object
weka.clusterers.AbstractClusterer
weka.clusterers.AbstractDensityBasedClusterer
weka.clusterers.RandomizableDensityBasedClusterer
weka.clusterers.EM
public class EM
Simple EM (expectation maximisation) class.
EM assigns a probability distribution to each instance which indicates the probability of it belonging to each of the clusters. EM can decide how many clusters to create by cross validation, or you may specify a priori how many clusters to generate.
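As a rough illustration of how such a membership distribution can be obtained, the sketch below combines cluster priors (the kind of values `clusterPriors()` returns) with per-cluster log densities (the kind of values `logDensityPerClusterForInstance` returns) and normalises the result. The class and method names are hypothetical, the inputs are plain arrays rather than Weka objects, and the log-sum-exp normalisation is a standard technique assumed here, not necessarily the exact code path Weka uses.

```java
/** Hypothetical helper, not part of Weka: priors + log densities -> membership probabilities. */
final class MembershipSketch {

    private MembershipSketch() {}

    /**
     * @param priors       prior probability of each cluster (assumed to be positive)
     * @param logDensities log density of one instance under each cluster
     * @return probabilities of the instance belonging to each cluster, summing to 1
     */
    static double[] membership(double[] priors, double[] logDensities) {
        if (priors.length != logDensities.length) {
            throw new IllegalArgumentException("priors and logDensities must have the same length");
        }
        int k = priors.length;
        double[] logJoint = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < k; i++) {
            // log of (prior * density), kept in log space to avoid underflow
            logJoint[i] = Math.log(priors[i]) + logDensities[i];
            max = Math.max(max, logJoint[i]);
        }
        double[] probs = new double[k];
        double sum = 0.0;
        for (int i = 0; i < k; i++) {
            // subtracting the maximum keeps the exponentials in a safe range
            probs[i] = Math.exp(logJoint[i] - max);
            sum += probs[i];
        }
        for (int i = 0; i < k; i++) {
            probs[i] /= sum;
        }
        return probs;
    }
}
```

Working in log space matters because densities of high-dimensional instances can be far too small to represent directly as a `double`.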
The cross validation performed to determine the number of clusters is done in the following steps:
1. The number of clusters is set to 1.
2. The training set is split randomly into 10 folds.
3. EM is performed 10 times using the 10 folds the usual CV way.
4. The log-likelihood is averaged over all 10 results.
5. If the log-likelihood has increased, the number of clusters is increased by 1 and the program continues at step 2.
The number of folds is fixed at 10 as long as the training set contains at least 10 instances. If it contains fewer than 10, the number of folds is set equal to the number of instances.
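The selection loop described above can be sketched as follows. This is an illustration only, not Weka's implementation: the class name, the `cvLogLikelihood` callback (standing in for steps 2 to 4, i.e. running EM over the folds and averaging the log-likelihood) and the `maxClusters` safety cap are all assumptions introduced for the example.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical sketch, not Weka code: picks a cluster count by cross-validated log-likelihood. */
final class ClusterCountSelectionSketch {

    private ClusterCountSelectionSketch() {}

    /** 10 folds, unless the training set has fewer than 10 instances. */
    static int numFolds(int numInstances) {
        return Math.min(10, numInstances);
    }

    /**
     * @param cvLogLikelihood hypothetical callback returning the fold-averaged
     *                        log-likelihood for a given number of clusters
     * @param maxClusters     upper bound on the search (an assumption of this sketch)
     * @return the last cluster count for which the log-likelihood still increased
     */
    static int selectNumClusters(IntToDoubleFunction cvLogLikelihood, int maxClusters) {
        int k = 1;                                        // step 1
        double best = cvLogLikelihood.applyAsDouble(k);   // steps 2-4 for k = 1
        while (k < maxClusters) {
            double next = cvLogLikelihood.applyAsDouble(k + 1);
            if (next <= best) {
                break;                                    // no improvement: stop searching
            }
            best = next;                                  // step 5: accept and try one more cluster
            k++;
        }
        return k;
    }
}
```

Because the search stops at the first cluster count that fails to improve the averaged log-likelihood, it finds a local rather than a guaranteed global optimum.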
-N <num> number of clusters. If omitted or -1 specified, then cross validation is used to select the number of clusters.
-I <num> max iterations. (default 100)
-V verbose.
-M <num> minimum allowable standard deviation for normal density computation (default 1e-6)
-O Display model in old format (good when there are many clusters)
-S <num> Random number seed. (default 100)
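To show how these flags fit together, the sketch below assembles them into a string array of the shape that `setOptions(java.lang.String[] options)` accepts. The helper class and its parameter names are hypothetical and the code does not call Weka itself; it only builds the array.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Weka: assembles an option array for EM. */
final class EmOptionsSketch {

    private EmOptionsSketch() {}

    static String[] buildOptions(int numClusters, int maxIterations,
                                 double minStdDev, int seed, boolean verbose) {
        List<String> opts = new ArrayList<>();
        opts.add("-N");                       // number of clusters, -1 selects by cross validation
        opts.add(Integer.toString(numClusters));
        opts.add("-I");                       // maximum number of iterations
        opts.add(Integer.toString(maxIterations));
        opts.add("-M");                       // minimum allowable standard deviation
        opts.add(Double.toString(minStdDev));
        opts.add("-S");                       // random number seed
        opts.add(Integer.toString(seed));
        if (verbose) {
            opts.add("-V");                   // flag without an argument
        }
        return opts.toArray(new String[0]);
    }
}
```

With Weka on the classpath, an array such as `buildOptions(-1, 100, 1e-6, 100, false)` could be passed to `setOptions` on an `EM` instance; those values mirror the defaults listed above.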
Constructor Summary | |
---|---|
EM() | Constructor. |
Method Summary | | |
---|---|---|
void | buildClusterer(Instances data) | Generates a clusterer. |
double[] | clusterPriors() | Returns the cluster priors. |
java.lang.String | debugTipText() | Returns the tip text for this property |
java.lang.String | displayModelInOldFormatTipText() | Returns the tip text for this property |
Capabilities | getCapabilities() | Returns default capabilities of the clusterer (i.e., the ones of SimpleKMeans). |
double[][][] | getClusterModelsNumericAtts() | Return the normal distributions for the cluster models |
double[] | getClusterPriors() | Return the priors for the clusters |
boolean | getDebug() | Get debug mode |
boolean | getDisplayModelInOldFormat() | Get whether to display model output in the old, original format. |
int | getMaxIterations() | Get the maximum number of iterations |
double | getMinStdDev() | Get the minimum allowable standard deviation. |
int | getNumClusters() | Get the number of clusters |
java.lang.String[] | getOptions() | Gets the current settings of EM. |
java.lang.String | getRevision() | Returns the revision string. |
java.lang.String | globalInfo() | Returns a string describing this clusterer |
java.util.Enumeration | listOptions() | Returns an enumeration describing the available options. |
double[] | logDensityPerClusterForInstance(Instance inst) | Computes the log of the conditional density (per cluster) for a given instance. |
static void | main(java.lang.String[] argv) | Main method for testing this class. |
java.lang.String | maxIterationsTipText() | Returns the tip text for this property |
java.lang.String | minStdDevTipText() | Returns the tip text for this property |
int | numberOfClusters() | Returns the number of clusters. |
java.lang.String | numClustersTipText() | Returns the tip text for this property |
void | setDebug(boolean v) | Set debug mode - verbose output |
void | setDisplayModelInOldFormat(boolean d) | Set whether to display model output in the old, original format. |
void | setMaxIterations(int i) | Set the maximum number of iterations to perform |
void | setMinStdDev(double m) | Set the minimum value for standard deviation when calculating normal density. |
void | setMinStdDevPerAtt(double[] m) | |
void | setNumClusters(int n) | Set the number of clusters (-1 to select by CV). |
void | setOptions(java.lang.String[] options) | Parses a given list of options. |
java.lang.String | toString() | Outputs the generated clusters into a string. |
Methods inherited from class weka.clusterers.RandomizableDensityBasedClusterer |
---|
getSeed, seedTipText, setSeed |
Methods inherited from class weka.clusterers.AbstractDensityBasedClusterer |
---|
distributionForInstance, logDensityForInstance, logJointDensitiesForInstance, makeCopies |
Methods inherited from class weka.clusterers.AbstractClusterer |
---|
clusterInstance, forName, makeCopies, makeCopy |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Methods inherited from interface weka.clusterers.Clusterer |
---|
clusterInstance |
Constructor Detail |
---|
public EM()
Method Detail |
---|
public java.lang.String globalInfo()
public java.util.Enumeration listOptions()
listOptions in interface OptionHandler
listOptions in class RandomizableDensityBasedClusterer
public void setOptions(java.lang.String[] options) throws java.lang.Exception
-N <num> number of clusters. If omitted or -1 specified, then cross validation is used to select the number of clusters.
-I <num> max iterations. (default 100)
-V verbose.
-M <num> minimum allowable standard deviation for normal density computation (default 1e-6)
-O Display model in old format (good when there are many clusters)
-S <num> Random number seed. (default 100)
setOptions in interface OptionHandler
setOptions in class RandomizableDensityBasedClusterer
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception - if an option is not supported
public java.lang.String displayModelInOldFormatTipText()
public void setDisplayModelInOldFormat(boolean d)
Parameters: d - true if model output is to be shown in the old format
public boolean getDisplayModelInOldFormat()
public java.lang.String minStdDevTipText()
public void setMinStdDev(double m)
Parameters: m - minimum value for standard deviation
public void setMinStdDevPerAtt(double[] m)
public double getMinStdDev()
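The minimum standard deviation is described as a floor used when computing normal densities. The sketch below shows one way such a floor can be applied: it keeps the log density finite even when a cluster's estimated standard deviation collapses to zero. The class and method are hypothetical and are not taken from Weka's source.

```java
/** Hypothetical sketch, not Weka code: normal log density with a standard deviation floor. */
final class MinStdDevSketch {

    /** Mirrors the documented default of the -M option. */
    static final double DEFAULT_MIN_STD_DEV = 1e-6;

    private MinStdDevSketch() {}

    static double logNormalDensity(double x, double mean, double stdDev, double minStdDev) {
        // never let the standard deviation fall below the configured minimum
        double sd = Math.max(stdDev, minStdDev);
        double z = (x - mean) / sd;
        // log of 1 / (sd * sqrt(2*pi)) * exp(-z^2 / 2)
        return -0.5 * z * z - Math.log(sd) - 0.5 * Math.log(2.0 * Math.PI);
    }
}
```

Without the floor, a cluster containing identical values for an attribute would have a standard deviation of 0 and the density computation would divide by zero.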
public java.lang.String numClustersTipText()
public void setNumClusters(int n) throws java.lang.Exception
setNumClusters in interface NumberOfClustersRequestable
Parameters: n - the number of clusters
Throws: java.lang.Exception - if n is 0
public int getNumClusters()
public java.lang.String maxIterationsTipText()
public void setMaxIterations(int i) throws java.lang.Exception
Parameters: i - the number of iterations
Throws: java.lang.Exception - if i is less than 1
public int getMaxIterations()
public java.lang.String debugTipText()
public void setDebug(boolean v)
Parameters: v - true for verbose output
public boolean getDebug()
public java.lang.String[] getOptions()
getOptions in interface OptionHandler
getOptions in class RandomizableDensityBasedClusterer
public double[][][] getClusterModelsNumericAtts()
Returns: double[][][] value
public double[] getClusterPriors()
Returns: double[] value
public java.lang.String toString()
toString in class java.lang.Object
public int numberOfClusters() throws java.lang.Exception
numberOfClusters in interface Clusterer
numberOfClusters in class AbstractClusterer
Throws: java.lang.Exception - if number of clusters could not be returned successfully
public Capabilities getCapabilities()
getCapabilities in interface Clusterer
getCapabilities in interface CapabilitiesHandler
getCapabilities in class AbstractClusterer
Capabilities
public void buildClusterer(Instances data) throws java.lang.Exception
buildClusterer in interface Clusterer
buildClusterer in class AbstractClusterer
Parameters: data - set of instances serving as training data
Throws: java.lang.Exception - if the clusterer has not been generated successfully
public double[] clusterPriors()
clusterPriors in interface DensityBasedClusterer
clusterPriors in class AbstractDensityBasedClusterer
public double[] logDensityPerClusterForInstance(Instance inst) throws java.lang.Exception
logDensityPerClusterForInstance in interface DensityBasedClusterer
logDensityPerClusterForInstance in class AbstractDensityBasedClusterer
Parameters: inst - the instance to compute the density for
Throws: java.lang.Exception - if the density could not be computed successfully
public java.lang.String getRevision()
getRevision in interface RevisionHandler
getRevision in class AbstractClusterer
public static void main(java.lang.String[] argv)
Parameters: argv - should contain the following arguments: -t training file [-T test file] [-N number of clusters] [-S random seed]