CALD Software Repository
CALD has gathered together a wide variety of commercial and publicly available software containing algorithms for learning classifications and for clustering. The commercial software (Clementine, Darwin, IBM Intelligent Miner, Model 1, SAS Enterprise Miner, and SGI Mineset) has been donated by Software Affiliates of CALD. (Darwin and SGI Mineset have not yet arrived but will be available soon.) These programs may be used by anyone in the CALD community for educational or research purposes. They should not be used for any commercial purpose, although they may be used for blitz projects with CALD Corporate Partners.
The tables below list the software that has been gathered along with its basic functions and features, and they are followed by a brief description of each program. A feature that a program provides is marked with an X in the table; some entries also carry a letter referring to a footnote with further information. The tables also indicate whether each program runs under Unix or Windows NT (abbreviated NT).
The Software Affiliates hope to receive useful feedback on how their software could be improved, as well as information about successful applications that they can advertise. If you have suggestions, or if you have found a program useful, please send e-mail to ps7z@andrew.cmu.edu.
Unix Access
The software that runs under Unix has been installed in CALD directories on the AFS file system; the location of each program is given in its description below. Anyone with a CS account or an Andrew account can execute these programs. Those using an Andrew account must type "cklog cs.cmu.edu" before executing a program.
Windows NT Access
The software that runs on Windows NT has been installed in the CALD laboratory (Wean Hall 4616) on machines CALD-1 through CALD-4. These machines are all dual boot (Linux and Windows NT). CALD-2 and CALD-4 have been designated as machines that the user at the console may switch to Windows NT at any time, so they should be used in preference to CALD-1 and CALD-3. The programs can be run from the Start menu, except for Intelligent Miner; see its description below for how to access it. To get an account on CALD-1 through CALD-4, write to diane@cs.cmu.edu.
Documentation
The available hard-copy documentation (which covers all of the commercial software) is in the CALD laboratory (Wean Hall 4616). All of the programs have on-line help, whether or not they also have hard-copy documentation.
Demonstrations and Help
If you are interested in a demonstration of Clementine, IBM Intelligent Miner, Model 1, or SAS Enterprise Miner (or, once they arrive, SGI Mineset or Darwin), please write to me at ps7z@andrew.cmu.edu. I am familiar with the basics of these programs and would be happy to try to answer questions or to give information about the technical support services of the individual software vendors.
Classification Software

| Feature | Bayes Knowledge Discoverer (Unix) | Clementine (NT) | Darwin (NT) | IBM Intelligent Miner (NT) | Microsoft Bayes Network (NT) | MLC++ (Unix) | Model 1 (NT) | SAS Enterprise Miner (NT) | SGI Mineset (NT) | SNNS (Unix) | Tetrad (Unix) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Simple Bayes |  |  |  |  |  | X | X |  |  |  |  |
| Decision Trees |  | X a | X b | X |  | X c | X d | X | X e |  |  |
| Logistic Regression |  |  |  |  |  |  | X | X |  |  |  |
| Linear Regression |  | X |  |  |  | X f | X | X | X f |  |  |
| Neural Networks |  | X | X | X |  | X | X | X |  | X |  |
| Rule Builders |  | X |  |  |  | X g |  |  |  |  |  |
| Association Rule Builder |  | X |  | X |  |  |  |  |  |  |  |
| Decision Table |  |  |  |  |  | X |  |  | X |  |  |
| Radial Basis Functions |  | X |  | X |  |  |  |  |  |  |  |
| Instance Based |  |  |  |  |  | X h |  |  |  |  |  |
| Linear Discriminators |  |  |  |  |  | X i |  |  |  |  |  |
| Memory-based |  |  | X |  |  |  |  |  |  |  |  |
| Bayesian Networks | X |  |  |  | X |  |  |  |  |  | X |
| Time Series |  |  |  | X |  |  |  |  |  |  |  |

Clustering Software

| Feature | AutoClass (Unix) | Clementine (NT) | Darwin (NT) | IBM Intelligent Miner (NT) | Model 1 (NT) | SAS Enterprise Miner (NT) | SGI Mineset (NT) |
|---|---|---|---|---|---|---|---|
| Miscellaneous |  |  |  |  | X a | X a |  |
| Kohonen Networks |  | X |  | X |  |  |  |
| K-means Clustering |  | X | X |  |  |  | X b |
| Bayesian | X |  |  |  |  |  |  |

a. Unspecified method.
b. Single and iterative.
Miscellaneous Software

| Feature | DB2 | SAS |
|---|---|---|
| Statistical |  | X |
| Database | X |  |

AutoClass (Unix)
CALD Location: /afs/cs.cmu.edu/project/cald-1/autoclass-c/autoclass
"The program AUTOCLASS III, Automatic Class Discovery from Data, uses Bayesian probability theory to provide a simple and extensible approach to problems such as classification and general mixture separation. Its theoretical basis is free from ad hoc quantities, and in particular free of any measures which alter the data to suit the needs of the program. As a result, the elementary classification model used lends itself easily to extensions."
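The phrase "general mixture separation" refers to splitting data into the components of a probability mixture. As a rough illustration only, the Python sketch below fits a two-component, one-dimensional Gaussian mixture with plain maximum-likelihood expectation-maximization (EM). It is not AutoClass's own procedure. AutoClass places Bayesian priors on its parameters and also searches over the number of classes, and neither of those steps is shown here. The function names, starting means, and sample data are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, means, var=1.0, iters=50):
    """Fit a two-component 1-D Gaussian mixture with plain maximum-likelihood EM.

    Returns (weights, means, variances). Illustrative only: no priors and a
    fixed number of components, unlike AutoClass.
    """
    weights = [0.5, 0.5]
    means = list(means)
    variances = [var, var]
    for _ in range(iters):
        # E-step: probability that each point belongs to each component.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate mixing weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(spread, 1e-6)  # keep the variance away from zero
    return weights, means, variances


data = [0.9, 1.1, 1.0, 0.8, 1.2, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = em_two_gaussians(data, means=[0.0, 6.0])
print([round(m, 2) for m in means])
```

On these two well-separated groups, the fitted means should come out close to 1.0 and 5.0. Starting the means far apart keeps the sketch simple; a practical clustering tool would try several starting points.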
Bayes Knowledge Discoverer (Unix)
CALD Location: /afs/cs.cmu.edu/project/cald-1/ramoni/bkd/bin/bkd
"Bayesian Knowledge Discoverer (BKD) is a computer program able to learn Bayesian Belief Networks from (possibly incomplete) databases. BKD is based on a new estimation method called Bound and Collapse and it has been developed within the Bayesian Knowledge Discovery project." It supports parameter estimation, model selection, goal-oriented propagation, discretization, and the handling of missing values, and it has a graphical user interface.
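Learning a Bayesian network's parameters means estimating conditional probability tables. The sketch below estimates one such table, P(child | parents), by counting over complete records and adding a Dirichlet pseudo-count (alpha = 1, i.e. Laplace smoothing) to every cell. This is the standard complete-data estimate. It is not BKD's Bound and Collapse method, which is designed for incomplete data. The function name, variable names, and records are hypothetical.

```python
from collections import Counter


def estimate_cpt(records, child, parents, child_values, alpha=1.0):
    """Estimate P(child | parents) from complete records.

    records: list of dicts mapping variable names to discrete values.
    alpha: Dirichlet pseudo-count added to every cell (alpha=1 is Laplace smoothing).
    Returns {parent_config: {child_value: probability}}.
    """
    joint = Counter()
    parent_counts = Counter()
    for r in records:
        config = tuple(r[p] for p in parents)
        joint[(config, r[child])] += 1
        parent_counts[config] += 1
    cpt = {}
    for config in parent_counts:
        denom = parent_counts[config] + alpha * len(child_values)
        cpt[config] = {v: (joint[(config, v)] + alpha) / denom for v in child_values}
    return cpt


records = [
    {"rain": "yes", "wet": "yes"},
    {"rain": "yes", "wet": "yes"},
    {"rain": "yes", "wet": "no"},
    {"rain": "no", "wet": "no"},
    {"rain": "no", "wet": "no"},
    {"rain": "no", "wet": "yes"},
    {"rain": "no", "wet": "no"},
]
cpt = estimate_cpt(records, child="wet", parents=["rain"], child_values=["yes", "no"])
print(cpt[("yes",)]["yes"])  # (2 + 1) / (3 + 2) = 0.6
```

Only parent configurations that actually occur in the records get a row; a fuller implementation would enumerate all of them.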
Clementine (NT)
CALD Location: CALD-1, CALD-2, CALD-3, CALD-4 (when running Windows NT)
Clementine is built around a visual programming interface that links data access, data manipulation, and visualization with machine learning (decision tree induction and neural networks). Applications are developed graphically by connecting 'building blocks'. Clementine can access data from files or databases, and trained rules and networks can be exported as C source code. It includes the C5.0 algorithm for building decision trees, which can optionally use boosting. Clementine also defines a standard that allows users to write code so that programs written outside Clementine can be used through the Clementine interface.
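Boosting trains a sequence of classifiers, each concentrating on the examples its predecessors got wrong, and combines them by a weighted vote. The sketch below is generic AdaBoost with one-feature threshold "stumps" on a toy data set. It illustrates the idea but is not meant to reproduce C5.0's own boosting procedure, and every name in it is hypothetical.

```python
import math


def train_stump(xs, ys, w):
    """Return (error, threshold, polarity) of the stump with lowest weighted error.

    A stump predicts `polarity` when x >= threshold and -polarity otherwise.
    """
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(wi for wi, x, y in zip(w, xs, ys)
                      if (polarity if x >= t else -polarity) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Generic AdaBoost over threshold stumps; returns [(alpha, threshold, polarity)]."""
    w = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        err, t, pol = train_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # Misclassified examples gain weight; correctly classified ones lose it.
        w = [wi * math.exp(-alpha * y * (pol if x >= t else -pol))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (pol if x >= t else -pol) for a, t, pol in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

No single threshold separates these labels, but three boosted stumps are enough to classify all six training points correctly.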
Darwin (NT)
CALD Location: Coming soon.
Darwin uses wizards to guide users through the process of assembling the data, building models and interpreting results. The models can also be generated in C++ or Java to be used outside of Darwin. It allows users to integrate features of Darwin into other applications and decision support tools.
DB2 (NT)
CALD Location: CALD-1, CALD-2, CALD-3, CALD-4 (when running Windows NT)
DB2 is IBM's database manager.
IBM Intelligent Miner (NT)
CALD Location: CALD-2, CALD-4 (when running Windows NT)
Intelligent Miner provides tools for preprocessing data, for creating classification and clustering models, and for applying those models. It can read data from a DB2 database or from an ASCII file. To access this program, you will need to get instructions and a password from ps7z@andrew.cmu.edu.
Microsoft Bayes Network (NT)
CALD Location: CALD-1, CALD-2, CALD-3, CALD-4 (when running Windows NT)
"Microsoft Bayes Network allows the creation, assessment and evaluation of Bayesian belief networks… It supports the following operations: loading and storing of belief networks in textual form, creation and modification of networks through the addition of nodes and arcs, assessment of discrete probabilities, evaluation of belief networks using exact clique-tree propagation methods, decision-theoretic troubleshooting and recommendations, asymmetric assessment, and single-decision influence diagrams."
MLC++ (Unix)
CALD Location: /afs/cs.cmu.edu/project/cald-4/mlc++/mlclogout
MLC++ provides a wide variety of learning algorithms behind a common interface, and it supports discretization, bagging, variable selection, and boosting. It also supplies the algorithms used by SGI Mineset. Compared with Mineset, MLC++ is more flexible and offers more algorithms. However, its interface is much more primitive, and it lacks Mineset's powerful visualization tools.
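Bagging (bootstrap aggregating) trains each model on a sample drawn with replacement from the training set and predicts by majority vote. Below is a minimal sketch of that idea using one-feature threshold stumps. It is not MLC++'s interface, and the helper names and toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Return (threshold, polarity) with the fewest training errors on 1-D data."""
    best = None
    for t in sorted(set(xs)):
        for pol in (1, -1):
            errors = sum(1 for x, y in zip(xs, ys) if (pol if x >= t else -pol) != y)
            if best is None or errors < best[0]:
                best = (errors, t, pol)
    return best[1], best[2]


def bagging(xs, ys, n_bags=25, seed=0):
    """Train one stump per bootstrap sample (indices drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_bags):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict(models, x):
    """Majority vote over the bagged stumps."""
    votes = Counter(pol if x >= t else -pol for t, pol in models)
    return votes.most_common(1)[0][0]


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [-1, -1, -1, -1, -1, 1, 1, 1, 1, 1]
models = bagging(xs, ys)
print(predict(models, 1), predict(models, 10))
```

A fixed seed makes the bootstrap samples reproducible, and an odd number of bags avoids ties in the two-class vote.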
Model 1 (NT)
CALD Location: CALD-1, CALD-2, CALD-3, CALD-4 (when running Windows NT)
Model 1 provides a wide variety of methods for data access, preprocessing, analysis, optimization, and validation. It automatically applies many different classification algorithms to a given problem, tries many different subsets of variables, and bins variables in a variety of ways. It provides wizards to guide the user through a problem, and it can read data either from ASCII files or from a database.
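Model 1's own binning options are not described here. Two common ways of binning a numeric variable are equal-width bins (intervals of the same length) and equal-frequency bins (roughly the same number of values per bin). The sketch below shows both; the function names and the sample ages are hypothetical.

```python
def equal_width_bins(values, k):
    """Assign each value a bin index 0..k-1 using k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # The maximum value would land at index k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign bin indices so that each bin holds roughly len(values) / k points."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


ages = [18, 22, 25, 27, 30, 35, 41, 52, 67, 80]
print(equal_width_bins(ages, 3))      # [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
print(equal_frequency_bins(ages, 3))  # [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Equal-width bins follow the scale of the data, so skewed values crowd into the lowest bin. Equal-frequency bins follow the ranks instead and keep the bin sizes balanced.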
SAS (NT)
CALD Location: CALD-1, CALD-2, CALD-3, CALD-4 (when running Windows NT)
SAS is a powerful statistical program that provides a large suite of statistical and visualization tools and supports SQL. SAS Enterprise Miner provides a graphical interface to some of the tools in SAS.
SAS Enterprise Miner (NT)
CALD Location: CALD-1, CALD-2, CALD-3, CALD-4 (when running Windows NT)
Enterprise Miner provides a graphical interface to the SAS statistical program. It allows the user to pre-process variables in a variety of ways, and then create and test models of the data.
SGI Mineset (NT)
CALD Location: Coming soon.
Mineset provides many powerful visualization tools and a simple interface. It uses interactive three-dimensional images, colors, shapes, and animation to allow the user to explore both complex models and data visually. It can read data from ASCII files and from databases, with direct access to Oracle, Informix, and Sybase data as well as flat files. It also provides an easy-to-use interface to many of the algorithms in MLC++, and it supports boosting.
SNNS (Unix)
CALD Location: /afs/cs.cmu.edu/project/cald-1/snn/SNNSv4.1
SNNS is a very flexible and powerful tool for creating neural networks. It includes a graphical interface, a wide variety of propagation algorithms, and allows the user to set many different parameters.
Tetrad (Unix)
CALD Location: /afs/cs.cmu.edu/project/cald-1/tetrad/tetrad
Tetrad II is a multi-module program that assists in the construction of causal explanations for sample data and their use in prediction. With continuous variables, the program will aid in the search for "path models" or "structural equation models"; with discrete data, the program will construct and update a Bayes network from sample data and user knowledge of the domain. The program includes Monte Carlo facilities. Proofs of the asymptotic correctness of all but one of the search modules are available in P. Spirtes, C. Glymour and R. Scheines, Causation, Prediction and Search, Springer Lecture Notes in Statistics, 1993.