A Data Science Central Community
R has several packages for creating decision trees; the most notable is rpart. For other add-on packages that R can use for data mining, including decision trees, see the Machine Learning task view on CRAN.
C4.5 is a software extension of the basic ID3 algorithm designed by Quinlan to address the following issues not dealt with by ID3:
It is installed for use on Grendel (grendel.icd.uregina.ca), but it may be set up on a local machine as follows:
Click on the links below for examples of C4.5 usage:
You can also use WEKA. It is open source data mining software with a wide variety of machine learning algorithms for data mining tasks. It is easy to install, and its GUI is easy to use.
Google it or here is the link:
I'm using WEKA for my data mining course work, so let me know in case you need any assistance.
Thanks, Abhinav.
I will take a look and get back to you if I need any clarification.
I was able to download and install WEKA successfully.
Now I have a CSV file containing variables and their corresponding data. I want to build a decision tree in which one variable is the performance variable and the others are independent variables.
Could you please guide me through the steps for creating this kind of decision tree in WEKA?
I was able to convert the CSV into an ARFF file.
Could you please guide me through the steps I need to follow to create a decision tree? My file has 4 variables, of which 3 are categorical and one is numeric (the dependent variable).
I just want to split the 3 categorical variables based on this fourth, numeric dependent variable.
Please advise. Thanks in advance.
Actually, you don't have to convert the CSV file at all.
When selecting the file to load, just choose the CSV file type and the program will list only CSV files.
So you have 3 independent variables and 1 dependent variable, which is numeric. A regression tree might help.
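To make the regression tree idea concrete, here is a minimal sketch in plain Python (not WEKA's implementation). It assumes a tiny made-up data set with three categorical columns (region, product, channel) and a numeric target (sales); all of these names and values are hypothetical. At each node it tries every "category vs. the rest" split and keeps the one that most reduces the sum of squared errors of the target.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, predictors, target):
    """Return (predictor, category, gain) for the best one-vs-rest split, or None."""
    parent = sse([r[target] for r in rows])
    best = None
    for p in predictors:
        for cat in sorted({r[p] for r in rows}):
            left = [r[target] for r in rows if r[p] == cat]
            right = [r[target] for r in rows if r[p] != cat]
            if not left or not right:
                continue  # this category does not actually divide the rows
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[2]:
                best = (p, cat, gain)
    return best


def build_tree(rows, predictors, target, min_rows=2, depth=0, max_depth=3):
    """Grow a small regression tree; each node stores its row count and target mean."""
    node = {"n": len(rows), "mean": mean(r[target] for r in rows)}
    if depth >= max_depth or len(rows) < 2 * min_rows:
        return node
    split = best_split(rows, predictors, target)
    if split is None or split[2] <= 1e-12:
        return node  # no split reduces the error
    p, cat, _ = split
    node["split"] = (p, cat)
    node["yes"] = build_tree([r for r in rows if r[p] == cat],
                             predictors, target, min_rows, depth + 1, max_depth)
    node["no"] = build_tree([r for r in rows if r[p] != cat],
                            predictors, target, min_rows, depth + 1, max_depth)
    return node


def print_tree(node, indent=""):
    """Print the tree as indented text."""
    if "split" not in node:
        print(f"{indent}predict {node['mean']:.2f} (n={node['n']})")
        return
    p, cat = node["split"]
    print(f"{indent}{p} == {cat}?")
    print_tree(node["yes"], indent + "  yes: ")
    print_tree(node["no"], indent + "  no:  ")


# Hypothetical example data: 3 categorical predictors, 1 numeric target.
rows = [
    {"region": "north", "product": "a", "channel": "web", "sales": 10.0},
    {"region": "north", "product": "b", "channel": "store", "sales": 12.0},
    {"region": "north", "product": "a", "channel": "store", "sales": 11.0},
    {"region": "south", "product": "a", "channel": "web", "sales": 30.0},
    {"region": "south", "product": "b", "channel": "store", "sales": 32.0},
    {"region": "south", "product": "b", "channel": "web", "sales": 31.0},
]
tree = build_tree(rows, ["region", "product", "channel"], "sales")
print_tree(tree)
```

In this toy data the sales values differ mostly by region, so the first split lands on region. A real tool adds pruning and smarter handling of categories with many levels, so treat this only as an illustration of how the numeric variable drives the splits.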
You can use KNIME. It has a graphical interface that is easy to use for many data mining tasks, including decision trees.
It also includes simplified graphical access to WEKA and an R integration.
It is open source. You can download it for free from https://www.knime.org/downloads/overview
KNIME is my favorite tool; especially for new users, the graphical interface and the I/O reading are super.
If you need to repeat your job many times and your data is stored in a database instead of just a CSV file, then I recommend writing a script for this. It can also output the resulting tree to a PDF, which is quite nice.
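As a rough idea of what the database part of such a script could look like, here is a small Python sketch. It uses an in-memory SQLite database as a stand-in for the real one, and the table name (sales), its columns and its values are all made up for illustration. It only loads the rows and computes the mean of the numeric variable per category, which is what a regression tree compares when it picks a split; fitting the tree and writing the PDF are left out.

```python
import sqlite3
from collections import defaultdict


def load_rows(conn, query):
    """Run a query and return the result as a list of dicts keyed by column name."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(query)]


def group_means(rows, predictor, target):
    """Mean of the numeric target for each category of one predictor."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[predictor]].append(row[target])
    return {cat: sum(vals) / len(vals) for cat, vals in groups.items()}


# Stand-in database: a real script would connect to the actual database instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "a", 10.0),
        ("north", "b", 12.0),
        ("south", "a", 30.0),
        ("south", "b", 32.0),
    ],
)

rows = load_rows(conn, "SELECT region, product, amount FROM sales")
means = group_means(rows, "region", "amount")
print(means)
conn.close()
```

Because the query and the column names are plain arguments, the same script can be rerun whenever the table changes.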