
Is there any Open Source Data Mining Tool for Creating Decision Trees?


Replies to This Discussion

R has packages for creating decision trees. The most notable one is rpart. For the variety of add-on packages that R can use for data mining, including decision trees, look at the CRAN Machine Learning task view.

 

http://cran.r-project.org/web/views/MachineLearning.html

 

 

Check out C4.5.

 

C4.5 is a software extension of the basic ID3 algorithm designed by Quinlan to address the following issues not dealt with by ID3:

  • Avoiding overfitting the data
    • Determining how deeply to grow a decision tree.
  • Reduced error pruning.
  • Rule post-pruning.
  • Handling continuous attributes.
    • e.g., temperature
  • Choosing an appropriate attribute selection measure.
  • Handling training data with missing attribute values.
  • Handling attributes with differing costs.
  • Improving computational efficiency.
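
As an illustration of the attribute selection measure mentioned in the list above: C4.5 is usually described as ranking attributes by gain ratio, that is, information gain divided by split information. Below is a minimal Python sketch of that measure. The function names and the toy data are assumptions made up for this example; they are not taken from the C4.5 source code.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain from splitting on `values`, divided by split information."""
    total = len(labels)
    # Group the class labels by the attribute value they occur with.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: two candidate attributes and one class label.
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
windy = ["yes", "no", "no", "no", "yes", "yes"]
play = ["no", "no", "yes", "yes", "no", "yes"]

for name, column in [("outlook", outlook), ("windy", windy)]:
    print(name, round(gain_ratio(column, play), 3))
```

The attribute with the highest gain ratio would be chosen for the split at the current node. Continuous attributes such as temperature would first need to be turned into threshold tests, which this sketch does not cover.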

C4.5 is installed for use on Grendel (grendel.icd.uregina.ca), but it can also be set up on a local machine as follows:

C4.5 Release 8 Installation Instructions for UNIX

  1. Download the C4.5 source code.
  2. Decompress the archive:
    1. Type "tar xvzf c4.5r8.tar.gz" (the z option is not universally supported), or, alternatively,
    2. Type "gunzip c4.5r8.tar.gz" to decompress the gzip archive, and then
      type "tar xvf c4.5r8.tar" to extract the tar archive.
  3. Change to the ./R8/Src directory.
  4. Type "make all" to compile the executables.
  5. Put the executables into a "bin" subdirectory and include it in the path for command-line usage.

Manual Pages

  • c4.5: using the c4.5 decision tree generator.
  • verbose c4.5: interpreting output generated by c4.5.
  • consult: uses a decision tree to classify items.
  • consultr: uses a rule set to classify items.

Examples

Examples of C4.5 usage are linked from the source page:

Source: http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/c4.5/tutorial....

You can also use WEKA. It is open source data mining software with a wide variety of machine learning algorithms for data mining tasks. It is easy to install and has an easy-to-use GUI.

You can Google it, or use this link:

http://www.cs.waikato.ac.nz/ml/weka/

 

I'm using WEKA for my data mining course work, so let me know in case you need any assistance.

 

Thanks, Abhinav.

I will take a look and get back to you if I need any clarification.

Hi Abhinav,

 

I was able to download and install WEKA successfully.

Now, I have a CSV file containing variables and their corresponding data. I want to build a decision tree in which one variable is the performance (dependent) variable and the others are independent variables.

Could you please guide me through the steps for creating this kind of decision tree in WEKA?

WEKA is the best from a learning perspective. The best thing is that it operates on text files, so you just need a sample file and you are good to go.

Hi Abhinav,

 

I was able to convert the CSV file into an ARFF file.

Could you please guide me through the steps I need to follow to create a decision tree? My file has 4 variables, of which 3 are categorical and one is numeric (the dependent variable).

 

I just want to split on the 3 categorical variables based on the fourth, numeric dependent variable.

 

Please advise. Thanks in advance.

 

Regards,

Yashu

Hi Yashu,

Actually, you don't have to convert the CSV file at all.

When selecting the file to load, just choose "csv" as the file type and the program will list only CSV files.

So you have 3 independent variables and 1 dependent variable, which is numeric. A regression tree might help.

Best,

Bhupendrasinh Thakre
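
To make the regression tree suggestion above a bit more concrete: at each node, a regression tree typically picks the split that most reduces the squared error of the numeric target. Here is a minimal Python sketch of that single step for categorical predictors. The column names, the data, and the "one level versus the rest" split rule are assumptions for illustration only; this is not how WEKA implements its tree learners.

```python
from statistics import mean


def sse(ys):
    """Sum of squared errors around the mean (0 for an empty group)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(rows, target):
    """Return (column, level, sse_reduction) for the best 'column == level' split."""
    base = sse([r[target] for r in rows])
    best = None
    for col in rows[0]:
        if col == target:
            continue
        for level in sorted({r[col] for r in rows}):
            left = [r[target] for r in rows if r[col] == level]
            right = [r[target] for r in rows if r[col] != level]
            if not right:
                continue  # column has a single level, so nothing to split
            reduction = base - sse(left) - sse(right)
            if best is None or reduction > best[2]:
                best = (col, level, reduction)
    return best


# Hypothetical data: 3 categorical variables and 1 numeric dependent variable.
rows = [
    {"region": "north", "segment": "retail", "channel": "web", "score": 10.0},
    {"region": "north", "segment": "corp", "channel": "store", "score": 12.0},
    {"region": "south", "segment": "retail", "channel": "store", "score": 30.0},
    {"region": "south", "segment": "corp", "channel": "web", "score": 32.0},
]

print(best_split(rows, "score"))
```

A full tree would apply the same search recursively to each side of the chosen split until a stopping rule is met.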

You can use KNIME. It has a graphical interface that is easy to use for many data mining tasks, including decision trees.

It also includes simplified graphical access to WEKA and an R integration.

It is open source. You can download it for free from https://www.knime.org/downloads/overview

KNIME is my favorite tool. Especially for new users, the graphical interface and the data readers (I/O) are great.

 

scikit-learn is a machine learning library for Python.

If you need to repeat your job many times and your data is stored in a database rather than just a CSV file, I recommend writing a script for this. It can also output the resulting tree as a PDF, which is quite nice.
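
Such a script could look roughly like the sketch below. It is only a minimal example under stated assumptions: the in-memory SQLite database, the table name `samples`, and its columns are all made up here to stand in for a real database, and the tree is printed as text rather than written to a PDF.

```python
import sqlite3

from sklearn.tree import DecisionTreeClassifier, export_text

# In-memory SQLite database standing in for a real one (hypothetical table and columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (temperature REAL, humidity REAL, label TEXT)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [
        (30, 85, "no"),
        (27, 90, "no"),
        (24, 95, "no"),
        (21, 70, "yes"),
        (18, 65, "yes"),
        (20, 60, "yes"),
    ],
)
rows = conn.execute("SELECT temperature, humidity, label FROM samples").fetchall()
conn.close()

# Separate the feature columns from the class label.
X = [[temperature, humidity] for temperature, humidity, _ in rows]
y = [label for _, _, label in rows]

# Fit a small tree; random_state makes repeated runs reproducible.
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X, y)

# Plain-text rendering of the fitted tree.
report = export_text(clf, feature_names=["temperature", "humidity"])
print(report)
```

For a graphical version, `sklearn.tree.plot_tree` together with matplotlib's `savefig`, or `sklearn.tree.export_graphviz` together with Graphviz, can be used to produce a PDF, provided those tools are installed.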


© 2019   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC   Powered by
