A Data Science Central Community
Hi all, I'm currently working on my thesis on classifying child labor, comparing the C5.0 decision tree algorithm with multivariate adaptive regression splines (MARS). My data are imbalanced (2,402 samples in total, 96% child labor and 4% not child labor), with 16 predictor variables.
Using a decision tree on imbalanced data is not much of a problem, because there are many techniques for balancing the data, but I'm confused about how to handle the imbalance with MARS (MARS with a logit function). I have a few questions:
1. Could I just use MARS without balancing the data? Or
2. Could I use a sampling method (oversampling, undersampling, SMOTE) to balance the data? Or
3. Could you suggest some other methods? Thank you for the advice.
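
To make option 2 concrete, here is a minimal sketch of plain random oversampling: minority-class rows are duplicated (sampled with replacement) until both classes are the same size. It is only an illustration on toy data with roughly the 96% / 4% split described above. The function name `random_oversample` and the toy rows are hypothetical, and this is not SMOTE, which synthesizes new points instead of copying existing ones.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows (sampling with replacement)
    until every class is as large as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw as many extra copies as needed to reach the target size
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels

# Toy data (hypothetical): 96 "child labor" rows (1) and 4 "not child labor" rows (0)
labels = [1] * 96 + [0] * 4
rows = [[i] for i in range(100)]

bal_rows, bal_labels = random_oversample(rows, labels)
balanced_counts = Counter(bal_labels)
print(balanced_counts)
```

If resampling is used, it would normally be applied to the training split only, so that the test set keeps the original class ratio and the reported performance is not inflated.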