A Data Science Central Community
I have a couple of questions related to my research in data mining and machine learning.
I have a software development life cycle (SDLC) data set containing features such as:
-project id [alphanumeric]
-client details [alphanumeric]
-technology / domain used for development [alphanumeric]
-planned and actual number of days spent on project development, for each phase individually [number]
-planned and actual cost of project development, for each phase individually [currency and number]
-total number of team members, for each phase individually [number]
1) If this historical data about the software project development life cycle were made available in clustered form [groups], do you think it would make new, upcoming projects easier to handle?
2) If yes, in what sense?
I have developed a new statistical algorithm for clustering [or rather, incremental clustering] that works only on numeric data sets, and I have applied it to data from other domains as well. The task now is to cluster SDLC data, so I need to know:
1) How useful is it to store SDLC data for forecasting, estimation, incremental learning, and knowledge augmentation?
2) How can alphanumeric software project data [as listed above] be converted into completely numeric form so that statistical computations for clustering can be performed? Any suggestions are welcome. Thanks.
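To make the second question concrete, here is a minimal sketch of one common option, not necessarily the best fit for this data: one-hot encode the categorical fields and min-max scale the numeric ones. The records, field names, and values below are hypothetical, and the sketch assumes currency amounts have already been parsed into plain numbers. The project id is left out of the vector on the assumption that it is only an identifier and carries no similarity information.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical SDLC records; every name and value here is illustrative only.
projects: List[Dict[str, object]] = [
    {"project_id": "P001", "client": "C-A", "technology": "Java",
     "planned_days": 120, "actual_days": 150,
     "planned_cost": 50000.0, "actual_cost": 62000.0, "team_size": 8},
    {"project_id": "P002", "client": "C-B", "technology": "Python",
     "planned_days": 90, "actual_days": 95,
     "planned_cost": 30000.0, "actual_cost": 31000.0, "team_size": 5},
    {"project_id": "P003", "client": "C-A", "technology": "Java",
     "planned_days": 200, "actual_days": 260,
     "planned_cost": 90000.0, "actual_cost": 120000.0, "team_size": 12},
]

CATEGORICAL = ("client", "technology")  # alphanumeric fields to one-hot encode
NUMERIC = ("planned_days", "actual_days",
           "planned_cost", "actual_cost", "team_size")


def build_vocab(rows: Sequence[Dict[str, object]],
                fields: Sequence[str]) -> Dict[str, List[str]]:
    """Collect the sorted distinct values of each categorical field."""
    return {f: sorted({str(r[f]) for r in rows}) for f in fields}


def build_ranges(rows: Sequence[Dict[str, object]],
                 fields: Sequence[str]) -> Dict[str, Tuple[float, float]]:
    """Record (min, max) of each numeric field for min-max scaling."""
    ranges = {}
    for f in fields:
        values = [float(r[f]) for r in rows]
        ranges[f] = (min(values), max(values))
    return ranges


def encode(row: Dict[str, object],
           vocab: Dict[str, List[str]],
           ranges: Dict[str, Tuple[float, float]]) -> List[float]:
    """Turn one record into a purely numeric vector with values in [0, 1]."""
    vec: List[float] = []
    # One indicator column per distinct category value.
    for f, categories in vocab.items():
        vec.extend(1.0 if str(row[f]) == c else 0.0 for c in categories)
    # Scale each numeric field to [0, 1]; a constant column maps to 0.0.
    for f, (lo, hi) in ranges.items():
        span = hi - lo
        vec.append((float(row[f]) - lo) / span if span else 0.0)
    return vec


vocab = build_vocab(projects, CATEGORICAL)
ranges = build_ranges(projects, NUMERIC)
matrix = [encode(r, vocab, ranges) for r in projects]

for project, vec in zip(projects, matrix):
    print(project["project_id"], [round(v, 3) for v in vec])
```

Scaling matters here because cost and day counts are on very different scales, and an unscaled distance would be dominated by cost. For an incremental algorithm, note that a new project with a previously unseen client or technology adds a column, so the vocabulary would need to be fixed in advance or the encoding revisited. Fields with many distinct values, such as client, may be better grouped into broader classes first.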