
Number of rows? Number of fields? Size (in GB or TB)? Was it web mining, fraud detection, text mining, business intelligence related (or other)? How did you process the data (Teradata, SAS Enterprise Miner, ad-hoc scripts such as Perl or Python)? How long did it take to process the data? What kind of analysis was it? Did you do it in a distributed environment?


Replies to This Discussion

Hi, Vincent... First of all, thanks a lot for starting this analytics group (I am excited to post messages and take on challenges). I really appreciate it, and I recommend that others join.

I worked with a dataset of about 22.5 million records, churning through a telecom database. As for the number of fields, it was about 400-odd.
88 million observations (over 10 GB after compression in SAS) and about 50-odd variables, processed entirely in SAS. The project ran over three months.
Hi Vincent,
Good forum for reflecting on past work... My largest dataset was 15 million rows with 100 variables. I worked in both SAS and SPSS environments for this healthcare fraud project. Repeated-measures ANOVA, discriminant analysis, and clustering were used to extract the variables of importance. It took a month of going at it, and we found the 17 variables that mattered most.
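For anyone curious what that kind of variable-screening step can look like outside of SAS/SPSS, here is a rough Python sketch using scikit-learn as a stand-in for the discriminant analysis and clustering procedures mentioned above. The file name, column names, and the cutoff of 17 variables are hypothetical placeholders for illustration, not the actual project code.

# Hypothetical sketch: rank variables with discriminant analysis, then
# cluster on the reduced set. Data, names, and thresholds are made up.
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load a claims table with ~100 candidate variables and a fraud label.
df = pd.read_csv("claims_sample.csv")  # hypothetical file
X = df.drop(columns=["fraud_flag"]).select_dtypes(include=np.number)
y = df["fraud_flag"]

# Discriminant analysis: rank variables by the magnitude of their
# standardized discriminant coefficients.
X_std = StandardScaler().fit_transform(X)
lda = LinearDiscriminantAnalysis().fit(X_std, y)
importance = pd.Series(np.abs(lda.coef_[0]), index=X.columns)
top_vars = importance.sort_values(ascending=False).head(17).index.tolist()
print(top_vars)

# Cluster on the retained variables to look for unusual claim segments.
cols = [X.columns.get_loc(v) for v in top_vars]
df["segment"] = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_std[:, cols])
print(df.groupby("segment")["fraud_flag"].mean())

The scikit-learn calls here approximate, rather than replicate, the SAS/SPSS procedures (PROC DISCRIM, repeated-measures ANOVA, etc.) that the post refers to.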
360M rows * 450 columns. Predictive analytics. Marketing for Kraft USA. SAS. Distributed environment.
171 fields, 2 petabytes - this was a real estate database maintained by LexisNexis. My team provided content and ran extensive QC using proprietary software.
