A Data Science Central Community

Started this discussion. Last reply by Jason Monte Aug 29, 2012. 1 Reply 0 Likes

Suppose we want to look at which airlines have similar pricing strategies. The data set looks like this: Variables: Flight Origination, Flight Destination, Airline1 Price, Airline2 Price,…

Started this discussion. Last reply by Ingo Mierswa Aug 15, 2012. 4 Replies 0 Likes

Hadoop and Big Data are buzzwords these days. How does it affect data mining workers? Should it be completely transparent for people only using analytical tools such as R, SPSS, SAS etc. in their…

Started this discussion. Last reply by Sean Flanigan Jul 20, 2012. 5 Replies 0 Likes

Below is a quote regarding logistic regression. It seems it is saying OLS regression requires independent variables to be normally distributed. Based on my past experience, most independent…

Started this discussion. Last reply by Varun Bhargava May 24, 2017. 7 Replies 0 Likes

If the sample is obtained through simple random sampling, is it automatically representative of the population? If not, how can we determine whether it is representative?

Varun Bhargava replied to Jason Monte's discussion How To Determine If A Sample Is Representative

"Or even comparing the means of the sample and the population could work, I guess."

May 24, 2017

Varun Bhargava replied to Jason Monte's discussion How To Determine If A Sample Is Representative

"Can't we just make a scatter plot of the population and the sample and visually confirm that the sample selected for training is close to the population?
And is it necessary to get rid of the outliers?"

May 24, 2017
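Comparing means, as suggested above, checks only one summary: a sample can match the population mean and still differ in spread or shape. One way to make the visual check quantitative is the two-sample Kolmogorov-Smirnov statistic, the largest vertical gap between the two empirical CDFs. The plain-Python sketch below is an illustration only; the function names and the simulated data are assumptions, not from the thread.

```python
from bisect import bisect_right


def ecdf(sorted_values, x):
    """Empirical CDF: fraction of observations <= x (list must be sorted)."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_statistic(sample, population):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_sample(x) - F_pop(x)|.

    Both ECDFs are step functions that only jump at observed values,
    so evaluating at every observed value is enough to find the maximum gap.
    """
    a = sorted(sample)
    b = sorted(population)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))


if __name__ == "__main__":
    import random

    random.seed(0)
    population = [random.gauss(50, 10) for _ in range(10_000)]  # simulated, hypothetical
    srs = random.sample(population, 500)        # simple random sample
    biased = sorted(population)[-500:]          # only the 500 largest values
    print(round(ks_statistic(srs, population), 3))     # small gap expected
    print(round(ks_statistic(biased, population), 3))  # large gap expected
```

A small statistic does not prove representativeness; it only says the sample's distribution of this one variable is close to the population's.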

Jason Monte replied to Jason Monte's discussion Grouping Similar Competitors

"More info:
What I mean by "price similarly" is that when Airline1 prices high on an origination-destination pair, Airline3 and Airline8 also price high."

Aug 29, 2012
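The clarification above suggests comparing airlines by whether they price high or low *relative to the other carriers on the same route*, rather than by raw fares. A minimal sketch under that reading: flag each airline as "high" when it is above the route's median price, then score each pair by the share of routes where their flags agree. The data, the median cutoff, and the function names are hypothetical, and it assumes every airline serves every route.

```python
from itertools import combinations
from statistics import median

# Hypothetical prices: route -> {airline: price}. Values are made up.
prices = {
    ("A", "B"): {"Airline1": 100, "Airline2": 120, "Airline3": 105, "Airline4": 125},
    ("A", "C"): {"Airline1": 500, "Airline2": 450, "Airline3": 510, "Airline4": 440},
    ("B", "C"): {"Airline1": 300, "Airline2": 280, "Airline3": 310, "Airline4": 270},
}


def high_flags(route_prices):
    """Mark each airline True if it prices above the route's median price."""
    m = median(route_prices.values())
    return {airline: p > m for airline, p in route_prices.items()}


def agreement(prices):
    """Share of routes on which each pair of airlines is on the same side of the median."""
    flags = [high_flags(rp) for rp in prices.values()]
    airlines = sorted(flags[0])  # assumes all routes have the same airlines
    return {
        (a, b): sum(f[a] == f[b] for f in flags) / len(flags)
        for a, b in combinations(airlines, 2)
    }


for pair, share in sorted(agreement(prices).items(), key=lambda kv: -kv[1]):
    print(pair, round(share, 2))
```

Pairs with agreement near 1.0 are candidates for "price similarly"; a continuous measure such as correlation of relative prices would keep more information than the high/low flag.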

Jason Monte's discussion was featured

### Grouping Similar Competitors

Suppose we want to look at which airlines have similar pricing strategies. The data set looks like this:

Variables: Flight Origination, Flight Destination, Airline1 Price, Airline2 Price, ..., Airline10 Price.

Data:

- Origination: A, Destination: B, Airline1 Price=100, Airline2 = 120, ..., Airline10=95
- Origination: A, Destination: C, Airline1 Price=500, Airline2 = 450, ..., Airline10=505
- ...

The expected outcome is like:

- Airline1, Airline3, Airline8 price similarly.
- Airline2, Airline4 price…

Aug 28, 2012
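One possible way to get from this data layout to groups like "Airline1, Airline3, Airline8", assuming "price similarly" means prices move up and down together across routes: divide each fare by its route's mean (so long, expensive routes do not make every airline look correlated), compute the Pearson correlation between airlines' relative-price columns, and join airlines whose correlation exceeds a threshold. Joining on a threshold and taking connected components is single-linkage clustering. The rows, the 0.8 threshold, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical data: one row per origination-destination pair, one column per airline.
airlines = ["Airline1", "Airline2", "Airline3", "Airline4"]
rows = [
    [100, 120, 105, 125],  # A -> B
    [500, 450, 510, 440],  # A -> C
    [300, 280, 310, 270],  # B -> C
    [200, 230, 195, 240],  # B -> D
]


def relative_prices(rows):
    """Divide each price by its route's mean so long routes don't dominate."""
    return [[p / (sum(r) / len(r)) for p in r] for r in rows]


def pearson(x, y):
    """Pearson correlation (undefined if either column is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def group_airlines(rows, names, threshold=0.8):
    """Single-linkage grouping: link two airlines if correlation >= threshold."""
    rel = relative_prices(rows)
    cols = [[r[j] for r in rel] for j in range(len(names))]
    parent = list(range(len(names)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(names)), 2):
        if pearson(cols[i], cols[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())


print(group_airlines(rows, airlines))  # [['Airline1', 'Airline3'], ['Airline2', 'Airline4']]
```

With real data, hierarchical clustering on 1 - correlation (with a dendrogram to pick the cut) would be a natural next step; the threshold here is arbitrary.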

Jason Monte posted a discussion

### Grouping Similar Competitors


Aug 28, 2012

Ingo Mierswa replied to Jason Monte's discussion Hadoop and Data Mining

"Hi Jason,
Hadoop and Big Data by itself does not really help anyone, especially not if it is used on a data-management level only. So now we can store even larger data sets and retrieve them faster than before. Nice, but in principle…"

Aug 15, 2012

Manish Bhoge replied to Jason Monte's discussion Hadoop and Data Mining

"Jason,
You are completely right in your statement about Hadoop that it makes data retrieval faster. But it actually does more than that. It brings the power of distributed computing, where you have a large amount of CPU power to run your…"

Aug 13, 2012
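The distributed-computing point above can be illustrated without Hadoop itself: many statistics can be computed piecewise, with each node summarizing its own partition (a "map" step) and a "reduce" step merging the summaries. A minimal plain-Python sketch for the mean and variance, using the pairwise merge formula of Chan, Golub and LeVeque; the partitions are stand-ins for data on separate nodes, and no Hadoop API is used or implied.

```python
from functools import reduce


def partial_stats(values):
    """Map step: summarize one partition as (count, mean, sum of squared deviations)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return n, mean, m2


def combine(a, b):
    """Reduce step: merge two partition summaries into one."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


# Hypothetical partitions standing in for data stored on three nodes.
partitions = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]
n, mean, m2 = reduce(combine, map(partial_stats, partitions))
print(n, mean, m2 / n)  # count, mean, population variance
```

The merged result matches what a single pass over all eight values would give, which is what lets analytics, not just storage, scale out.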

Lynne Mysliwiec replied to Jason Monte's discussion Web Analytics vs Data Mining/Predictive Modeling

"Yes - there is a great deal of demand for data mining/predictive modeling people. Not only that, but there's competition between employers for the best talent & salaries are much better for heavy-duty quant people than they are for entry…"

Jul 24, 2012

Lynne Mysliwiec replied to Jason Monte's discussion How To Determine If A Sample Is Representative

"The answer is: No. No sample is guaranteed to be representative of the entire population, although the risk of non-representative samples is reduced as sample size / total N gets larger. The larger the % of the total population, the lower the…"

Jul 24, 2012
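The point that risk shrinks as the sampled share of the population grows can be made concrete with the standard error of a sample mean under simple random sampling without replacement, which includes the finite population correction $(1 - n/N)$:

$$\mathrm{SE}(\bar{y}) = \sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$$

A small sketch of that formula; the population size and standard deviation are hypothetical values chosen for illustration.

```python
from math import sqrt


def se_of_mean(s, n, N):
    """Standard error of a simple-random-sample mean, drawn without replacement.

    s: standard deviation (estimated from the sample in practice),
    n: sample size, N: population size.
    The factor (1 - n/N) is the finite population correction.
    """
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    return sqrt((1 - n / N) * s ** 2 / n)


N, s = 10_000, 15.0  # hypothetical population size and standard deviation
for n in (100, 1_000, 5_000, 10_000):
    print(f"n/N = {n / N:.0%}: SE = {se_of_mean(s, n, N):.3f}")
```

This quantifies only random sampling error for the mean; it says nothing about bias from a flawed sampling frame, and it reaches zero only when the whole population is measured.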

Ralph Winters replied to Jason Monte's discussion Hadoop and Data Mining

"I think it will foster different ways of operating on the data to achieve equivalent results. For example, if you are used to doing a regression on a large sample base, you may be forced to perform separate analyses on the various subsets of…"

Jul 22, 2012
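For ordinary least squares, analyses on separate subsets do not have to be approximations: each partition can emit a few sums, the sums add across partitions, and the combined sums give exactly the full-data fit. A minimal sketch for one predictor, assuming the partitions stand in for data held on different nodes; the data and function names are hypothetical.

```python
def partition_sums(xs, ys):
    """Map step: sufficient statistics for simple OLS on one partition."""
    return (len(xs), sum(xs), sum(ys),
            sum(x * x for x in xs), sum(x * y for x, y in zip(xs, ys)))


def combine(a, b):
    """Reduce step: sufficient statistics simply add across partitions."""
    return tuple(u + v for u, v in zip(a, b))


def fit(stats):
    """Intercept and slope of y = a + b*x from the combined sums."""
    n, sx, sy, sxx, sxy = stats
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b


# Hypothetical data split across three "nodes"; here y = 1 + 2x exactly.
parts = [([0, 1, 2], [1, 3, 5]), ([3, 4], [7, 9]), ([5, 6, 7], [11, 13, 15])]
total = (0, 0, 0, 0, 0)
for xs, ys in parts:
    total = combine(total, partition_sums(xs, ys))
print(fit(total))  # (1.0, 2.0)
```

The same idea extends to multiple regression by summing the matrices X'X and X'y across partitions; models without such additive summaries (many tree ensembles, for example) are where separate per-subset analyses become necessary.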

Name Withheld replied to Jason Monte's discussion Hadoop and Data Mining

"Although I don't have any great experience in the big data area, it looks like an exciting time to me. There are few current solutions which allow data scientists to effectively leverage big data without extensive understanding of the…"

Jul 22, 2012

Jason Monte posted a discussion

### Hadoop and Data Mining

Hadoop and Big Data are buzzwords these days. How does it affect data mining workers? Should it be completely transparent for people who only use analytical tools such as R, SPSS, SAS, etc. in their daily work? I guess Hadoop and Big Data are more at the data-management level: they just make data retrieval faster and have nothing to do with analytics.

Jul 21, 2012

Sean Flanigan replied to Jason Monte's discussion Independent variables need to be normally distributed in multiple regression?

"Sorry, I meant the within-category means of the DV, not the IV."

Jul 20, 2012

Sean Flanigan replied to Jason Monte's discussion Web Analytics vs Data Mining/Predictive Modeling

"The fractional factorial is a higher-end breed, traditionally more of a GLM, but there are choice modeling approaches as well (http://www.nobelprize.org/nobel_prizes/economics/laureates/2000/press.html) for these designs.
So, the research design…"

Jul 20, 2012
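For readers unfamiliar with the term, a fractional factorial design runs only a chosen subset of the full factorial's treatment combinations. A minimal sketch of the smallest common case, a $2^{3-1}$ half fraction: take the full design in factors A and B and set C = A·B. The factor names and the generator are chosen for illustration.

```python
from itertools import product


def half_fraction_2_3():
    """2^(3-1) fractional factorial: full design in A and B, with C = A*B.

    Levels are coded -1/+1. Four runs instead of the eight in the full 2^3
    design; the price is that C's main effect is aliased with the A*B interaction.
    """
    return [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]


for run in half_fraction_2_3():
    print(run)
```

Every column is balanced (two runs at each level) and the columns are mutually orthogonal, so main effects can still be estimated, provided the aliased interactions are negligible.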

Sean Flanigan replied to Jason Monte's discussion How To Determine If A Sample Is Representative

"Social research, where the number of measurements is not large in certain studies, such as jury bias studies. Pharmaceutical research, where the cost of data collection can be astronomical."

Jul 19, 2012

Sean Flanigan replied to Jason Monte's discussion Independent variables need to be normally distributed in multiple regression?

"Given dummy codes are not normal, would this generalize to impact the business presentation of "on average a unit increase in x produces an increase in y" if both are not normally distributed, or at least based on some fundamental…"

Jul 19, 2012
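On the dummy-code question: OLS makes no distributional assumption about the predictors. Where normality matters (for exact small-sample t and F tests), it is the errors that are assumed normal. With a single 0/1 dummy, the OLS intercept is the mean of the DV in the 0 group and the slope is the difference between the two within-category means, so "a unit increase in x" has a direct reading. A small sketch with hypothetical data:

```python
from statistics import mean


def ols_intercept_slope(x, y):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


# Hypothetical data: x is a 0/1 dummy, clearly not normally distributed.
x = [0, 0, 0, 1, 1, 1, 1]
y = [10, 12, 11, 15, 17, 16, 16]
a, b = ols_intercept_slope(x, y)
group0 = mean(yi for xi, yi in zip(x, y) if xi == 0)
group1 = mean(yi for xi, yi in zip(x, y) if xi == 1)
print(a, group0)            # intercept equals the DV mean when x = 0 (up to rounding)
print(b, group1 - group0)   # slope equals the difference in within-category DV means
```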

- Short Bio:
- A SAS Programmer

- Field of Expertise:
- Other

- Years of Experience in Analytical Role:
- 10

- Professional Status:
- Technical

- Interests:
- Networking


© 2019 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC
