I have two population means and I am facing a problem.
1. I have a lot of missing values. What amount of missing values can we work with? Is there a limit on missing values that I should not exceed?

Replies to This Discussion

Here are some crumbs to think about, so you may answer the question yourself:
1. Do the missing values appear at random, or is there a pattern?
2. Is the missing-value pattern (if one exists) the same for both populations?
3. Less data means a harder time reaching a given level of significance. So ... what do you think the limit is?
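
To make the three points above concrete, here is a minimal sketch in plain Python. The data and the function names (missing_rate, standard_error, welch_t) are made up for illustration, and None is assumed to mark a missing value. It compares the share of missing values in each group, shows how the standard error of a mean grows as the number of observed values shrinks, and computes a Welch t statistic on the observed values only.

```python
import math
import statistics


def missing_rate(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def standard_error(sd, n):
    """Standard error of a sample mean with standard deviation sd and n observations."""
    return sd / math.sqrt(n)


def welch_t(a, b):
    """Welch's t statistic, computed from the observed (non-None) values only."""
    a = [v for v in a if v is not None]
    b = [v for v in b if v is not None]
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical samples; None marks a missing value.
group_a = [5.1, None, 4.8, 5.6, None, 5.0, 5.3, 4.9]
group_b = [4.2, 4.5, None, 4.1, 4.4, 4.0, None, None]

# Points 1 and 2: how much is missing, and is it similar in both groups?
print("missing in A:", missing_rate(group_a))
print("missing in B:", missing_rate(group_b))

# Point 3: with the same spread, fewer observations give a larger standard error.
print("SE with n=100:", standard_error(1.0, 100))
print("SE with n=25: ", standard_error(1.0, 25))

print("Welch t on observed values:", welch_t(group_a, group_b))
```

A clearly different missing rate between the two groups is only a hint that the pattern may differ; it does not by itself tell you whether the values are missing at random.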

There is no definite answer to your general question.

- steffen

PS: As a helpful suggestion, not meant as offense:
It would be better if you could say more about your problem. What is the population? How did you get your data? What is expected to be done with the data? If you have missing values and choose to work with only the available data, that is one problem. If, on the other hand, you choose to estimate the missing values before proceeding with your analysis, that is a different problem. The situation has to be understood before a solution can be offered.
There is no hard and fast rule. It depends on the distribution of the data, the variance, and the number of observations. You also did not state whether the missing data is in the dependent or the independent variables. You can try to impute the missing values, but imputation methods are generally CPU intensive.

-Ralph Winters
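
As a rough illustration of the two options raised above (work with the available data, or estimate the missing values first), the sketch below uses made-up numbers and hypothetical helper names (complete_case, mean_impute), with None assumed to mark a missing value. Simple mean imputation is used only because it is the easiest case to show; it is not a recommendation, and it is far cheaper than the more elaborate imputation methods.

```python
import statistics


def complete_case(values):
    """Keep only the observed (non-None) values."""
    return [v for v in values if v is not None]


def mean_impute(values):
    """Replace each missing value with the mean of the observed values."""
    fill = statistics.mean(complete_case(values))
    return [fill if v is None else v for v in values]


# Hypothetical sample; None marks a missing value.
sample = [2.0, None, 4.0, 6.0, None]

observed = complete_case(sample)
imputed = mean_impute(sample)

# Option 1: analyse the available data only (smaller n).
print("complete case:", len(observed), statistics.mean(observed), statistics.variance(observed))

# Option 2: impute first (full n, same mean, but a smaller variance).
print("mean imputed: ", len(imputed), statistics.mean(imputed), statistics.variance(imputed))
```

In this toy example the mean is unchanged, but the sample variance drops from 4.0 to 2.0 after imputation. Filling in values this way makes the data look less variable than it is, which can overstate significance when comparing two means.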


On Data Science Central

© 2020 TechTarget, Inc.