
When I draw a random sample from a population and find that the original distribution of a variable (as in the population) is not maintained in the sample, what are possible ways to make the distribution of that variable in the sample consistent with that of the population?


For example, say there is a variable, income, that follows a normal distribution in the population but a skewed distribution in the random sample. One of the solutions that I know of, and correct me if I am wrong, is to transform the sample data by applying suitable transformations to achieve consistency between sample and population. But isn't this a trial-and-error method, in terms of finding the most suitable transformation? Are there any other ways by which I can make the sample data mimic the population, especially when the number of (continuous) variables in the dataset is huge?


Hope my question is not redundant with any of my previous questions on this forum.





Replies to This Discussion


It depends on what the purpose of the sample is. If you are only looking to extract a representative sample, I would look at resampling from the original population. If it is not practical to resample, look at how skewed your sample really is. Many parametric tests will tolerate some skewness; otherwise I would suggest using distribution-free tests.

I do not suggest transforming the data, unless the transform exists in the original population.
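As a rough illustration of "look at how skewed your sample really is", here is a minimal Python sketch using only the standard library. It computes the moment-based skewness of a sample and a two-sample Kolmogorov-Smirnov statistic (a distribution-free comparison) against reference data. The function names, the simulated income figures, and the choice of the KS statistic are assumptions made for illustration, not something prescribed in this thread.

```python
import math
import random
import statistics


def sample_skewness(values):
    """Moment-based skewness; 0 for perfectly symmetric data."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def ks_statistic(sample, reference):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample), sorted(reference)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, then compare the CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_critical_value(n, m, c_alpha=1.358):
    """Large-sample approximate critical value; c_alpha=1.358 corresponds to alpha=0.05."""
    return c_alpha * math.sqrt((n + m) / (n * m))


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical population: normally distributed incomes (made-up parameters).
    population = [random.gauss(50_000, 10_000) for _ in range(5_000)]
    sample = random.sample(population, 500)

    print("sample skewness:", sample_skewness(sample))
    print("KS statistic:   ", ks_statistic(sample, population))
    print("approx. 5% critical value:", ks_critical_value(len(sample), len(population)))
```

If the KS statistic stays below the approximate critical value, the sample is not detectably different from the reference data at that level, and mild skewness in the sample may just be sampling noise. Note that the critical value formula assumes two independent samples, so comparing a subsample with the population it was drawn from is only a rough check.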


-Ralph Winters



If you followed the sampling norms correctly, then diagnose why there is skewness in your sample data. Most of the time the skewness arises because the sample drawn does not follow simple random sampling assumptions. Based on the diagnosis, an appropriate calibration method could also be used to overcome any identified bias and make the sample representative of the population.
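One simple example of such a calibration is post-stratification weighting, sketched below in Python. This is only one possible method and not necessarily the one meant above. It assumes the population share of each stratum is known from an external source; the stratum labels, shares, and income values are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """Weight each unit by (population share) / (sample share) of its stratum."""
    n = len(sample_strata)
    counts = Counter(sample_strata)
    missing = set(population_shares) - set(counts)
    if missing:
        # A stratum with no sampled units cannot be reweighted.
        raise ValueError(f"no sample units in strata: {sorted(missing)}")
    return [population_shares[s] / (counts[s] / n) for s in sample_strata]


def weighted_mean(values, weights):
    """Mean of values with each unit counted in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical sample that over-represents the "high" income stratum.
    strata = ["low", "low", "high", "high", "high", "high", "high", "high"]
    income = [20, 20, 80, 80, 80, 80, 80, 80]
    shares = {"low": 0.5, "high": 0.5}  # assumed known population shares

    weights = poststratification_weights(strata, shares)
    print("unweighted mean:", sum(income) / len(income))
    print("calibrated mean:", weighted_mean(income, weights))
```

The weights pull the over-represented stratum down and the under-represented one up, so weighted estimates reflect the population mix. This corrects only for imbalance on the stratifying variable, not for other sources of bias.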


On Data Science Central
