A Data Science Central Community
When I draw a random sample from a population and find that the distribution of a variable in the sample does not match its distribution in the population, what are possible ways to make the sample distribution consistent with that of the population?
For example, say there is a variable, income, which follows a normal distribution in the population but a skewed distribution in the random sample. One of the solutions that I know of, and correct me if I am wrong, is to apply a suitable transformation to the sample data to achieve consistency between sample and population. But isn't this a trial-and-error method, in terms of finding the most suitable transformation? Are there any other ways to make the sample data mimic the population, especially when the dataset contains a large number of continuous variables?
Hope my question is not redundant with any of my previous questions on this forum.
It depends on what the purpose of the sample is. If you are only looking to extract a representative sample, I would look at resampling from the original population. If it is not practical to resample, look at how skewed your sample really is. Many parametric tests will tolerate some skewness; otherwise, I would suggest using distribution-free tests.
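To make the "check the skewness, then resample" idea concrete, here is a rough Python sketch using only the standard library. Everything in it is an assumption for illustration: the income population is simulated, and the helper names `skewness` and `draw_representative_sample`, the tolerance of 0.2, and the retry limit are all hypothetical choices, not a standard recipe.

```python
import random
import statistics


def skewness(values):
    """Moment skewness: the mean of the cubed z-scores (0 for symmetric data)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def draw_representative_sample(population, n, tolerance=0.2, max_tries=50, seed=0):
    """Redraw simple random samples until the sample skewness is within
    `tolerance` of the population skewness, or `max_tries` is reached.
    Returns the closest sample found and its skewness gap."""
    rng = random.Random(seed)
    target = skewness(population)
    best, best_gap = None, float("inf")
    for _ in range(max_tries):
        sample = rng.sample(population, n)  # sampling without replacement
        gap = abs(skewness(sample) - target)
        if gap < best_gap:
            best, best_gap = sample, gap
        if gap <= tolerance:
            break
    return best, best_gap


# Hypothetical population: 10,000 roughly normal incomes (simulated data).
pop_rng = random.Random(42)
population = [pop_rng.gauss(50_000, 10_000) for _ in range(10_000)]

sample, gap = draw_representative_sample(population, n=200)
print(f"population skewness: {skewness(population):.3f}")
print(f"sample skewness:     {skewness(sample):.3f} (gap {gap:.3f})")
```

Note that redrawing until a criterion is met is no longer a pure simple random sample, so this is best treated as a diagnostic aid. With many continuous variables, the same skewness check could be run per variable before deciding whether a redraw or a distribution-free test is the better route.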
I do not suggest transforming the data, unless the transform exists in the original population.