I have some questions about oversampling. I am trying to predict the probability that a person will unsubscribe from emails for an online retailer. I plan to fit a logistic regression model for this purpose on 5% of my total user base, which is around 3 million.
In my total population, unsubscribers make up around 10%. I have oversampled my 5% random sample so that it contains 25% unsubscribers. Now, to run cross-tabs or test hypotheses, should I assign weights because my data is oversampled, and if so, how should I assign them?
Can I assign a weight of 10% / 25% = 0.4 to unsubscribers and 90% / 75% = 1.2 to subscribers?
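
To make the question concrete, here is a minimal sketch of the weighting I have in mind (population share divided by sample share for each class). The variable names and the tiny 100-row toy sample are assumptions for illustration only, not my real data. The check at the end is whether the weighted unsubscribe rate in the oversampled data comes back to the 10% seen in the population.

```python
# Illustrative sketch only: names and the toy sample below are hypothetical.
pop_rate = 0.10      # share of unsubscribers in the full population
sample_rate = 0.25   # share of unsubscribers after oversampling

# Proposed weights: population share / sample share, per class
w_unsub = pop_rate / sample_rate              # about 0.4
w_sub = (1 - pop_rate) / (1 - sample_rate)    # about 1.2

# Toy oversampled sample: 25 unsubscribers (1) and 75 subscribers (0)
labels = [1] * 25 + [0] * 75
weights = [w_unsub if y == 1 else w_sub for y in labels]

# Weighted unsubscribe rate, to compare against the population rate
weighted_rate = sum(w for w, y in zip(weights, labels) if y == 1) / sum(weights)
print(w_unsub, w_sub, weighted_rate)
```

With these weights, the weighted class shares in the toy sample match the 10% / 90% population split, which is what I would want for cross-tabs. What I am unsure about is whether this is also the right approach for hypothesis tests.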