A Data Science Central Community
With a data analysis plan, you know what you’re going to do when you actually sit down to do the analysis of the data you’ve gathered. It’s a vitally important thing for you to have, as it will guide how you’re going to collect your data. After all, it’s very difficult to add in new variables afterward.
For that reason, you want to make sure you’ve created your plan beforehand so that you can be sure that you’re asking all the questions you need to and you know what you’re going to do. Sure, as they say, a plan only lasts until the first shot is fired. And yes, that’s also true in data analysis. Nonetheless, having a good plan can save you a great deal of time, while having a bad one (or even worse, none at all) at best means you’ll be struggling to make sense of the data and at worst will make you realize your data is worthless as you forgot to collect a crucial variable.
To make sure your plan rocks, follow these hints and tips.
A common rule of thumb is that you need a minimum of about 20 participants per cell to register any kind of effect. So if you’re doing a 2 X 2 design (which is really quite common), you’ll need at least 80 participants.
Say you’re looking for a gender effect (which is the first ‘2’) and you expect it to be moderated by whether people went to college or not (the other ‘2’). Then you’ve got yourself a good old 2 X 2 design with four cells, and 4 times 20 means at least 80 participants.
And, truth be told, even then you’ll still miss a real effect most of the time.
A much better bet is to go with double that number per cell. More if you’ve got the time and the inclination. That way, you’ll be far more likely to actually find some kind of an effect. And that’s a good thing: finding something is far more fun, as it gives you something to write about (and possibly a reason to publish).
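The arithmetic above, and the reason 20 per cell tends to fall short, can be sketched in a few lines of Python. This is a rough simulation, not a proper power analysis. The true difference of 0.5 standard deviations, the cutoff of 2.0 for the test statistic, and the function names are all assumptions made for illustration.

```python
import random
import statistics


def total_participants(levels_per_factor, per_cell):
    """Total sample size for a fully crossed design, e.g. (2, 2) at 20 per cell."""
    cells = 1
    for levels in levels_per_factor:
        cells *= levels
    return cells * per_cell


def simulated_power(per_cell, effect_size=0.5, runs=2000, seed=1):
    """Share of simulated two-cell comparisons whose |t| exceeds roughly 2.

    Assumptions: normal data, equal variances, a true difference of
    `effect_size` standard deviations, and 2.0 as an approximate
    two-sided 5% cutoff.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        a = [rng.gauss(0.0, 1.0) for _ in range(per_cell)]
        b = [rng.gauss(effect_size, 1.0) for _ in range(per_cell)]
        standard_error = ((statistics.variance(a) + statistics.variance(b)) / per_cell) ** 0.5
        t = (statistics.mean(b) - statistics.mean(a)) / standard_error
        if abs(t) > 2.0:
            hits += 1
    return hits / runs


print(total_participants((2, 2), 20))  # 80
print(total_participants((2, 2), 40))  # 160
print(simulated_power(20), simulated_power(40))
```

Under these assumptions, the simulation should flag the effect noticeably more often at 40 per cell than at 20. A smaller true effect would need even more participants.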
Some people think it’s silly to draw up dummy tables, seeing as they don’t have any statistics. I disagree. The tables and the figures can be immensely helpful in that they can unearth assumptions that you may be making in your model that you weren’t aware of.
And that’s vital, as unaddressed assumptions can lead you down the garden path. Your data collection may then fail to produce any significant results, because you forgot to measure some dimension or didn’t think carefully enough about what was going on.
So draw up the figures and don’t just put nonsense into them. Instead, try to draw them up in a realistic manner and work out what you would need for the results to come out that way. Chances are, that will help a lot.
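A dummy table needs no data at all. As a minimal sketch, the snippet below prints an empty shell for a 2 X 2 design. The row and column labels and the summary statistics listed in each cell are placeholders chosen for illustration, so swap in whatever your own design calls for.

```python
def dummy_table(row_levels, col_levels, cell_stats=("n", "mean", "SD")):
    """Build an empty results table: one placeholder cell per combination."""
    placeholder = " / ".join(f"{stat}=?" for stat in cell_stats)
    col_width = max(len(placeholder), max(len(c) for c in col_levels)) + 2
    row_width = max(len(r) for r in row_levels) + 2
    header = " " * row_width + "".join(c.ljust(col_width) for c in col_levels)
    rows = [
        r.ljust(row_width) + placeholder.ljust(col_width) * len(col_levels)
        for r in row_levels
    ]
    return [header] + rows


# Hypothetical labels for a gender by college design.
for line in dummy_table(["Women", "Men"], ["College", "No college"]):
    print(line.rstrip())
```

Every question mark in the shell is something you will have to measure, which is exactly what makes the hidden assumptions visible.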
Go crazy. Write down everything that might in some way be related to the variables you want to collect. You think hair color might affect IQ? Then write it down. You think pet ownership might play a role in how creative people are? Write it down. Think laterally (something that will help you improve your writing and be more creative besides).
Now, the best thing you can do is write down the connection, the direction and the role of these variables. If you want to do it really well, make sure you quote sources where such a connection has been previously established. That will make it easier, in the end, to choose which variables you actually want to include in your design.
Of course, you won’t be able to measure them all, but you’ll thank yourself for putting everything and the kitchen sink on paper as you’ll often realize you’ll need a variable that you might otherwise have ignored.
And these can be life savers if the effect doesn’t work as well as you hoped and you’ve got these extra variables to suck some of the variability out of your dependent variable.
Yes, I know, it’s hard to tell a mediator from a moderator. In short, a mediator is the link that carries the effect, while a moderator only changes how strong the effect is. So when I go to the bar, the number of drinks I have there will decide my hangover. If I went to the bar and didn’t drink anything, there wouldn’t be a hangover. Therefore, the number of drinks is a mediator.
The number of friends who go with me to the bar, on the other hand, will only influence how much I drink. Therefore, it’s a moderator. I can still get hungover without them, but the more mates are there, the drunker I’ll likely become.
You want to make sure you map these out as best as possible before you start in on your analysis and your data collection. Personally, I’ll often draw several different models, with arrows pointing this way and that. In this way, I can see what goes through where and why I think that’s the case. Then, afterward, you can try your main model as well as some of the alternative models.
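To see how the two roles look in data, here is a small simulation of the bar example. It is only a sketch: every number in it (the slopes, the noise, the sample size) is invented, and the helper names `fit_line` and `residuals` are my own. The data are built so that friends change how strongly hours at the bar turn into drinks, while the hangover depends on hours only through drinks.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """What is left of y after removing its straight-line relation with x."""
    intercept, slope = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


rng = random.Random(7)
n = 2000
hours = [rng.uniform(0, 5) for _ in range(n)]    # time spent at the bar
friends = [rng.randint(0, 4) for _ in range(n)]  # mates who came along
# Moderation: friends change how strongly hours turn into drinks.
drinks = [h * (0.5 + 0.5 * f) + rng.gauss(0, 1) for h, f in zip(hours, friends)]
# Mediation: the hangover depends on hours only through drinks.
hangover = [d + rng.gauss(0, 1) for d in drinks]

# Moderator check: compare the hours -> drinks slope for small and large groups.
few = [i for i in range(n) if friends[i] <= 1]
many = [i for i in range(n) if friends[i] >= 3]
slope_few = fit_line([hours[i] for i in few], [drinks[i] for i in few])[1]
slope_many = fit_line([hours[i] for i in many], [drinks[i] for i in many])[1]

# Mediator check: the hours -> hangover slope before and after removing
# everything that drinks explain from both variables.
total_slope = fit_line(hours, hangover)[1]
direct_slope = fit_line(residuals(drinks, hours), residuals(drinks, hangover))[1]

print(f"hours -> drinks slope, few friends:   {slope_few:.2f}")
print(f"hours -> drinks slope, many friends:  {slope_many:.2f}")
print(f"hours -> hangover slope, total:       {total_slope:.2f}")
print(f"hours -> hangover slope, drinks held: {direct_slope:.2f}")
```

A slope that differs between the two groups is the signature of a moderator. A slope that shrinks toward zero once the middle variable is accounted for is the signature of a mediator. Real data will rarely be this tidy, which is why sketching the alternative models beforehand pays off.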
A lot of people seem to think that the best variables are yes or no, on or off, 0 or 1.
Those variables are terrible, as there is next to no granularity in them. It’s much better to have a variable that you can measure on a scale (normally from 1 to 7). Why? Because in doing so, you capture far more nuance and give yourself a much bigger chance of actually finding your effect if it is real.
Think of it like this. If I divide hair color into ‘blond’ and ‘not’, that costs me a heck of a lot of variation. I’m grouping together the dirty blonds with the angel-haired blonds and the redheads with the black-haired.
Wouldn’t it be much better if I graded hair color from really light to really dark? Then the gradation is much smoother and I’m far more likely to find an effect if it is there.
For that reason, give yourself as big of a chance as possible by having a scale for your dependent variable (and for your independent variables as well, while you’re at it).
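The cost of a yes-or-no variable is easy to simulate. The sketch below invents a continuous trait (call it hair darkness), lets a made-up outcome depend on it, and then records the same trait three ways: in full, on a 1 to 7 scale, and as ‘blond or not’. All numbers and variable names are assumptions for illustration only.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(11)
n = 5000
darkness = [rng.gauss(0, 1) for _ in range(n)]           # hypothetical true trait
outcome = [0.5 * d + rng.gauss(0, 1) for d in darkness]  # hypothetical dependent variable

# The same trait recorded with less and less detail.
seven_point = [min(7, max(1, round(4 + 1.5 * d))) for d in darkness]
not_blond = [1 if d > 0 else 0 for d in darkness]

r_full = pearson_r(darkness, outcome)
r_scale = pearson_r(seven_point, outcome)
r_binary = pearson_r(not_blond, outcome)

print(f"continuous: {r_full:.2f}")
print(f"1 to 7:     {r_scale:.2f}")
print(f"yes or no:  {r_binary:.2f}")
```

In a setup like this, the two-way split should show a clearly weaker correlation than the 7-point scale, even though the underlying effect is identical. A weaker observed correlation means you need more participants to detect the same effect.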
A good data plan can save your research. And it’s not even that hard to draw up. Just sit down and think logically about what you’re measuring and why, as well as where it belongs in your design. In this way, you can avoid the errors before they happen. And that’s important, as it can be incredibly hard to fix them afterward.
I kid you not, I’ve seen so many students collect data, only to realize afterward that they forgot their most important variable and that the data they collected was absolutely useless. They could have avoided that (and all the extra work of having to collect the data again) if they’d drawn up a better data plan and been better prepared.
So don’t be one of those people. Draw up a plan. You’ll thank me for it.