Hello fellow Miners
I am currently reading "Getting Things Done"
by David Allen and I am still impressed by his simple but powerful system. Now I am wondering whether there is a similar system (i.e. a set of best practices) for daily data analysis.
In such an analysis a huge number of files is created: code snippets, more extensive parameterizable scripts, reports, model files, remarks etc. On the one hand you want to be as flexible as possible, e.g. trying out parameters of complex models to get a feeling for the data; on the other hand you have to focus on reproducible results. Nothing is more embarrassing than one of those moments where you remember a remarkable result but cannot reproduce it, because you have changed a dependent file (e.g. the SQL script which loads the data initially).
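One safeguard I have been experimenting with is to log every run together with a content hash of each input file, so that a remarkable result can later be traced back to the exact scripts and data that produced it. This is just a minimal sketch of the idea (the function names `file_hash` and `log_run` are my own, not from any particular tool):

```python
import hashlib
import json
import time
from pathlib import Path

def file_hash(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def log_run(logfile, params, result, inputs):
    """Append one experiment record to a JSON-lines log:
    the parameters, the result, and a content hash of every
    input file (e.g. the SQL script that loads the data).
    If the result cannot be reproduced later, comparing the
    logged hashes against the current files shows which
    dependency has changed in the meantime."""
    record = {
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "params": params,
        "result": result,
        "inputs": {str(p): file_hash(p) for p in inputs},
    }
    with open(logfile, "a") as f:
        f.write(json.dumps(record) + "\n")
```

It is crude compared to a full workflow tool, but appending one line per run costs almost nothing and answers the "which file did I change?" question reliably.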
I am glad to be using RapidMiner, which supports complete process descriptions in XML consisting of closed-code functions (by closed code I mean that you cannot change the behaviour of functions on the fly, in contrast to performing the analysis with a bunch of loosely coupled Python scripts; RapidMiner is of course open source). But this is not enough, because I still produce a lot of files.
David Allen says that the key is to trick oneself into using defined and reliable procedures. What is your trick?