A Data Science Central Community
This is a very challenging problem that remains challenging even today. For the closest few hundred stars (and this discussion focuses exclusively on those close stars), distance is measured using parallax. You measure the angle to the star relative to two other reference points (the sun being one of them), and six months later, when the Earth has moved to the opposite side of its orbit, about 300 million kilometers from where you started, you measure the angle again. Based on this triangulation, you can then estimate the distance to the star in question. See illustration below, and you will immediately understand the mechanism.
The big issue is that the 300,000,000 kilometers separating the two observation points (blue circles, six months apart) is more than 130,000 times smaller than the distance to even the closest star. In other words, the angles in question are extremely small, less than 1/4,000 of a degree. You need very accurate instruments to measure the parallax: perfectly calibrated, unbiased, and located in a very stable place, to eliminate the tiny side effects that would otherwise affect the measurement.
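To get a feel for the numbers, here is a minimal Python sketch of the triangulation. It assumes the quoted angle is the parallax in the astronomers' sense, that is, half of the total shift seen between the two observation points, so the short side of the triangle is one Earth-sun distance. The function name and constants are illustrative choices, not part of the original method.

```python
import math

AU_KM = 149_597_870.7        # Earth-sun distance in km (1 astronomical unit)
LIGHT_YEAR_KM = 9.4607e12    # one light-year in km

def distance_from_parallax_km(parallax_deg):
    """Distance to a star from its parallax angle, in kilometers.

    Assumption: parallax_deg is half the total angular shift observed
    between the two points six months apart, so the triangle's short
    side is 1 AU and distance = 1 AU / tan(parallax).
    """
    return AU_KM / math.tan(math.radians(parallax_deg))

# An angle of 1/4,000 of a degree, the order of magnitude quoted above.
d_km = distance_from_parallax_km(1 / 4000)
print(f"{d_km:.2e} km, about {d_km / LIGHT_YEAR_KM:.1f} light-years")
```

Because the angle is so tiny, the tangent is almost exactly equal to the angle in radians, so halving the angle roughly doubles the estimated distance. That is why small angular errors translate into large distance errors.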
The first astronomer to make this measurement (Bessel) repeated his measurements 16 times and averaged the 16 results. This significantly reduces measurement error. But what if the resolution of your device is not fine enough (for a distant star)? For instance, you can measure a deviation of 1/1,000 of a degree, but not one of 1/4,000 of a degree.
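Why averaging helps can be checked with a quick simulation. The sketch below assumes the measurement errors are independent and Gaussian (an assumption made here purely for illustration; none of Bessel's actual data is used). Under that assumption, the spread of a 16-measurement average should be about 4 times smaller than the spread of a single measurement, since the standard error shrinks like the square root of the number of repetitions.

```python
import random
import statistics

def spread_of_average(n_repeats, n_trials=2000, sigma=1.0, seed=0):
    """Standard deviation of the average of n_repeats noisy measurements.

    Each measurement is the true value (taken as 0 here) plus independent
    Gaussian noise with standard deviation sigma. Hypothetical setup.
    """
    rng = random.Random(seed)
    averages = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n_repeats))
        for _ in range(n_trials)
    ]
    return statistics.stdev(averages)

single = spread_of_average(1)
averaged = spread_of_average(16)
print(f"spread of one measurement:   {single:.3f}")
print(f"spread of a 16-fold average: {averaged:.3f}")
print(f"improvement factor:          {single / averaged:.2f}")
```

The improvement factor should land close to 4, the square root of 16, with some scatter from the finite number of simulated trials.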
That's where the data science magic kicks in:
Instead of making 16 repetitions of a single measurement, take 10 measurements each day at exactly the same times, over a period of two years. For each measurement, repeat the process 16 times, just as Bessel did. You now have 58,400 pairs of measurements (each pair six months apart) that you can analyze to identify patterns and trends, and to compute a much more accurate parallax (with a small enough confidence interval), and thus a solution to the problem, by grouping and averaging measurements after removing outliers. The real magic is that, thanks to good design of experiments and statistical inference, you can measure a distance that your instrument technically cannot resolve on its own because its resolution is too low. Isn't this amazing?
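Here is a hedged sketch of how averaging can beat the instrument's resolution. It simulates a device that rounds every reading to the nearest 1/1,000 of a degree while the true angle is 1/4,000 of a degree. The key assumption, which is mine and not stated in the text above, is that each reading also carries unbiased random noise roughly as large as the resolution step. Without that noise, every reading would round to the same value and averaging would gain nothing.

```python
import random
import statistics

STEP = 1 / 1000         # instrument resolution in degrees (from the text)
TRUE_ANGLE = 1 / 4000   # true angle in degrees, below the resolution

def read_instrument(rng, noise_sigma):
    """One reading: true angle plus Gaussian noise, rounded to the grid."""
    noisy = TRUE_ANGLE + rng.gauss(0.0, noise_sigma)
    return round(noisy / STEP) * STEP

def averaged_estimate(n_readings, noise_sigma, seed=0):
    """Average of n_readings simulated readings (hypothetical instrument)."""
    rng = random.Random(seed)
    return statistics.fmean(
        read_instrument(rng, noise_sigma) for _ in range(n_readings)
    )

with_noise = averaged_estimate(58_400, noise_sigma=STEP)
without_noise = averaged_estimate(58_400, noise_sigma=0.0)

print(f"true angle:              {TRUE_ANGLE:.6f}")
print(f"average with noise:      {with_noise:.6f}")
print(f"average without noise:   {without_noise:.6f}")
```

With noise, the readings scatter over several grid values in proportions that depend on the true angle, so their average converges toward it. Without noise, every reading is exactly 0 and no amount of repetition recovers the signal. In other words, the trick relies on the noise being random and unbiased, which is exactly what the design of the experiment has to guarantee.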
Note: Variations in these 58,400 measurements should be white noise. If you see patterns, something is wrong with the experiment or the device, and it needs to be fixed.
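One simple way to look for such patterns is to check the autocorrelation of the deviations from the mean. The sketch below is one possible check, not the only one. The function names, the number of lags, and the deliberately loose threshold are my own illustrative choices. It relies on the fact that, for white noise, sample autocorrelations are approximately normal with standard deviation 1/sqrt(n).

```python
import math
import random

def autocorrelation(xs, lag):
    """Sample autocorrelation of the sequence xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var

def looks_like_white_noise(xs, max_lag=10, z=4.0):
    """True if no autocorrelation up to max_lag exceeds z / sqrt(n).

    The threshold z is an illustrative choice, loose enough that genuine
    white noise is very rarely flagged.
    """
    bound = z / math.sqrt(len(xs))
    return all(abs(autocorrelation(xs, k)) < bound
               for k in range(1, max_lag + 1))

rng = random.Random(1)
white = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Same noise plus a slow yearly-style drift (period of 365 samples).
drifting = [w + math.sin(2 * math.pi * i / 365) for i, w in enumerate(white)]

print("pure noise passes: ", looks_like_white_noise(white))
print("with drift passes: ", looks_like_white_noise(drifting))
```

A slow drift like the one simulated here makes neighboring measurements resemble each other, which shows up as a large autocorrelation at small lags and fails the check.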
Could a similar methodology be used to detect very rare occurrences of fraud (occurring, say, once in 100,000 transactions) by magnifying the imperceptible signal, just as in the distance-to-star problem?
Looks like you use a software rather than a hardware solution to fix a problem with an instrument (to boost resolution). An interesting and new approach, much less costly than a hardware fix.
A good analogy is anti-missile systems: because of inaccuracies, the Americans launched 10 rockets (not just one) against each Iraqi missile that Saddam Hussein fired at Israel. The idea is to make multiple inaccurate attempts in order to significantly increase the odds of a successful hit that destroys the missile before it reaches its target.