Our University colleague Professor Sir David Spiegelhalter has written a brief opinion piece in the latest issue of Science on the future of probabilistic models, particularly for big datasets (think images or genomes).
Two points jumped out at me:
(1) Statistical problems have shifted from many observations (large n) and few parameters (small p) to small n and large p, creating pitfalls when testing large numbers of hypotheses. This is because the standard “p-value”, which we’ve griped about in the past here, will declare roughly 1 in 20 truly non-existent relationships “significant” at the conventional 0.05 threshold simply by chance. So procedures are needed to reduce false discoveries. The bit I didn’t really follow was why we should focus on minimizing false discoveries at all. Wouldn’t interpreting effect sizes be more meaningful?
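To make the 1-in-20 problem concrete, here is a minimal sketch (not from Spiegelhalter’s piece) of one standard false-discovery-reduction procedure, Benjamini–Hochberg. The p-values are invented for illustration; note that a naive 0.05 cutoff would call five of them “significant”, while the procedure keeps only two.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg FDR procedure.

    Returns a list of booleans, True where the hypothesis is rejected
    while controlling the expected false-discovery rate at q.
    """
    m = len(pvals)
    # Sort p-values, remembering their original positions.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= (k/m) * q ...
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    # ... and reject every hypothesis up to that rank.
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject

# Ten hypothetical p-values: five are below 0.05, but only the two
# smallest survive the FDR correction.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.5, 0.9]
print(benjamini_hochberg(pvals))
```

Of course, as noted above, controlling false discoveries says nothing about whether the surviving effects are large enough to matter, which is where effect sizes come in.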
(2) Inferring causation from observational data will continue to be a challenge, especially as observations (n) become cheap to collect while p remains large. Statistical theory to deal with causality will be needed more than ever, and thankfully, it is improving. This is something we’re quite fond of, having thought a fair bit about causality in the context of path analysis, structural equation modelling, and directed acyclic graphs (see our J Appl Ecol paper that just came out). The problem, however, is that these approaches don’t come easily, and I struggle to see how they can be used by non-statisticians (the models in our paper took years of faffing!). Finding ways to make causal inference more accessible is going to be critical in the future.
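A toy simulation (nothing to do with our paper, just an illustration) shows why observational data mislead. In the DAG Z → X, Z → Y, the variable X has no causal effect on Y at all, yet the two are strongly correlated until you adjust for the confounder Z:

```python
import random

random.seed(1)
n = 10_000
# Z is a common cause (confounder) of both X and Y.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]   # X depends on Z only
y = [zi + random.gauss(0, 1) for zi in z]   # Y depends on Z only, not on X

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

# X and Y are correlated (around 0.5) despite no causal link between them.
print(round(corr(x, y), 2))

# Adjusting for Z (here, correlating the residuals after removing Z)
# makes the spurious association vanish.
x_res = [xi - zi for xi, zi in zip(x, z)]
y_res = [yi - zi for yi, zi in zip(y, z)]
print(round(corr(x_res, y_res), 2))
```

The hard part in real problems, of course, is that the DAG is not handed to you, and knowing *which* variables to adjust for is exactly what path analysis and structural equation modelling are trying to work out.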