Many measurements and few observations

Our University colleague Professor Sir David Spiegelhalter has written a brief opinion piece in the latest issue of Science on the future of probabilistic models, particularly for big datasets (think images or genomes).

Two points jumped out at me:

(1) Statistical problems have shifted from many observations (large n) and few parameters (small p) to small n and large p, creating pitfalls when testing large numbers of hypotheses. This is because the standard “p-value”, which we’ve griped about in the past here, will, at the conventional 0.05 threshold, declare roughly 1 in 20 non-existent relationships “significant” simply by chance, so procedures are needed to reduce false discoveries. The bit I didn’t really follow was why we should focus on minimizing false discoveries at all. Wouldn’t an interpretation of effect sizes be more meaningful?
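The 1-in-20 problem is easy to demonstrate with a quick simulation. This sketch (in Python with NumPy; the sample sizes and number of tests are arbitrary choices for illustration) draws both groups from the same distribution, so every “significant” result is a false discovery, yet roughly 5% of tests cross the threshold anyway:

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_tests = 50, 1000  # arbitrary group size and number of hypotheses

false_positives = 0
for _ in range(n_tests):
    # Both groups come from the SAME distribution: no real effect exists.
    a = rng.normal(size=n)
    b = rng.normal(size=n)
    # Welch-style t statistic; |t| > 1.96 approximates p < 0.05 at this n.
    t = (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
    if abs(t) > 1.96:
        false_positives += 1

print(false_positives / n_tests)  # close to 0.05, despite zero true effects
```

With 1000 tests and no real effects at all, you can expect around 50 “discoveries”, which is exactly why false-discovery-controlling procedures (such as Benjamini–Hochberg) exist.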

(2) Inferring causation from observational data will continue to be a challenge, especially as observations get cheaper to collect and p remains large. Statistical theory to deal with causality will be needed more than ever, and thankfully, it is improving. This is something we’re quite fond of, having thought a fair bit about causality in the context of path analysis, structural equation modelling, and directed acyclic graphs (see our J Appl Ecol paper that just came out). The problem, however, is that these approaches don’t come easily, and I struggle to see how they can be used by non-statisticians (the models in our paper took years of faffing!). Finding ways to make causal inference more accessible is going to be critical in the future.
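To see why observational associations can mislead, here is a minimal confounding sketch (the toy DAG and variable names are invented for illustration, not taken from our paper): z causes both x and y, with no causal arrow from x to y, yet x and y are strongly correlated until the confounder is conditioned on:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Toy DAG: z -> x and z -> y, with NO causal arrow between x and y.
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)

# Naively, x and y look strongly associated (theoretical correlation 0.5)...
naive = np.corrcoef(x, y)[0, 1]
print(round(naive, 2))

# ...but conditioning on the confounder z (regressing z out of both
# variables, then correlating the residuals) removes the association.
rx = x - (np.cov(z, x)[0, 1] / np.var(z)) * z
ry = y - (np.cov(z, y)[0, 1] / np.var(z)) * z
partial = np.corrcoef(rx, ry)[0, 1]
print(round(partial, 2))  # near zero
```

A DAG makes the adjustment set explicit (condition on z, and the x–y association vanishes); the hard part in real analyses is that the true graph is rarely this obvious.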

