Dealing with non-normal data: are you skewed?

I was recently trying to model some data as normally distributed, but the data were right-skewed.  No amount of transformation could eliminate this.  In the past, I’ve dealt with this by using the skew normal distribution.  But rather than matching distributions to data, we should be asking whether skewness makes sense (in our case, biologically).  There is good reason to expect it in some cases, such as where environmental filters might push a trait in a certain direction.  But what about where there isn’t a good rationale for skewness?  Why might it arise and what can we do about it?

This is where the old Student’s t-distribution comes in.  The t-distribution has a bell-shaped curve just like the normal, but it has heavier tails.  Those tails become lighter as the ‘degrees of freedom’ parameter ν increases, and the distribution converges to the normal as ν approaches infinity.
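To make the tail behaviour concrete, here is a minimal sketch that evaluates the standard t density far out in the tail for a few values of ν and compares it with the standard normal.  The evaluation point x = 4 and the particular ν values are arbitrary choices for illustration.

```python
import math

def t_pdf(x, nu):
    """Density of the standard Student's t with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))

def normal_pdf(x):
    """Density of the standard normal."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

x = 4.0  # a point well out in the tail (arbitrary choice)
tail_density = {nu: t_pdf(x, nu) for nu in (1, 3, 10, 100, 10000)}

for nu, dens in tail_density.items():
    print(f"nu = {nu:>5}: density at x = {x} is {dens:.6f}")
print(f"normal    : density at x = {x} is {normal_pdf(x):.6f}")
```

The tail density shrinks as ν grows and ends up very close to the normal value, which is the sense in which the normal is the ν → ∞ limit.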

[Figure: probability density function of Student’s t-distribution]

What this means is that if we have a small sample size (n = 8 in the case of my data from earlier this week), drawing samples from a symmetric, long-tailed distribution can often create the impression of skewness: one or two extreme values on one side are enough to make the sample look asymmetric.  Gelman (via Rubin in fact) recommended that:

“if you want to model an asymmetric distribution with outliers, you can use a symmetric long-tailed model”
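A quick simulation shows how easily this happens.  The sketch below repeatedly draws n = 8 values from a symmetric t-distribution and from a normal, then counts how often the sample skewness looks large.  The choices of ν = 3, a threshold of |skewness| > 1, and 5000 replicates are assumptions made for illustration, not values from my data.

```python
import math
import random

def draw_t(rng, nu):
    """One draw from a standard t: a normal divided by sqrt(chi-square / nu)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2, 2.0)  # chi-square with nu df
    return z / math.sqrt(chi2 / nu)

def sample_skewness(xs):
    """Moment-based sample skewness, m3 / m2^(3/2)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def frac_skewed(draw, n=8, reps=5000, threshold=1.0, seed=1):
    """Fraction of size-n samples whose |skewness| exceeds the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [draw(rng) for _ in range(n)]
        if abs(sample_skewness(xs)) > threshold:
            hits += 1
    return hits / reps

frac_t = frac_skewed(lambda rng: draw_t(rng, 3))
frac_normal = frac_skewed(lambda rng: rng.gauss(0.0, 1.0))

print(f"t (nu = 3): {frac_t:.3f} of samples have |skewness| > 1")
print(f"normal    : {frac_normal:.3f} of samples have |skewness| > 1")
```

Both parent distributions are perfectly symmetric, so any skewness in a given sample is sampling noise; the heavy-tailed parent simply produces it more often.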

He’s even gone so far as to say that we should always be defaulting to t-distributions, but that this hasn’t permeated practice because of computational issues.  He predicted these would be eliminated with our saviour Stan.  In fact, you can use t-distributions in JAGS too.

The real problem that I’ve had with the t-distribution is that it requires a value for the degrees of freedom ν.  I wanted to estimate this rather than fix it, because its ‘true’ value is unknown and 8 observations don’t tell me much about it.  This can be difficult in a Bayesian context because the result is sensitive to the prior we place on ν.  After some reading and fiddling, I found that the recommendation of Gelman and Hill on p. 372 of ARM (and described here) worked well and converged on a relatively tight posterior for ν despite the uninformative prior.  I think I’ll be using this approach a lot more when I have small sample sizes!
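To show what estimating ν looks like mechanically, here is a rough grid-approximation sketch in plain Python rather than JAGS.  It assumes a uniform prior on 1/ν over (0, 0.5), which keeps ν above 2; that prior is my assumption for illustration and should be checked against ARM itself.  To keep it short, the location and scale are treated as known (0 and 1), which a real model would not do, and the 8 data points are simulated rather than my actual observations.

```python
import math
import random

def t_logpdf(x, nu, mu=0.0, sigma=1.0):
    """Log density of a t-distribution with location mu and scale sigma."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

def nu_posterior(data, mu=0.0, sigma=1.0, n_grid=200):
    """Grid posterior for nu under an assumed uniform prior on 1/nu in (0, 0.5)."""
    # Midpoints of equal-width cells on (0, 0.5); a flat prior adds no extra term.
    inv_grid = [0.5 * (i + 0.5) / n_grid for i in range(n_grid)]
    log_post = [sum(t_logpdf(x, 1.0 / u, mu, sigma) for x in data)
                for u in inv_grid]
    top = max(log_post)  # subtract the maximum before exponentiating
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [1.0 / u for u in inv_grid], [w / total for w in weights]

# Simulated stand-in data: 8 draws from a t with nu = 3 (hypothetical).
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(1.5, 2.0) / 3)
        for _ in range(8)]

nus, weights = nu_posterior(data)
post_mean_inv = sum(w / nu for nu, w in zip(nus, weights))
print(f"posterior mean of 1/nu: {post_mean_inv:.3f}")
```

Working on the 1/ν scale means the normal limit (1/ν = 0) sits at a finite edge of the prior instead of at infinity, which is part of why this parameterisation tends to behave well.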
