Monday, March 8, 2010

Paper of the Day

Ellison, A.M. Pre-print. Repeatability and transparency in ecological research. Ecology.

With a title like that, this paper was definitely going into my reading file. Oddly enough, my source for paywalled journals sent the files to me with just numbers as names, so I opened one at random to start, and this was the first.

Today's paper has easily the most intriguing title I've read recently. I think ecology is difficult to understand because there are always so many unknown and unknowable variables at play that it's impossible to replicate results the way you could replicate, say, the Michelson-Morley experiment in physics. So when someone discovers that phosphorus enrichment is directly linked to eutrophication in lakes, it isn't actually that hard to find another system where that simply isn't the case.

Regardless, I was expecting a largely philosophical discussion of the kind of knowledge you can achieve in ecology and how we should understand it. Instead, this paper is really a discussion of repeatability in ecological synthesis (i.e., meta-analysis), focused primarily on a particular controversy stemming from a 2001 paper by Mittelbach et al. on the relationship between species richness and productivity. Since I am only marginally aware of this controversy (I've read the 2001 paper, but none of the follow-up commentary that has ensued), a fair amount of the discussion goes over my head (around my head?).

Basically, the author posits that because a meta-analysis works from a static set of data, anyone who repeats the analysis should get the same result (whether or not they think it means the same thing is another story entirely). Apparently, in this chain of articles, that hasn't happened.

Let's focus on the widely applicable portions of this discussion.

1. Raw data are scarce. Despite increasing mandates from funding agencies and a growing variety of repositories, the amount of raw data actually being archived is small. This is a big problem both for repeatability and for meta-analysis in general.

2. Data must be formatted. Even when the data exist, most researchers convert them to common units for comparison with other studies, interpolate missing values, and transform them further for statistical analysis. There are, predictably, a million ways of doing this (just ask the scientists of the Climatic Research Unit). Documentation of these steps is often scant, which makes it impossible to reproduce results (there's a sketch of what I mean just after this list).

3. The statistical tools should be archived. This is an interesting point and a major obstacle. For instance, the statistics in the Mittelbach et al. 2001 paper mentioned above were done using SYSTAT 8.0, which is no longer the current version (12.0 is, though you can still buy the old software). More troubling, some of the stats in that paper were done with software that is no longer available at all, or with algorithms that were never properly specified, and there's absolutely no reason to believe this is an isolated example. In principle this shouldn't matter, since the different programs should be running approximately equivalent tests. But who knows?
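
To make point 2 concrete, here's a minimal Python sketch (using pandas) of what explicitly documented reformatting looks like. The column names, the unit-conversion factor, and the interpolation choice are all hypothetical, my own illustration rather than anything from the papers involved:

import pandas as pd

# Hypothetical raw data: biomass in mixed units, with a gap at site B.
raw = pd.DataFrame({
    "site": ["A", "B", "C", "D"],
    "biomass": [120.0, None, 95.5, 210.0],
    "units": ["g_m2", "g_m2", "kg_ha", "g_m2"],
})

# Step 1: convert everything to g/m^2, with the factor written down.
# (1 kg/ha = 0.1 g/m^2 -- this factor is part of the methods!)
factors = {"g_m2": 1.0, "kg_ha": 0.1}
raw["biomass_g_m2"] = raw["biomass"] * raw["units"].map(factors)

# Step 2: fill the missing value, and say how. Linear interpolation
# is just one of the "million ways"; naming it is what matters.
raw["biomass_g_m2"] = raw["biomass_g_m2"].interpolate(method="linear")

print(raw)

The point isn't these particular choices; it's that every conversion and every gap-filling decision is recorded in the script itself instead of living only in someone's memory.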

Table 1 of this manuscript is particularly damning for everyone involved in this controversial series of papers. I'm not sure any of these papers avoids looking idiotic once you examine the methods.

Money quote:
It may seem wasteful to archive software, but numerical precision of arithmetic operations changes with new integrated circuit chips and different operating systems, functions work differently in different versions of software, and implementations of even "standard" statistical routines differ among software packages (a widely unappreciated example of relevance to ecologists is the different sums-of-squares reported by SAS, S-Plus, and R for analysis of variance and other linear models (Venables 1998)).
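
That sums-of-squares point is easy to demonstrate for yourself. Here's a minimal sketch in Python using statsmodels, with a made-up, deliberately unbalanced data set: the same fitted model produces different ANOVA tables depending on which type of sums of squares you request, which is exactly the SAS-versus-R trap the quote describes.

import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Made-up, deliberately unbalanced two-factor data set.
df = pd.DataFrame({
    "a": ["lo", "lo", "lo", "hi", "hi", "hi", "hi", "hi"],
    "b": ["x", "x", "y", "x", "y", "y", "y", "x"],
    "y": [3.1, 2.9, 4.2, 5.0, 6.3, 6.1, 5.8, 4.7],
})

model = ols("y ~ a * b", data=df).fit()

# With unbalanced data, Type I (sequential) and Type II sums of
# squares generally disagree: same data, same model, different table.
print(anova_lm(model, typ=1))
print(anova_lm(model, typ=2))

# And point 3 in one line: archive the version you used.
import statsmodels
print("statsmodels", statsmodels.__version__)

Report the F-test from one table, and a re-analyst who defaults to the other will not reproduce your numbers, even with identical data and identical software.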
Actually figuring out how to do all this documentation is not easy. Ellison suggests several software tools that may help: Morpho, Kepler, and Analytic Web. Unfortunately, I haven't yet had a chance to dig into what each of these does or which is most appropriate for ecologists.

Summary:

This is a well-written paper that addresses a major concern for all ecologists. Unfortunately, it doesn't go as far as the title had led me to believe, but hey, Ellison is definitely going to get more people to read it with a title like this. Since I am increasingly doing research on static databases, I will need to investigate the statistical and software tools available for documenting what exactly it is that I'm doing.
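
As a first step, the cheapest documentation I can think of is a provenance record written by the analysis script itself: which data file, which software, which machine, and when. A minimal sketch using only the Python standard library (the file names here are hypothetical):

import hashlib
import json
import platform
import sys
from datetime import datetime, timezone

def provenance(data_path):
    """Record which data file was analyzed, with what, and when."""
    with open(data_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {
        "data_file": data_path,
        "sha256": digest,  # catches silently edited data
        "python": sys.version,
        "platform": platform.platform(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

# Hypothetical file; save the record alongside the results.
record = provenance("lake_nutrients.csv")
with open("analysis_provenance.json", "w") as f:
    json.dump(record, f, indent=2)

It doesn't solve the archived-software problem the quote above raises, but it would at least let someone (including future me) verify they're starting from the same data and the same environment.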

Ellison, A.M. (2010). Repeatability and transparency in ecological research. Ecology. DOI: 10.1890/09-0032

