Note from the Executive Editor: The following post was written by *Journal of Ecology* Associate Editor Caroline Brophy in response to an announcement in a psychology journal regarding the use of p-values. The *Journal of Ecology* will continue to judge the appropriateness of the statistics in submitted manuscripts on a case-by-case basis.

When I saw that a journal was banning p-values and hypothesis testing, I felt a momentary fear that my career as a frequentist statistician might be nearing its end. Then I paused and reflected on what was really going on here. Are there problems with p-values and confidence intervals in the context of hypothesis testing? Yes, there can be; however, these problems often stem from misuse. So please hold off on throwing all the statistics books lying around your office onto the bonfire, for the moment at least.

The journal in question is *Basic and Applied Social Psychology* (BASP), which has banned the null hypothesis significance testing procedure (NHSTP; see http://www.tandfonline.com/doi/full/10.1080/01973533.2015.1012991#abstract). Since this is a psychology journal, are there any implications for ecologists?

Ecologists regularly carry out experiments, and the statistical analyses they perform help to tell the story of their data. If a person who knows little about statistics and inferential theory chooses a statistical test for their data arbitrarily, finds a significant p-value and uses it as support for what they were trying to prove (even if a different hypothesis altogether was actually tested), then their conclusions are unlikely to be valid. However, suppose the person understands their data, carries out a preliminary screening of their data through graphical or other summary methods, understands the test they choose to apply (i.e. knows it is appropriate for their data, has validated its assumptions and is aware of its limitations) and presents the results in graphical or tabular form to illustrate the story of the data. Then the p-value is a useful tool: it quantifies the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. Are there problems with this? Generally not!
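For readers who like to see that definition in action, here is a minimal sketch of a two-sample permutation test in Python. It assumes the test statistic is the absolute difference in group means and that, under the null hypothesis, the group labels are exchangeable; the function name and the biomass numbers are hypothetical and are not taken from any study.

```python
import random
from statistics import fmean

def permutation_p_value(a, b, n_perm=10_000, seed=1):
    """Monte Carlo two-sided permutation p-value for a difference in means.

    Assumes group labels are exchangeable under the null hypothesis.
    """
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # a random relabelling consistent with H0
        stat = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        # "At least as extreme" as observed; tiny tolerance guards float ties
        if stat >= observed - 1e-12:
            extreme += 1
    # Counting the observed labelling keeps the estimate strictly above zero
    return (extreme + 1) / (n_perm + 1)

control = [4.1, 3.8, 5.0, 4.4, 4.7]   # hypothetical biomass (g)
treated = [5.9, 6.3, 5.1, 6.8, 6.0]
print(permutation_p_value(control, treated))
```

Because these two invented groups do not overlap at all, only the observed split and its mirror image are that extreme among the 252 ways of dividing ten values into two groups of five, so the exact permutation p-value is 2/252 (about 0.008) and the Monte Carlo estimate should land close to it. Note that the number alone says nothing about whether a difference in means was the right question to ask.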

Going back to the question: what are the implications for ecologists? In summary, if you have a good understanding of the statistical tools you are using, then there are no implications, because you already know that a p-value is just one part of a data analysis, not the be-all and end-all. If, however, you know that you want p < 0.05 but not why or what that means, or whether your test is appropriate for your data and your hypothesis, then perhaps you should consider a self-imposed ban on using p-values! At least until you have signed up for some statistics courses to improve your basic understanding.

This is, of course, quite a simplistic view, and there are many well-documented, deeper discussions of the use of p-values and other outputs of hypothesis testing (see, for example, the p-value and model selection forum in *Ecology* at http://www.esajournals.org/toc/ecol/95/3). Your opinion on using (or not using) NHSTP may also depend on your personal statistical philosophy (e.g. Bayesian or frequentist). One thing is for sure: this recent ban has generated a lot of discussion (see, for example, http://andrewgelman.com/2015/02/26/psych-journal-bans-significance-tests-stat-blogger-inundated-with-emails/), and perhaps that was one of the journal’s goals?! Are ecological journals likely to follow suit? Personally, I see the move by BASP as a bad one, and I think ecological journals are unlikely to impose similar bans; rather, they will continue to respect their contributors’ judgement of their own statistical capabilities.

Caroline Brophy

Associate Editor, *Journal of Ecology*

Maynooth University

caroline.brophy@nuim.ie

@carolinebrophy5

The main reason for the apparently high false discovery rates in psychology doesn’t seem to be the use of inappropriate models or false interpretations of p-values. Rather, the problem is that people try out various things during the data analysis without correcting for multiple testing, which renders p-values close to meaningless. A great demonstration of these ‘Researcher Degrees of Freedom’ is

Simmons, J. P., Nelson, L. D. & Simonsohn, U. (2011) False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. *Psychological Science*, 22, 1359–1366.
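To see why uncorrected flexibility matters, here is a deliberately simplified Python sketch. It is not a reproduction of the Simmons et al. simulations: it assumes that under the null hypothesis every p-value is uniform on (0, 1) and that the analyst’s k attempts are independent, then counts how often at least one attempt reaches p < 0.05. The `correct=True` option applies a Bonferroni correction (each attempt tested at alpha/k); the function name is hypothetical.

```python
import random

def false_positive_rate(k, alpha=0.05, n_sim=20_000, correct=False, seed=2):
    """Share of null 'studies' reported as significant when the analyst tries
    k analyses and keeps the smallest p-value.

    Simplifying assumptions: under H0 each p-value is Uniform(0, 1), and the
    k analyses are independent of one another.
    """
    rng = random.Random(seed)
    threshold = alpha / k if correct else alpha  # Bonferroni when correct=True
    hits = 0
    for _ in range(n_sim):
        smallest_p = min(rng.random() for _ in range(k))
        if smallest_p < threshold:
            hits += 1
    return hits / n_sim

for k in (1, 5, 10):
    exact = 1 - (1 - 0.05) ** k  # analytic rate under the same assumptions
    print(f"k={k:2d}  simulated={false_positive_rate(k):.3f}  "
          f"exact={exact:.3f}  bonferroni={false_positive_rate(k, correct=True):.3f}")
```

Under these assumptions the uncorrected rate should track 1 - (1 - 0.05)^k, roughly 0.23 for k = 5 and 0.40 for k = 10, while the Bonferroni-corrected rate stays at or just below 0.05. Real analysis choices are usually correlated, so the inflation in practice is typically smaller than in this independent case, but as Simmons et al. show, it can still be substantial.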

I was missing this aspect in the above discussion. Specifically, the advice to carry out a

“… preliminary screening of their data through graphical or other summary methods …”

is a good one if the hypothesis was fixed beforehand. However, if such a screening is used to

* Decide which comparisons to make or which groups to test

* Choose which predictors to include in the model

etc., then p-values will not deliver what they promise, i.e. controlled type I error rates. Hence, for any post-hoc or exploratory analyses where people play around until they find what they want to show, I can see good arguments for not reporting them.
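The point about data-driven comparisons can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not anyone’s published code: five groups of ten observations are all drawn from the same N(0, 1) distribution, so every null hypothesis is true, and for simplicity a two-sided z-test with known SD stands in for a t-test. It compares testing the two groups that look most different after inspecting the data with testing a pair fixed in advance; all names are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def z_test_p(x, y, sigma=1.0):
    """Two-sided z-test p-value for equal means, assuming a known common SD."""
    se = sigma * (1 / len(x) + 1 / len(y)) ** 0.5
    z = (fmean(x) - fmean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_i_error(post_hoc, n_groups=5, n=10, n_sim=5_000, alpha=0.05, seed=3):
    """Rejection rate when all groups come from the same N(0, 1) distribution,
    so every rejection is a false positive."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        groups = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_groups)]
        if post_hoc:
            # Inspect the data first, then test the two most different groups
            ordered = sorted(groups, key=fmean)
            x, y = ordered[-1], ordered[0]
        else:
            # Comparison fixed before seeing the data
            x, y = groups[0], groups[1]
        if z_test_p(x, y) < alpha:
            rejections += 1
    return rejections / n_sim

print("pre-specified:", type_i_error(post_hoc=False))
print("chosen after looking:", type_i_error(post_hoc=True))
```

The pre-specified comparison should reject close to 5% of the time, as promised, while the comparison chosen after looking should reject several times more often, even though no real differences exist.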

Thank you for your response. I didn’t specifically mention your point, but I believe it was covered indirectly by my reference to people needing to understand their statistical tools.

As for my comment “… preliminary screening of their data through graphical or other summary methods…”, I certainly did not intend to advocate using preliminary screening to guide the choice of hypothesis to be tested! I firmly believe that preliminary screening is an important step in the data analysis process: it helps to check the validity of the data, identify outliers and understand the distributions of the variables collected, all of which can inform decisions about which tests are most appropriate for testing the a priori hypotheses.
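As a small illustration of that kind of screening, the Python sketch below summarises one variable and flags values outside Tukey’s fences (1.5 × IQR beyond the quartiles), a common rule of thumb. It assumes a numeric variable with at least two values; the function name and the plant-height data are hypothetical, and flagged values are candidates to check against field notes, not values to delete automatically.

```python
from statistics import fmean, median, quantiles, stdev

def screen(values, k=1.5):
    """Summarise one numeric variable and flag values outside Tukey's fences
    (Q1 - k*IQR, Q3 + k*IQR). Flagged values are candidates to check, not
    values to delete automatically. Needs at least two values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {
        "n": len(values),
        "mean": fmean(values),
        "median": median(values),
        "sd": stdev(values),
        "q1": q1,
        "q3": q3,
        "flagged": [v for v in values if v < low or v > high],
    }

# Hypothetical plant heights (cm); the last value looks like a data-entry slip
heights = [12.1, 11.8, 12.6, 13.0, 12.4, 11.9, 121.0]
print(screen(heights))
```

Here 121.0 is flagged, and the gap between the mean and the median already hints at a problem; a histogram or boxplot of the same variable would tell the same story visually.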

Well said, Caroline! Sure, we can use p-values wrong. We can use any tool wrong; that doesn’t mean we should empty the toolbox. I wrote about this, by the way, at http://bit.ly/1M6ky1M (posted before the BASP brouhaha).

Just FTR, I cross-referenced your post over at Bob O’Hara’s blog; he commented eloquently on “The Decision” from a statistician’s POV.

Well, it is if you want to publish in BASP. They basically said that frequentist stats are a load of rubbish. And they’re not too fond of Bayesian methods, so converting to the One True Way won’t help either.

Few people know it, but it’s also possible to calculate Bayesian p-values, so the problem isn’t the statistical philosophy; it’s the way p-values have been developed and used.
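The comment does not say which kind of Bayesian p-value is meant; one widely used version is the posterior predictive p-value, and the Python sketch below assumes that is the idea. It is a toy example under stated assumptions: observations are modelled as normal with known SD, the mean has a flat prior (so its posterior is normal around the sample mean), and the p-value is the share of replicated data sets whose chosen statistic is at least as large as the observed one. Names and data are hypothetical.

```python
import random
from statistics import fmean

def posterior_predictive_p(y, sigma=1.0, n_draws=5_000, stat=max, seed=4):
    """Posterior predictive p-value for a normal model with known sigma.

    Assumes y_i ~ N(mu, sigma^2) with a flat prior on mu, so that
    mu | y ~ N(mean(y), sigma^2 / n). `stat` is the discrepancy statistic.
    """
    rng = random.Random(seed)
    n = len(y)
    observed = stat(y)
    post_mean, post_sd = fmean(y), sigma / n ** 0.5
    exceed = 0
    for _ in range(n_draws):
        mu = rng.gauss(post_mean, post_sd)                # draw from posterior
        y_rep = [rng.gauss(mu, sigma) for _ in range(n)]  # replicated data set
        if stat(y_rep) >= observed:
            exceed += 1
    return exceed / n_draws

y = [0.2, -0.5, 0.1, 0.4, -0.3, 0.0, 0.3, 4.0]  # hypothetical data
print(posterior_predictive_p(y))               # statistic: max(y)
print(posterior_predictive_p(y, stat=fmean))   # statistic: mean(y)
```

With the maximum as the statistic, the single large value produces a small posterior predictive p-value, signalling that the normal model does not reproduce it well. With the mean as the statistic, the value should sit near 0.5, because a model fitted to the data reproduces its own mean by construction; the choice of statistic matters here just as much as in the frequentist case.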


I started my research career in 1973-74.

I didn’t have a simple calculator until 1976-77.

I started using a PC in 1995. My observations, analysis (brainstorming) and inferences were simple. Everybody could understand them and even translate them into action in the field. Now I feel that ‘honest biology’ is being overshadowed, and analyses with hidden inferences are less meaningful or intelligible to an old-timer like me. Anyway, if modern science wants to go that way, my days are numbered regardless.
