Tuesday, September 26, 2017

Abandon Statistical Significance — Blakeley B. McShane, David Gal, Andrew Gelman, Christian Robert, and Jennifer L. Tackett


In science publishing and many areas of research, the status quo is a lexicographic decision rule in which any result is first required to have a p-value that surpasses the 0.05 threshold and only then is consideration—often scant—given to such factors as prior and related evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain. There have been recent proposals to change the p-value threshold, but instead we recommend abandoning the null hypothesis significance testing paradigm entirely, leaving p-values as just one of many pieces of information with no privileged role in scientific publication and decision making. We argue that this radical approach is both practical and sensible.
Uncritically adopting universal rules and criteria is a sign of lazy thinking, and likely of ideological thinking — that is, dogmatism — as well.

This move would overturn the existing scientific publishing model, so it is unlikely to happen without considerable opposition. That model is key to establishing reputational credibility and advancement in the profession. Players like set rules. This is especially true in formal subjects, where training focuses on producing "the right answer" through customary application of formal methods. The downside is groupthink and the imposition of a consensus reality.

Abandon Statistical Significance
Blakeley B. McShane, David Gal, Andrew Gelman, Christian Robert, and Jennifer L. Tackett


Dan Lynch said...

Well, the author seems focused on academic journals, but much technical research is done on the job and never sees an academic journal.

A simple way to avoid the yes-or-no thinking the author refers to is to report test results as a probability rather than yes-or-no. I.e., there is an X% probability that the difference is real. That was how I was encouraged to report my results when conducting process qualifications that had to meet FDA, milspec, ISO, etc. standards.
[Yet the same statistician who encouraged me to report my experiment results as probabilities also insisted that I use a black-and-white mathematical rule to set process control limits, as I have ranted about previously. :-) ]
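To make the contrast concrete, here is a minimal sketch (mine, not the commenter's — the data and function names are hypothetical) of one way to report a result as a continuous probability rather than a pass/fail verdict: a simple bootstrap estimate of the chance that one process mean really exceeds another.

```python
import random

random.seed(42)

# Hypothetical measurements from two process runs (illustrative data only).
baseline = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7]
modified = [10.4, 10.6, 10.2, 10.5, 10.3, 10.7, 10.4, 10.5]

def mean(xs):
    return sum(xs) / len(xs)

def prob_modified_higher(a, b, n_boot=10_000):
    """Bootstrap estimate of P(mean of b > mean of a).

    Resample each group with replacement and count how often the
    resampled mean of `b` exceeds the resampled mean of `a`.
    """
    wins = 0
    for _ in range(n_boot):
        ra = [random.choice(a) for _ in a]
        rb = [random.choice(b) for _ in b]
        if mean(rb) > mean(ra):
            wins += 1
    return wins / n_boot

p = prob_modified_higher(baseline, modified)
print(f"Estimated probability the modified process runs higher: {p:.1%}")
```

The point is not the particular method — a Bayesian posterior would serve the same purpose — but that the output is a graded number the reader can weigh, not a binary verdict at an arbitrary threshold.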
In many endeavors, the things that trip you up are the things that you fail to take into consideration. You try your best to design an apples-to-apples test but Mr. Murphy throws in some bananas and oranges. In the short run all the experimenter can do is point out those bananas and oranges and keep an open mind. In the longer run more experiments can be arranged to take into account those bananas and oranges.

Another big problem in technical research is cherry-picking data. Say several tests are conducted, the favorable data is included in the final report while the unfavorable data goes in the trash. Statistics cannot fix dishonesty.
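A small simulation (my illustration, not the commenter's) shows why cherry-picking is so corrosive even before outright dishonesty: if you run many tests and report only the best-looking one, "significant" results appear routinely under a pure null. Under the null hypothesis, p-values are uniform on [0, 1], so the chance that the minimum of 20 independent p-values clears the 0.05 bar is about 1 − 0.95²⁰ ≈ 64%.

```python
import random

random.seed(0)

alpha, k, trials = 0.05, 20, 100_000
hits = 0
for _ in range(trials):
    # Under the null hypothesis, each p-value is uniform on [0, 1].
    pvals = [random.random() for _ in range(k)]
    # Cherry-picking: report only the single most favorable test.
    if min(pvals) < alpha:
        hits += 1

rate = hits / trials
print(f"Chance the best of {k} null tests looks 'significant': {rate:.1%}")
```

With 20 tries, a roughly two-in-three chance of a spurious "finding" — which is why selective reporting has to be addressed by honesty and preregistration, not by statistics after the fact.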

"A critical first step forward is to begin accepting uncertainty and embracing variation in effects"

Ummm ... statistics is a way to get a handle on uncertainty. Statistics is not the enemy; the enemy is bullshit, and statistics is a defense against bullshit. When someone makes a bullshit claim, you respond "Prove it! Show me the data!"

Tom Hickey said...

Dan, I think that a crucial difference may be in accountability. When using formal methods to actually get stuff done, outcomes are critical. In academia, getting published is the desired outcome in a "publish or perish" environment, and there is no accountability if one follows the rules — which are arbitrary, as the authors observe and as has been pointed out elsewhere in the debate.