Wednesday, September 27, 2017

Lars P. Syll — Time to abandon statistical significance

As shown over and over again when significance tests are applied, people have a tendency to read ‘not disconfirmed’ as ‘probably confirmed.’ Standard scientific methodology tells us that when there is only, say, a 10% probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more ‘reasonable’ to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give about the same 10% result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.
We should never forget that the underlying parameters we use when performing significance tests are model constructions. Our p-values mean nothing if the model is wrong. And most importantly — statistical significance tests DO NOT validate models!
Lars P. Syll’s Blog
Lars P. Syll | Professor, Malmö University
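Syll's point about many independent tests each landing near p = 0.10 can be made concrete with a back-of-the-envelope simulation (a sketch with made-up numbers, not anything from the post): if p-values are uniform under a true null, the chance that five independent tests all come in at p ≤ 0.10 is 0.1^5, i.e. one in a hundred thousand.

```python
import random

# Under a true null hypothesis a p-value is (roughly) uniformly
# distributed on [0, 1], so any single test lands at p <= 0.10 about
# 10% of the time. If five *independent* tests all come in at
# p <= 0.10, that joint outcome has probability 0.1**5 under the null,
# which is why a run of merely "borderline" results can still count
# as strong disconfirmation.
k = 5
analytic = 0.10 ** k  # 1 in 100,000

# Monte Carlo check: draw k uniform "p-values" per trial and count
# how often all of them fall at or below 0.10.
random.seed(42)
trials = 1_000_000
hits = sum(
    1 for _ in range(trials)
    if all(random.random() <= 0.10 for _ in range(k))
)
estimate = hits / trials

print(f"analytic: {analytic:.6f}  simulated: {estimate:.6f}")
```

None of this rescues the test if the model behind it is wrong, which is Syll's second point: the uniformity of the p-value under the null is itself a model assumption.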


Dan Lynch said...

Lars does not put forth a viable alternative. Should we ignore data and instead rely on bullshit?

All studies are flawed in some way, but that does not mean they have no value.

For example, there are flaws and uncertainties in many climate change studies, but that does not mean that we should dismiss those studies. Instead, we should ask questions and do more studies to answer those questions. In the meantime, we should act on the best available data, flawed though it may be.

I am constantly conducting experiments, collecting data, and trying to make sense of the data, to arrive at an understanding of how certain systems actually work. I am well aware of the limitations of data analysis, but the problem is, there is nothing better out there.

Tom Hickey said...

Lars has consistently been arguing for more realism and less reliance on formalism. This is especially evident in economics.

In addition to all the points that Dan makes, it is also necessary to critique models for realism, e.g., the realism of their assumptions.

Science has two functions. One is epistemological and the other is pragmatic. The epistemological function of science is to provide causal explanations that can be subjected to testing. The pragmatic function is making predictions that are useful in their own right, beyond testing hypotheses as a way of assessing the explanatory power of the causal explanation embedded in a model.

Some (Milton Friedman, for example) hold that the pragmatic function is primary or even exclusive, and that if the model is useful in making predictions that are correct, the explanation is more or less irrelevant, so the realism of the assumptions is not an important factor, let alone a crucial one.

Opponents argue that a causal explanation grounds a scientific model and provides rational justification for accepting the results as dependable. Otherwise, correlation is not causation and one may be in the position of the turkey that expects food as usual on Thanksgiving morning.

Science is based on causal models subject to testing against evidence. Otherwise, it's philosophy as rational inquiry independent of testing against evidence, or storytelling — or handwaving.

Dan Lynch said...

What Lars and McShane seem to be hinting at, but don't actually say, is that rigid dogmatic thinking about data analysis is a bad thing. If they would phrase it that way, then I would agree with them.

I don't have any experience with academic publishing, but I have plenty of experience with rigid dogmatic thinking.

For example, a CEO I once worked for insisted on a certain ROI (return on investment) and/or a certain payback period for all capital expenditures. While I am not opposed to economic analysis of business decisions, this CEO was very rigid and dogmatic in his thinking. The plant's chemical storage was not code compliant, and I was tasked with upgrading it to comply with code. Well, the CEO insisted that the chemical storage upgrade pay for itself in a year just like any other capital project. That made no sense: we were not doing the upgrade for economic reasons, we were doing it because the law required it.

Rigid, dogmatic thinking seems to be linked to authoritarian personalities and those types of personalities gravitate toward management positions, so you often end up working under them and having to deal with their baloney.

NeilW said...

"Should we ignore data and instead rely on bullshit?"

How do you know the data isn't bullshit?

Economic data is collected in the context of the policy environment in which it was created. It is by definition tainted by association. So using it to 'prove' something is really just a back-door way of reinforcing the policy environment in which it was created.

It's entirely different from hard science, where a boron atom will behave exactly the same today as it did yesterday regardless of what equipment you wrap it in.

Similarly with climate change.

The most useful analogy to economics is in dietary science, where the entire profession was hoodwinked into naming fat as the number one enemy for something like fifty years until the "that doesn't stack up" evidence became overwhelming and the house of cards collapsed.

Now sugar is the #1 enemy and the cycle repeats.

Dan Lynch said...

"How do you know the data isn't bullshit?"

The technical answer is by having lots of people collect lots of data and using statistics to analyze the data for significance.

But ... statistics cannot compensate for dishonest data. It's not that data is actually invented out of thin air, but it is often cherry picked. Favorable data is included while unfavorable data is ignored. I witnessed this first hand in the semiconductor industry, and we read stories about it happening in the nuclear industry and the pharmaceutical industry.

No young person goes off to college planning to grow up to be an unethical scientist, but once they are on the job they are under pressure to produce certain outcomes.

All real life data that I deal with is flawed, and that includes basic science. There is error and uncertainty in measurements, and other variables are present that aren't accounted for. You compensate by collecting lots of data and repeating the experiment. Bonus points if other people repeat the experiment. The more eyes looking at the issue, the better. The gold standard is for your peers all over the world to repeat the experiment, preferably using different methods, and arrive at the same conclusion. Only then is it accepted as true. No debate has ever been settled by a statistical test on a single study.
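The "compensate by collecting lots of data" point has a simple quantitative face: the spread of an averaged estimate shrinks roughly like 1/sqrt(n) as the sample size grows. A minimal sketch, with invented numbers (true value 5.0, noise spread 1.0) purely for illustration:

```python
import random
import statistics

# Toy model of a noisy measurement: repeated readings of the same
# underlying quantity (true value 5.0, noise sd 1.0 -- both made-up
# numbers for illustration only).
random.seed(0)

def run_experiment(n):
    """Average n noisy readings of one underlying quantity."""
    return statistics.mean(random.gauss(5.0, 1.0) for _ in range(n))

# Replicating each experiment 500 times shows how the spread of the
# estimated mean shrinks as the sample size grows (roughly 1/sqrt(n)).
spreads = {}
for n in (10, 100, 1000):
    estimates = [run_experiment(n) for _ in range(500)]
    spreads[n] = statistics.stdev(estimates)
    print(f"n={n:4d}  spread of estimates ~ {spreads[n]:.3f}")
```

More data tightens the estimate, but, as noted above, it cannot fix systematic problems like cherry-picked or dishonest data; averaging only beats down random noise.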

In your dietary examples, my guess is that the problem was not enough data and not enough data analysis.

My takeaway from Gelman and Syll is that they are barking up the wrong tree. Apparently there is a real problem with rigid, dogmatic thinking in academic publishing (though I wonder if the root cause is monopolistic control of academic media?), but instead of attacking monopolistic media and rigid, dogmatic thinking, they attack statistics. Syll even calls for statistical testing to be banned -- that would be replacing one form of rigid, dogmatic thinking with another.

Statistical testing is not the enemy. It's a tool, and no one tool is appropriate for every occasion. Judgment is required, and so is peer review.