Thursday, August 17, 2017

Noah Smith — "Theory vs. Data" in statistics too


Important.

I think Noah has this right. Fit the tool to the job, rather than the job to the tool.

Aristotle defined speculative knowledge in terms of causal explanation. This definition stuck, although Aristotle's analysis of causality did not.

In the Posterior Analytics, Aristotle places the following crucial condition on proper knowledge: we think we have knowledge of a thing only when we have grasped its cause (APost. 71 b 9–11. Cf. APost. 94 a 20). That proper knowledge is knowledge of the cause is repeated in the Physics: we think we do not have knowledge of a thing until we have grasped its why, that is to say, its cause (Phys. 194 b 17–20). Since Aristotle obviously conceives of a causal investigation as the search for an answer to the question “why?”, and a why-question is a request for an explanation, it can be useful to think of a cause as a certain type of explanation. (My hesitation is ultimately due to the fact that not all why-questions are requests for an explanation that identifies a cause, let alone a cause in the particular sense envisioned by Aristotle.) — Stanford Encyclopedia of Philosophy

There is a distinction between reasons and causes. Some types of explanation seek only reasons, while others seek causes. Causation subsequently came to be viewed in terms of articulating mechanisms or lines of transmission (models) that are substantiated in evidence.

Explanation by reasons is different, since it does not have to meet the strict criterion of articulating mechanisms or lines of transmission that can be checked against evidence.

Explanation by reasons, rather than strictly by establishing causation, is based on the principle of sufficient reason, which is usually credited to Spinoza and Leibniz.

In philosophical logic, two negative criteria are foundational. Valid reasoning is vitiated by 1) arguing in a circle and 2) infinite regress.

Without recourse to checking against evidence, there is no stopping point in assigning causes other than stipulation, e.g., of a first cause.

However, there may be a reason for a stopping point that involves neither causality based on evidence from observation nor mere stipulation: for example, principles that are "self-evident" based on intuition, such as Aristotle's conception of intellectual intuition or Kant's synthetic a priori propositions as articulated in the Critique of Pure Reason.

On the other hand, Hume argued that causality is merely an over-interpretation of constant correlation, there being no knowledge of the world other than that based on sense data. There is no observable causal link.

Cutting to the chase, scientific explanation based on causality is grounded in models that articulate causal mechanisms or lines of transmission that show how things change invariantly, which is the basis for deterministic functions. Where this is not possible, there are two other avenues. The first is explanation by giving reasons, which is the domain of speculative philosophy. The second is employing statistics to explore patterns of correlation. The question then is to what degree causal models can be gained from statistical methods, or whether that is possible at all.

This is the issue that Noah Smith's post is getting at.
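
To make the gap between correlation and causation concrete, here is a toy simulation. It is only an illustrative sketch and not something taken from Noah's post: the variables x, y, and z, the noise levels, and the function names are all assumptions made up for the example. A hidden common cause z drives both x and y, so the two correlate strongly in observational data even though neither causes the other. When x is set independently of z (a stand-in for an experiment), the correlation disappears.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0, intervene=False):
    """Return corr(x, y) in a toy world where x never causes y.

    z is a hidden common cause (hypothetical). If intervene is True,
    x is assigned at random, which cuts its link to z.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # hidden common cause
        if intervene:
            x = rng.gauss(0, 1)  # x set by the experimenter, independent of z
        else:
            x = z + rng.gauss(0, 0.5)  # x driven by z plus noise
        y = z + rng.gauss(0, 0.5)  # y driven by z only, never by x
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)


observed = simulate()
intervened = simulate(intervene=True)
print(f"observational correlation:  {observed:.2f}")
print(f"correlation under intervention: {intervened:.2f}")
```

With these made-up parameters the observational correlation should come out at roughly 0.8 and the interventional one near zero. The statistics alone cannot tell the two worlds apart; knowing the mechanism (here, that z drives both) is what does the work.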

Noahpinion
"Theory vs. Data" in statistics too
Noah Smith | Bloomberg View columnist

4 comments:

MRW said...

Tom, causation in science is physics, not philosophy. You can’t state causation without the physics.

I once asked a famous solar astrophysicist why he was not willing to attribute the increased warming on the earth to the sun, and why he would still only call it correlation after over 400 years of actual measurements that substantiated it.

His response was, “Because we don’t understand the physics.”

Tom Hickey said...

"Tom, causation in science is physics, not philosophy. You can’t state causation without the physics."

That's what I said. Natural science (physics, chemistry) is the only branch of science that is able to construct theory based on causal mechanisms (deterministic functions) that can be checked against evidence.

Even in natural science, complete causal explanation (a theory of everything) is still unachievable, and some issues must be treated stochastically and others based on reasoning alone.

In other instances causality is merely imputed, and history shows that getting the direction of causality wrong is not rare in disciplines like economics.

The no-model approach to statistics is about admitting that it is hard to explain invariance discovered through statistical methods (correlation doesn't imply causation), and that trying to wring causality out of it is fraught with obstacles and the danger of self-delusion. The reason for the observed invariance is unknown; it serves as material for theoretical speculation and, hopefully, the eventual construction of a causal model that can be tested against evidence.

NeilW said...

Noah's problem is that he believes the data is pure.

It isn't. It is tainted by the policy environment that created it.

Natural science can get pure data against which it can measure its hypotheses. The social sciences have to get through a fog of ideology and belief.

Tom Hickey said...

"Noah's problem is that he believes the data is pure."

This is also an acknowledged issue in natural science. In the first place, observation is theory-laden. Secondly, at the quantum level, observation (subjectivity) has an effect on the observed (objectivity).

Science is based on the assumption that the objective as the given can be separated from the subjective as relative inter-subjectively and changeable intra-subjectively. But observers are unable to stand outside themselves.

Data is given but information is the result of gathering data, which involves selection, and processing data, which involves imposing methodological choices.

Moreover, data in natural science is given "naturally" in a different way and form than it is in social science, including economics, where the data is not given completely naturally but is partially constructed through behavior that involves interpretation, values, preferences and choices.