Thursday, January 31, 2013

David Hales — Lies, Damned Lies and Big Data

Almost everything we do these days leaves some kind of data trace in some computer system somewhere. When such data is aggregated into huge databases it is called “Big Data”. It is claimed social science will be transformed by the application of computer processing and Big Data. The argument is that social science has, historically, been “theory rich” and “data poor” and now we will be able to apply the methods of “real science” to “social science” producing new validated and predictive theories which we can use to improve the world.

What’s wrong with this?...

Looking for “patterns or regularities” presupposes a definition of what a pattern is and that presupposes a hypothesis or model, i.e. a theory. Hence big data does not “get us away from theory” but rather requires theory before any project can commence.
What is the problem here? The problem is that a certain kind of approach is being propagated within the “big data” movement that claims not to be committed a priori to any theory or view of the world. The idea is that data is real and theory is not, and that theory should be induced from the data in a “scientific” way[1].
I think this is wrong and dangerous. Why? Because it appears clear and honest without being either. Any statistical test or machine learning algorithm expresses a view of what a pattern or regularity is, and any data has been collected for a reason, based on what was considered appropriate to measure. One algorithm will find one kind of pattern and another will find something else. One data set will evidence some patterns and not others. Selecting an appropriate test depends on what you are looking for. So the question posed by the thought experiment remains “what are you looking for, what is your question, what is your hypothesis?”
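To make the point concrete, here is a minimal sketch in Python. It is not taken from any real big data project: the data set is invented and the two "views" are simply illustrative choices. The same numbers are examined twice with the same statistic, once assuming a pattern means a straight-line relationship and once assuming it means a quadratic one.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: it measures linear association and nothing else."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Invented data: y is completely determined by x, but not linearly.
xs = [i / 10 for i in range(-50, 51)]
ys = [x * x for x in xs]

# View 1: "a pattern is a straight line". The result is close to zero,
# so this view reports that there is nothing to see.
linear_view = pearson(xs, ys)

# View 2: "a pattern is a quadratic". Correlating x squared with y gives
# a result close to one, so this view reports a perfect regularity.
quadratic_view = pearson([x * x for x in xs], ys)

print(f"linear view:    {linear_view:.3f}")
print(f"quadratic view: {quadratic_view:.3f}")
```

Neither answer comes from the data alone. Each follows from a prior decision about what counts as a pattern, which is to say from a hypothesis.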
It seems to me that one must at least try to answer this question if one is to pursue social science. Not just because it is good science but also because it has ethical and political implications.
The view one takes of social phenomena, either consciously or through algorithms and data, frames what is and is not conceivable for past and future social reality. If you doubt the importance of such ideas, you should look at the history of the 20th century. Ideas matter. Theory matters. Big data is not a theory-neutral way of circumventing the hard questions. In fact it brings these questions into sharp focus and it’s time we discuss them openly.


Peter Pan said...

In remote sensing, patterns allow information to be extracted from raw data. The more patterns you can uncover, through theory or from practice, the more information will be yielded.
Is it the same for the social sciences?