Towards a Taxonomy of the Model-Ladenness of Data

Abstract Summary

Alisa Bokulich (Boston University)

Paul Edwards's (1999, 2010) notion of model-data symbiosis has two components. On the one hand, models are data-laden, in that large amounts of data often go into the construction and calibration of theoretical models. On the other hand, data are also model-laden or, as Edwards puts it, 'model filtered.' While the data-ladenness of models is familiar and relatively uncontroversial, the model-ladenness of data has received far less philosophical attention and is potentially far more controversial, insofar as "model-tampered" data is often assumed to be "corrupted" data. However, Edwards's choice of the term "symbiosis" suggests instead that model-filtered data is in fact beneficial for science. This view is also defended by Stephen Norton and Frederick Suppe (2001), who write, “To be properly interpreted and deployed, data must be modeled” (p. 70). Apart from this important preliminary work, however, the issue of the model-ladenness of data has remained undertheorized in the philosophy of science.

Before one can begin to assess the epistemological implications of the model-ladenness of data (and determine, for example, when it is beneficial or problematic), one must first have a clearer picture of exactly where and how models enter into the construction and correction of data products, and hence of the various forms that this model-ladenness can take. The aim of this talk is thus to begin to develop a taxonomy of the different ways in which data can be model-laden, and to elucidate what role(s) the models are playing in each of these contexts. In this talk I shall identify several different ways in which data are "model-filtered" and briefly illustrate each with examples drawn from across the geosciences.
These will include the following:

Data conversion: models are used to convert the data from one quantity or type used as a proxy (e.g., changes in electric current) to the data type of interest (e.g., amount of vibration in seismology). The path from proxy to quantity of interest can be either straightforward or quite complicated, depending on the nature of the data conversion.
Data correction: models are used to "vicariously" (Norton and Suppe 2001) remove unwanted elements or "noise" from the data that were not physically shielded during data collection (e.g., in geophysical gravity measurements, one needs to model and subtract drift, tidal, latitude, free-air, Bouguer, terrain, Eötvös, and isostatic factors to obtain the relevant data signal).
Data interpolation: in the geosciences data are often sparse, hence models are used to "fill in" missing data points.
Data scaling: observations often are not (or cannot be) made at the spatial or temporal scale required for relevant theoretical purposes; hence the data must be upscaled or downscaled before they can be used.

Additionally, models are used for data integration, data assimilation, and even for producing a substitute or benchmark for data in the case of synthetic data. Such a taxonomy, even if preliminary, provides an important foundation for further research into the epistemological implications of the model-ladenness of data.

Abstract ID: NKDR61366