April 12, 2018

A Contemporary Conceptual Framework for Initial Data Analysis

By Marianne Huebner, Saskia le Cessie, Carsten Schmidt and Werner Vach


Initial data analyses (IDA) are often performed as part of studies with primary-data collection, where data are obtained to address a predefined set of research questions and there is a clear plan for the intended statistical analyses. An informal or unstructured approach to IDA may have a large and non-transparent impact on the results and conclusions presented in publications.
Key principles for IDA are to avoid analyses that are part of the research question and to ensure full documentation and transparency.

We develop a framework for IDA from the perspective of a study with primary-data collection and define and discuss six steps of IDA: (1) Metadata setup to properly conduct all following IDA steps, (2) Data cleaning to identify and correct data errors, (3) Data screening that consists of understanding the properties of the data, (4) Initial data reporting that informs all potential collaborators working with the data about insights, (5) Refining and updating the analysis plan to incorporate the relevant findings, (6) Reporting of IDA in research papers to document steps that impact the interpretation of results. We describe basic principles to be applied in each step and illustrate them by example.
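As a rough illustration of how steps (1) to (3) might look in practice, the following Python sketch flags out-of-range values and summarizes missingness. It is not taken from the paper: the variable names, plausible ranges, records, and helper functions are all hypothetical, and a real study would draw its metadata from the study documentation.

```python
from statistics import median

# Step 1 (metadata setup): hypothetical plausible ranges per variable.
METADATA = {
    "age": {"min": 18, "max": 100},
    "sbp": {"min": 60, "max": 260},  # systolic blood pressure, mmHg
}

# Hypothetical records; None marks a missing value.
RECORDS = [
    {"id": 1, "age": 34, "sbp": 121},
    {"id": 2, "age": 340, "sbp": 135},
    {"id": 3, "age": 58, "sbp": None},
    {"id": 4, "age": 47, "sbp": 142},
]


def find_range_violations(records, metadata):
    """Step 2 (data cleaning): flag values outside the documented range.

    Values are only flagged, not changed, so that every correction
    can be documented separately.
    """
    flags = []
    for rec in records:
        for var, limits in metadata.items():
            value = rec.get(var)
            if value is not None and not (limits["min"] <= value <= limits["max"]):
                flags.append((rec["id"], var, value))
    return flags


def screen(records, metadata):
    """Step 3 (data screening): missingness and median per variable."""
    summary = {}
    for var in metadata:
        observed = [r[var] for r in records if r.get(var) is not None]
        summary[var] = {
            "n_missing": len(records) - len(observed),
            "median": median(observed) if observed else None,
        }
    return summary


violations = find_range_violations(RECORDS, METADATA)
report = screen(RECORDS, METADATA)
print(violations)
print(report)
```

The checks deliberately look only at data quality and distributional properties of single variables, not at associations between exposure and outcome, in keeping with the principle that IDA should not touch the research question.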

Initial data analysis needs to be recognized as an important and independent element of the research process. Lack of resources or organizational barriers can be obstacles to IDA. Further methodological developments are needed for IDA in multi-purpose studies and for increasingly complex data sets.