Finding correlations in complex datasets

It is now almost three years since I moved to Boston to start working at Fathom Information Design and the Sabeti Lab at Harvard. As I noted back then, one of the goals of this work was to create new tools for exploring complex datasets, mainly of epidemiological and health data, which could contain up to thousands of different variables. The process involved researching visual metaphors suitable for exploring this kind of dataset interactively, learning statistical techniques that can quantify general correlations (not necessarily linear or between numerical quantities), and going through several iterations of internal prototypes. We have finally released version 1.0 of a tool called “Mirador” (the Spanish word for lookout), which attempts to bridge the space between raw data and statistical modeling. Please jump to Mirador’s homepage to access the software and its user manual, and continue reading below for more details about the development and design process.
