Improving Interpretability. Philosophy of Science Perspectives on Machine Learning

These days, machine learning (ML) is all the rage, in science and beyond. ML is used to pre-process job applications, to automatically recognize faces in images or videos, and to classify astronomical objects, to name just a few examples. However, the ubiquitous use of ML also raises questions and concerns. One of the main problems is interpretability: we lack a theoretical understanding of ML models and, in particular, of so-called deep neural networks (DNNs). For example, why are DNNs so successful in applications? What, exactly, do they learn? What are the scope and limits of their success? Since we do not have answers to these questions, we do not understand how DNNs achieve their tasks; they remain black boxes, as it were. This is a particular problem for science, which is supposed to explain phenomena in the world and to help us understand them. Science cannot achieve these goals if its tools are black boxes. The problem has also been acknowledged in the political sphere: the EU’s “General Data Protection Regulation” postulates a “right to explanation” for automated decision-making. But what does this right amount to? And how can it be granted to citizens?

In the present project, we will address the problem of interpretability from a philosophical perspective. Our main working hypothesis is that philosophical work on explanation and understanding can help us make sense of interpretability and assess approaches that promise a better understanding of ML models. We will thus draw on insights from philosophical research and transfer them to recent discussions about the interpretability of ML. One key philosophical finding is that explanation comes in many varieties and defies a straightforward analysis in terms of necessary and sufficient conditions. This suggests that interpretability is not one thing but comes in many different flavors. Our main aim is thus to establish a conceptual framework for thinking about interpretability. The framework will help researchers avoid confusion, clarify the expectations behind the right to explanation, and classify and evaluate existing research programs that promise a better understanding of ML. In particular, it will distinguish between explanatory and non-explanatory questions about ML, between different levels of generality, and between subjective and objective components of interpretability. The conceptual framework will be established by combining a top-down approach with a bottom-up strategy. In the top-down direction, we will draw on distinctions from the philosophy of explanation. In the bottom-up direction, our work will be informed by recent research on ML in computer science. In several case studies, for example on Statistical Learning Theory, the so-called Information Bottleneck method, and work on causal descriptions, we will try to fit these approaches into our framework. This will not only help us test and improve the framework but also lead to a better understanding of the approaches and their prospects.
In a final step, we will bring this work to bear on the ethics of algorithms.

Of course, as philosophers, we cannot come up with new explanations of how ML works. But we can get clear about what people are after when they call for interpretable ML, we can better understand how current approaches to interpretability hang together, and we can trace the consequences that ML models have for our philosophical view of scientific research in the 21st century. Our aim is to do just this.



This project is funded by the Swiss National Science Foundation.