State-of-the-art machine learning (ML) models based on deep neural networks (DNNs) exhibit predictive accuracy that far outstrips that of previous hand-coded models. This success has generated widespread optimism at the prospect of using DNNs to enhance the march of scientific progress in a diverse range of fields. By using DNNs to explore massive collections of data, scientists might be able to produce novel discoveries. However, DNNs are opaque in ways that preclude scientific understanding. Thus, their predictive accuracy comes at the cost of one of the central aims of scientific inquiry. This tension has brought about increased interest in explainable AI (XAI), a growing discipline that aims to understand and explain how these models work. However, I argue that XAI as it is now conceived cannot deliver on the promise of understanding in the scientific contexts in which DNNs purport to show the greatest promise, namely in exploratory contexts. Traditional mathematical and computational models typically depend on parameters whose values represent the properties of their target system. By contrast, DNNs lack any such interpretable mapping between parameters and properties of real systems. Instead, these models depend on hyperparameters that determine how a model will learn to generate input/output mappings in a way that optimizes some measure of performance. Currently, XAI aims to explain how these models optimize performance and make decisions. Yet, in exploratory contexts where scientists hope to produce discoveries, it is the very content of a model's output that requires understanding. In such contexts, scientists will deploy unsupervised models, which generate clustering patterns from unlabeled data.
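To make the worry concrete: an unsupervised model such as k-means returns only integer cluster labels, with no indication of what, if anything, those clusters correspond to. The following is a minimal, self-contained sketch; the synthetic data and the helper `kmeans` are illustrative only, not drawn from any particular scientific pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "instrument readings": three blobs. The model never sees
# this generative story -- it receives only unlabeled points.
data = np.vstack([
    rng.normal([0, 0], 0.5, (50, 2)),
    rng.normal([5, 5], 0.5, (50, 2)),
    rng.normal([0, 5], 0.5, (50, 2)),
])

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and re-centering."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # Recompute centers; keep the old center if a cluster goes empty.
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

labels, centers = kmeans(data, k=3)
```

The model's entire output is `labels` and `centers`: bare numbers. Whether cluster `0` tracks a real scientific kind or a spurious correlation in the data is exactly the question the model itself cannot answer.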
Given that the hope is to explore domains of inquiry that are not already well understood, we will lack the interpretative machinery to decide whether the clusters of data produced by such a model correspond to real, scientific kinds rather than jerry-rigged ones tied to spurious correlations in big data. In exploratory contexts, we lack the conceptual frameworks necessary to say what the output clusters of unsupervised learning models mean. I call this phenomenon semantic opacity. When confronted with semantic opacity, the knowledge required for interpreting the decisions of a model depends on theoretical assumptions about the very domain of inquiry about which we had hoped the model could teach us. I argue that this should be understood as the more pressing problem facing XAI in the context of science. I go on to suggest that to address semantic opacity we must develop idealized models of data-driven measurement practices that capture the complex interactions among ML models, the data on which they are trained and tested, the high-throughput instruments used to collect that data, and the actual phenomena of interest. The possession of such a model would enable scientists to better estimate error and calibrate the outputs of ML models. This also suggests that XAI researchers should direct more resources towards refining methods of uncertainty quantification (UQ) as a means of developing an idealized model of ML-based measurement practices.
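One rough way to operationalize the UQ suggestion is to measure how stable a clustering's outputs are under resampling of the data. The following is a hedged sketch, assuming a bootstrap over a k-means-style model; the names `kmeans_centers` and `coassignment_stability` are hypothetical, introduced only for illustration.

```python
import numpy as np

def kmeans_centers(X, k, iters=30, rng=None):
    """Fit k-means (Lloyd's algorithm) and return only the centers."""
    rng = rng if rng is not None else np.random.default_rng(0)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return centers

def coassignment_stability(X, k, runs=20, seed=0):
    """Fraction of bootstrap refits in which each pair of points lands in
    the same cluster. Entries near 0.5 flag unstable cluster structure."""
    rng = np.random.default_rng(seed)
    n = len(X)
    co = np.zeros((n, n))
    for _ in range(runs):
        boot = X[rng.choice(n, n, replace=True)]   # bootstrap resample
        centers = kmeans_centers(boot, k, rng=rng)
        # Assign the *original* points to the refit centers; co-assignment
        # is invariant to arbitrary relabeling across runs.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        co += labels[:, None] == labels[None, :]
    return co / runs

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (40, 2)), rng.normal(4.0, 0.5, (40, 2))])
stab = coassignment_stability(X, k=2)
```

Co-assignment values near 0 or 1 indicate pairs of points whose joint cluster membership is robust across resamples; values near 0.5 flag structure that may be an artifact of the particular sample rather than a real kind. Such stability estimates do not by themselves resolve semantic opacity, but they give scientists an error estimate with which to calibrate a model's outputs.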