MIT's MAIA automates AI model interpretability, generating hypotheses and conducting experiments to enhance understanding and address biases.
Connect with technology leaders today!
As artificial intelligence (AI) models become increasingly prevalent and integrated into diverse sectors like healthcare, finance, education, transportation, and entertainment, understanding how they work under the hood is critical. Interpreting the mechanisms underlying AI models enables us to audit them for safety and biases, with the potential to deepen our understanding of the science behind intelligence itself.
Imagine if we could directly investigate the human brain by manipulating each of its individual neurons to examine their roles in perceiving a particular object. While such an experiment would be prohibitively invasive in the human brain, it is more feasible in another type of neural network: one that is artificial. However, somewhat similar to the human brain, artificial models containing millions of neurons are too large and complex to study by hand, making interpretability at scale a very challenging task.
To address this, MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers decided to take an automated approach to interpreting artificial vision models that evaluate different properties of images. They developed “MAIA” (Multimodal Automated Interpretability Agent), a system that automates a variety of neural network interpretability tasks using a vision-language model backbone equipped with tools for experimenting on other AI systems.
“Our goal is to create an AI researcher that can conduct interpretability experiments autonomously. Existing automated interpretability methods merely label or visualize data in a one-shot process. On the other hand, MAIA can generate hypotheses, design experiments to test them, and refine its understanding through iterative analysis,” says Tamar Rott Shaham, an MIT electrical engineering and computer science (EECS) postdoc at CSAIL and co-author on a new paper about the research. “By combining a pre-trained vision-language model with a library of interpretability tools, our multimodal method can respond to user queries by composing and running targeted experiments on specific models, continuously refining its approach until it can provide a comprehensive answer.”
The automated agent is demonstrated to tackle three key tasks: it labels individual components inside vision models and describes the visual concepts that activate them, it cleans up image classifiers by removing irrelevant features to make them more robust to new situations, and it hunts for hidden biases in AI systems to help uncover potential fairness issues in their outputs. “But a key advantage of a system like MAIA is its flexibility,” says Sarah Schwettmann PhD ’21, a research scientist at CSAIL and co-lead of the research. “We demonstrated MAIA’s usefulness on a few specific tasks, but given that the system is built from a foundation model with broad reasoning capabilities, it can answer many different types of interpretability queries from users, and design experiments on the fly to investigate them.”
In one example task, a human user asks MAIA to describe the concepts that a particular neuron inside a vision model is responsible for detecting. To investigate this question, MAIA first uses a tool that retrieves “dataset exemplars” from the ImageNet dataset, which maximally activate the neuron. For this example neuron, those images show people in formal attire, and closeups of their chins and necks. MAIA makes various hypotheses for what drives the neuron’s activity: facial expressions, chins, or neckties. MAIA then uses its tools to design experiments to test each hypothesis individually by generating and editing synthetic images — in one experiment, adding a bow tie to an image of a human face increases the neuron’s response. “This approach allows us to determine the specific cause of the neuron’s activity, much like a real scientific experiment,” says Rott Shaham.
MAIA’s explanations of neuron behaviors are evaluated in two key ways: synthetic systems with known ground-truth behaviors are used to assess the accuracy of MAIA’s interpretations, and for “real” neurons inside trained AI systems with no ground-truth descriptions, the authors design a new automated evaluation protocol that measures how well MAIA’s descriptions predict neuron behavior on unseen data.
MAIA presents a promising solution to the challenge of understanding neural models by automating interpretability tasks. MAIA streamlines the process of understanding model behavior by combining a pre-trained vision-language model with a set of interpretability tools. While human supervision is still necessary to avoid common pitfalls and maximize effectiveness, MAIA’s framework demonstrates high potential utility in the interpretability workflow, offering a flexible and adaptable approach to understanding complex neural systems. Overall, MAIA significantly helps in bridging the gap between human interpretability and automated techniques in model understanding and analysis.
For more insights into the world of AI, explore our offerings in AI Solutions, AI Consulting, and AI Software Development. Stay updated with the latest in AI Research and Machine Learning advancements through our Rapid Innovation Blogs.
For those interested in the intersection of AI Ethics and technology, our resources on AI Ethics and Automated AI are invaluable. Explore how Blockchain and Cryptocurrency are shaping the future of technology, including Smart Contracts and Decentralized Applications.
Join us in exploring the future of Deep Learning and Neural Networks as we continue to innovate and push the boundaries of what is possible with AI Applications.