A Multimodal Automated Interpretability Agent (arxiv.org)
83 points by el_duderino 10 months ago | 7 comments



> We think MAIA augments, but does not replace, human oversight of AI systems. MAIA still requires human supervision to catch mistakes such as confirmation bias and image generation/editing failures. Absence of evidence (from MAIA) is not evidence of absence: though MAIA's toolkit enables causal interventions on inputs in order to evaluate system behavior, MAIA's explanations do not provide formal verification of system performance.

For folks who are more familiar with this branch of literature, given the above, why is this a fruitful line of inquiry? Isn't this akin to stacking turtles on top of each other?


I think what the authors aimed for is a proof-of-concept: they attempt to demonstrate that you can (to a degree) automate interpretability. Mechanistic interpretability is challenging because it does not scale well at the moment, and there is a debate about whether localized structural discoveries on toy examples actually translate to patterns in large networks. My guess is that if you could build an automatic explainer system, it would let you flag problems and find issues faster, basically serving as a meta-heuristic for further investigation.
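To make that concrete, here's a minimal sketch of what one iteration of such an explainer loop could look like, using PyTorch forward hooks. This is my own toy illustration, not MAIA's actual toolkit: the random probe set and the final ask_llm_to_describe step are placeholders for whatever image source and explainer model you'd plug in.

    import torch
    import torchvision.models as models

    # Toy "automatic explainer" loop: hook one channel of a vision model,
    # score it over a probe set, and surface the top-activating inputs for
    # an LLM (or a human) to hypothesize a concept.
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

    activations = {}

    def hook(module, inp, out):
        # Spatial-mean activation per channel: shape (batch, channels).
        activations["layer4"] = out.mean(dim=(2, 3))

    model.layer4.register_forward_hook(hook)

    def probe_unit(unit, probe_images):
        # Activation of one channel ("neuron") for each probe image.
        with torch.no_grad():
            model(probe_images)
        return activations["layer4"][:, unit]

    probe_images = torch.randn(32, 3, 224, 224)  # stand-in for a real probe set
    scores = probe_unit(unit=7, probe_images=probe_images)
    top = scores.topk(5).indices
    # explanation = ask_llm_to_describe(probe_images[top])  # hypothetical step;
    # per the paper's own caveat, the output still needs human review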

Unfortunately, the title hypes it up, and as always, once you read the paper the results are less impressive. But that is the current state of AI research, speaking as a researcher myself.

In a similar vein: https://openai.com/index/language-models-can-explain-neurons...


That's basically a known fact about LLMs: they need oversight. But if they make the task 100x easier, they're still useful as a starting point. This kind of neural net analysis is difficult to do manually.

I am curious whether they'll just start making inventories of all neurons in all layers; then they could compare models based on neuron types, or even train them to achieve the right mix of concepts.
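As a toy sketch of what such an inventory and a cross-model comparison might look like (my own illustration; label_unit is a stand-in for whatever automated explainer you trust, and the layer sizes are made up):

    from collections import Counter

    def label_unit(model_name, layer, unit):
        # Stand-in for an automated explainer (a MAIA-style agent, a
        # captioner over top-activating images, etc.). A real version
        # would probe the unit and return a concept label.
        return f"concept-{hash((model_name, layer, unit)) % 10}"

    def inventory(model_name, layers):
        # layers maps layer name -> number of units, e.g. {"layer4": 2048}
        counts = Counter()
        for layer, n_units in layers.items():
            for unit in range(n_units):
                counts[label_unit(model_name, layer, unit)] += 1
        return counts

    inv_a = inventory("model_a", {"layer3": 1024, "layer4": 2048})
    inv_b = inventory("model_b", {"layer3": 1024, "layer4": 2048})
    print((inv_a - inv_b).most_common(3))  # concepts over-represented in model_a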


https://arxiv.org/pdf/2404.14394

Actual paper, to save you from having to read the press release.


Ok, we'll change the URL to that from https://news.mit.edu/2024/mit-researchers-advance-automated-.... Users may still want to read the latter for a quick intro.


We uncritically accept extraordinary claims in this area. They might even be valid claims, but they are rarely supported by evidence that is likewise extraordinary.

In my experience real, durable progress generally starts happening once we come back down to Earth and start iterating.

Are modern large models crucial to transportation? Maybe? Waymo is cool, but it's not yet an economic reality at scale, and I doubt there are 1.75T-parameter models running in cars. Are they crucial to finance? I'm quite sure that machine learning plays an important role in finance, because I know people in finance who do it all day for serious firms, but I'm very skeptical that finance has been revolutionized in the last 18 months (unless you count the NVDA HODL).

Can we push back a little on the breathless hyperventilation? It was annoying a year ago, when the AGI people were wrong; it's offensive now that we got played for suckers.

“As artificial intelligence models become increasingly prevalent and are integrated into diverse sectors like health care, finance, education, transportation, and entertainment, understanding how they work under the hood is critical. Interpreting the mechanisms underlying AI models enables us to audit them for safety and biases, with the potential to deepen our understanding of the science behind intelligence itself.”


Eventually both the hype and its criticism will be automated with AI as well so that we can all go to the beach and relax.




