A Multimodal Automated Interpretability Agent (arxiv.org)
83 points by el_duderino 10 months ago | 7 comments



> We think MAIA augments, but does not replace, human oversight of AI systems. MAIA still requires human supervision to catch mistakes such as confirmation bias and image generation/editing failures. Absence of evidence (from MAIA) is not evidence of absence: though MAIA's toolkit enables causal interventions on inputs in order to evaluate system behavior, MAIA's explanations do not provide formal verification of system performance.

For folks who are more familiar with this branch of literature, given the above, why is this a fruitful line of inquiry? Isn't this akin to stacking turtles on top of each other?


I think what the authors aimed for is a proof-of-concept: they attempt to demonstrate that you can (to a degree) automate interpretability. Mechanistic interpretability is challenging because it does not scale well at the moment, and there is a debate about whether localized structural discoveries on toy examples actually translate to patterns in large networks. My guess is that if you could build an automatic explainer system, it would let you flag problems and find issues faster, basically serving as a meta-heuristic for further investigation.
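To make that concrete, here's a minimal sketch of what one iteration of such an explainer loop could look like, using PyTorch forward hooks. This is my own toy illustration, not MAIA's actual toolkit: the random probe set and the final ask_llm_to_describe step are placeholders for whatever image source and explainer model you'd plug in.

    import torch
    import torchvision.models as models

    # Toy "automatic explainer" loop: hook one channel of a vision model,
    # score it over a probe set, and surface the top-activating inputs for
    # an LLM (or a human) to hypothesize a concept.
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

    activations = {}

    def hook(module, inp, out):
        # Spatial-mean activation per channel: shape (batch, channels).
        activations["layer4"] = out.mean(dim=(2, 3))

    model.layer4.register_forward_hook(hook)

    def probe_unit(unit, probe_images):
        # Activation of one channel ("neuron") for each probe image.
        with torch.no_grad():
            model(probe_images)
        return activations["layer4"][:, unit]

    probe_images = torch.randn(32, 3, 224, 224)  # stand-in for a real probe set
    scores = probe_unit(unit=7, probe_images=probe_images)
    top = scores.topk(5).indices
    # explanation = ask_llm_to_describe(probe_images[top])  # hypothetical step;
    # per the paper's own caveat, the output still needs human review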

Unfortunately, the title hypes it up, and as always, once you read the paper the results are less impressive. But that is the current state of AI research, speaking as a researcher myself.

In a similar vein: https://openai.com/index/language-models-can-explain-neurons...


That's basically a known fact about LLMs: they need oversight. But if they make the task 100x easier, they're still useful as a starting point. This kind of neural net analysis is difficult to do manually.

I am curious whether they'll just start making inventories of all neurons in all layers; then they could compare models based on neuron types, or even train them to achieve the right mix of concepts.
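As a toy sketch of what such an inventory and a cross-model comparison might look like (my own illustration; label_unit is a stand-in for whatever automated explainer you trust, and the layer sizes are made up):

    from collections import Counter

    def label_unit(model_name, layer, unit):
        # Stand-in for an automated explainer (a MAIA-style agent, a
        # captioner over top-activating images, etc.). A real version
        # would probe the unit and return a concept label.
        return f"concept-{hash((model_name, layer, unit)) % 10}"

    def inventory(model_name, layers):
        # layers maps layer name -> number of units, e.g. {"layer4": 2048}
        counts = Counter()
        for layer, n_units in layers.items():
            for unit in range(n_units):
                counts[label_unit(model_name, layer, unit)] += 1
        return counts

    inv_a = inventory("model_a", {"layer3": 1024, "layer4": 2048})
    inv_b = inventory("model_b", {"layer3": 1024, "layer4": 2048})
    print((inv_a - inv_b).most_common(3))  # concepts over-represented in model_a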


https://arxiv.org/pdf/2404.14394

Actual paper, to save you from having to read the press release.


Ok, we'll change the URL to that from https://news.mit.edu/2024/mit-researchers-advance-automated-.... Users may still want to read the latter for a quick intro.


We uncritically accept extraordinary claims in this area. They might even be valid claims, but they are rarely supported by evidence that is likewise extraordinary.

In my experience real, durable progress generally starts happening once we come back down to Earth and start iterating.

Are modern large models crucial to transportation? Maybe? Waymo is cool, but it's not yet an economic reality at scale, and I doubt there are 1.75T-parameter models running in cars. Are they crucial to finance? I'm quite sure that machine learning plays an important role in finance, because I know people in finance who do it all day for serious firms, but I'm very skeptical that finance has been revolutionized in the last 18 months (unless you count the NVDA HODL).

Can we push back a little on the breathless hyperventilation? It was annoying a year ago, when the AGI people were wrong; it's offensive now that we got played for suckers.

“As artificial intelligence models become increasingly prevalent and are integrated into diverse sectors like health care, finance, education, transportation, and entertainment, understanding how they work under the hood is critical. Interpreting the mechanisms underlying AI models enables us to audit them for safety and biases, with the potential to deepen our understanding of the science behind intelligence itself.”


Eventually both the hype and its criticism will be automated with AI as well so that we can all go to the beach and relax.




