The ARC Prize Foundation ran extensive ablations on HRM for their slew of reasoning tasks and noted that the "hierarchical" part of their architecture is not much more impactful than a vanilla transformer of the same size with no extra hyperparameter tuning:
https://arcprize.org/blog/hrm-analysis#analyzing-hrms-contri...