
I love the idea of the product! I would trust your solution to be the best for very simple use cases but not for multistep or ReAct agents. Any thoughts / insights on that?

I think the demo could be more exciting; the person doing the voiceover sounds bored haha




Ha - here's the advice I give to YC startups about making demo videos for HN:

"What works well for HN is raw and direct, with zero production values. Skip any introductions and jump straight into showing your product doing what it does best. Voiceover is good, but no marketing slickness—no fancy logos or background music!"

I guess there's zero production values and zero production values...


Totally agree. Raw is great, but energy matters too. If the person sounds bored, it's hard to get excited about the product—even if it's amazing. Passion is contagious.


That's true, thanks for the feedback! In the end, it wasn't boredom but the long hours of work: we put too much energy into the platform ;) Taking it to heart for the next one!


Well... we took the rawness to heart, that's clear!


Which was exactly correct!


Yes, great point. We are currently working on multistep RL. The big problem with the trivial approach (giving a single reward to the entire ReAct trajectory) is that the model receives only a weak learning signal per decision, known in the literature as the credit assignment problem: individual decisions are not properly accounted for, which makes training unstable. I guess this has been an unsolved problem for a long time; however, it was not really looked at, since generalist “planning” agents were not a big thing in RL until o1/DeepSeek.

IMO, the most promising approach to this is something along the lines of MA-RLHF (https://arxiv.org/abs/2410.02743) but adapted to the real world, i.e., splitting up the reward model to grade individual actions inside the trajectory, which reduces the “attention distance” between the reward and the decision.
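
To make the credit assignment point concrete, here is a rough toy sketch (not MA-RLHF itself, and not our actual training code). It compares the per-decision signal from one trajectory-level reward with the signal from per-action grades. The function names, the four-step trajectory, and the grade values are all made up for illustration.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Discounted sum of future rewards at each step of a trajectory."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def trajectory_level_rewards(final_reward: float, num_steps: int) -> List[float]:
    """Trivial approach: a single reward placed on the last step only."""
    return [0.0] * (num_steps - 1) + [final_reward]


# Hypothetical 4-step ReAct trajectory: the task succeeded overall
# (final reward 1.0), but the second action was a bad tool call.
num_steps = 4
final_reward = 1.0

# Hypothetical per-action grades from a step-level reward model.
step_grades = [1.0, -1.0, 1.0, 1.0]

# With one terminal reward and gamma = 1, every decision sees the same return,
# so the bad second action is reinforced as much as the good ones.
sparse_signal = returns_to_go(trajectory_level_rewards(final_reward, num_steps))

# With per-action grades, the immediate reward singles out the bad action.
dense_immediate = step_grades
dense_returns = returns_to_go(step_grades)

print("trajectory-level returns:", sparse_signal)    # [1.0, 1.0, 1.0, 1.0]
print("per-action immediate:    ", dense_immediate)  # [1.0, -1.0, 1.0, 1.0]
print("per-action returns-to-go:", dense_returns)    # [2.0, 1.0, 2.0, 1.0]
```

In the first case the four decisions are indistinguishable to the optimizer; in the second, the reward sits right next to the decision it grades. How to get reliable per-action grades in real-world tasks is the open part.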



