I have been working with LLMs and VLMs to automate browser-based workflows, among other things, for the last couple of years. Given how good the vision models have gotten lately, the perception problem is solved to a level where it opens up a lot of possibilities. Manipulation is not generally solved yet, but there is a lot of activity in the field and there are promising approaches (OpenVLA, π0). Given this, I'm trying to build an affordable robot that can help with household chores using language and vision models. The idea is to ship hardware capable enough to do a few things really well with currently available models, and to keep upgrading the AI stack as manipulation models get better over time.